Mathematics > EXAM > CS 361 Computational Probability - University of Illinois_ CS 361 Probability and Statistics for Com (All)

CS 361 Computational Probability - University of Illinois_ CS 361 Probability and Statistics for Computer Science. Final Project

Document Content and Description Below

No it does not make sense to use the secant method since it requires that the function can be evaluated precisely. Meaning that the function must always give the same output for the same input. Exe... rcise 3. The family of learning rates that I have come up with is the sequence: an = n1α where 1 2 < α ≤ 1. This set of possible sequences will diverge given the p-value test since the exponent is less than or equal to 1, but will diverge when each element is squared since the exponent will be greater than 1. Exercise 8. Yes, gradient descent is essentially a root finding algorithm. The procedure of gradient descent and root finding are identical, both procedures use the same input (vector) and outputs (vector) and require the same amount of information (gradient). Exercise 9. Yes, SGD is a stochastic approximation algorithm. Stochastic approximation algorithms iteratively find roots or optimize functions where the data is corrupted by noise, the goal is to recover properties of the function without needing to evaluate it directly. SGD meets all these properties. Exercise 10. We want to show that under the following definitions: 1• f(x) = Q(x) = k1 Pk i=1 Qi(x) • g(x) = Qi(x) and rg(x) = rQi(x) • z − Qi(x) − Q(x) and g(x) = f(x) + z these expressions hold true: • E[g(x)] = f(x) • E[rg(x)] = rf(x) • E[z] = 0 We will begin by proving the first expression: E[g(x)] = E[Qi(x)] = kX i =1 Qi(x)Pi(x) The function Pi(x) is the probability of picking the ith data point with input x. Since the sampling of data points is done uniformly meaning each data point has an equal probability of being selected, this means out of k data points each point has a probability of k1 of being picked. This our summation becomes: kX i =1 Qi(x) 1 k = 1 k kX i =1 Qi(x) = f(x) from the definitions listed above. The next expression can be proved in the following manner: E[rg(x)] = E[rQi(x)] = kX i =1 rQi(x)Pi(x) Just as in the last proof we showed that Pi(x) = k1 , we will use the same definition because it follows the same logic as above. Our summation becomes kX i =1 rQi(x) 1 k = 1 k kX i =1 rQi(x) = r( 1 k kX i =1 Qi(x)) = rQ(x) = rf(x) Lastly, to prove E[z] = 0 we can do the following: E[z] = E[Qi(x) − Q(x)] = E[Qi(x)] − E[Q(x)] We’ve shown above that E[Qi(x)] = k1 Pk i=1 Qi(x) so we can substitute this function as Qi(x) to get the following, and using the definition of Q(x) we can substitute that function in for E[Q(x)]: E[Qi(x)] = E[ 1 k kX i =1 Qi(x)] E[Q(x)] = E[ 1 k kX i =1 Qi(x)] Thus: E[z] = E[Qi(x)] − E[Q(x)] = E[ 1 k kX i =1 Qi(x)] − E[ 1 k kX i =1 Qi(x)] = 0 2Exercise 11. 1. d dx g(x) = (x + z) 2. Yes, f does have a unique root. We can prove as such by taking the second derivative of f(x). So, d2 dx2 (1 2x2) = 1 . Since the second derivative is always positive we know that the function is always concave up, or more importantly never switches concavity (never change direction after the global optimum). When we solve for the function we know that the root is 0, which is also the global minimum, and since the function is always concave up it is the only minimum. [Show More]

Last updated: 2 years ago

Preview 1 out of 11 pages

Buy Now

Instant download

We Accept: