CS 189 Introduction to Machine Learning, Spring 2022: Homework 6
This section provides background on neural networks designed to help you complete the assignment. There are no questions in this part. The questions for the homework begin in section 4.

2.1 Neural Networks

Many of the most exciting recent breakthroughs in machine learning have come from "deep" (many-layered) neural networks, such as the deep reinforcement learning algorithm that learned to play Atari from pixels, or the GPT-2 model, which generates text that is nearly indistinguishable from human-generated text. Neural network libraries such as TensorFlow and PyTorch have made training complicated neural network architectures very easy. However, we want to emphasize that neural networks begin with fundamentally simple models that are just a few steps removed from basic logistic regression. In this assignment, you will build two fundamental types of neural network models, all in plain numpy: a feed-forward fully-connected network and a convolutional neural network. We will start with the essential elements and then build up in complexity.

A neural network model is defined by the following.

• An architecture defining the flow of information between computational layers. This defines the composition of functions that the network performs from input to output.
• A cost function (e.g. cross-entropy or mean squared error).
• An optimization algorithm (e.g. stochastic gradient descent with backpropagation).
• A set of hyperparameters. (Here we use this as a catch-all term that includes algorithm parameters that technically are not "hyperparameters" because they don't help you change the bias-variance tradeoff, such as the learning rate and the mini-batch size for stochastic gradient descent with mini-batches.)

Each layer is defined by the following components.

• A parameterized function that defines the layer's map from input to output (e.g. f(x) = σ(Wx + b)).
• An activation function σ (e.g. ReLU, sigmoid, etc.).
• A set of parameters (e.g. weights and bias terms).

Neural networks are commonly used for supervised learning problems, where we have a set of inputs and a set of labels, and we want to learn the function that maps inputs to labels. To learn this function, we need to update the parameters of the network (i.e. the weights, including the bias terms). We do this using mini-batch gradient descent. To compute the gradients for gradient descent, we use a dynamic programming algorithm called backpropagation.

The backpropagation algorithm begins with a "forward pass" of the network: we send a mini-batch of input data (e.g. 50 data points) through the network, producing a set of outputs that we feed into our loss function. Then comes the "backward pass": starting from the last layer, we compute the derivatives of the loss function with respect to each of the model parameters, using the chain rule to "propagate" the information from the loss function backwards through the network. Because each layer's derivatives are computed once and reused by the layers before it, this lets us calculate derivatives with respect to all the parameters of our network while avoiding computing the same derivatives multiple times.
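To make this concrete, below is a minimal numpy sketch of one forward pass, backward pass, and parameter update for a one-hidden-layer network with a sigmoid activation and mean squared error cost. The layer sizes, variable names, and random data are assumptions made for this example; it is not the interface your assignment code must follow.

    import numpy as np

    # Illustrative sketch only: shapes, names, and sizes are assumptions,
    # not the assignment's required API.

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    B, d, n1, k = 50, 4, 8, 1                 # batch size, inputs, hidden units, outputs
    X = rng.standard_normal((B, d))           # mini-batch of inputs
    Y = rng.standard_normal((B, k))           # labels
    W1, b1 = rng.standard_normal((d, n1)), np.zeros((1, n1))
    W2, b2 = rng.standard_normal((n1, k)), np.zeros((1, k))

    # Forward pass: compose the layer functions from input to output.
    H1 = sigmoid(X @ W1 + b1)                 # hidden activations, shape (B, n1)
    Y_hat = H1 @ W2 + b2                      # network output, shape (B, k)
    loss = np.mean((Y_hat - Y) ** 2)          # cost over the mini-batch

    # Backward pass: apply the chain rule from the loss back toward the input.
    dY_hat = 2.0 * (Y_hat - Y) / (B * k)      # dLoss/dY_hat
    dW2 = H1.T @ dY_hat                       # dLoss/dW2
    db2 = dY_hat.sum(axis=0, keepdims=True)   # dLoss/db2
    dH1 = dY_hat @ W2.T                       # propagate back to the hidden layer
    dZ1 = dH1 * H1 * (1.0 - H1)               # through the sigmoid (sigma' = sigma(1-sigma))
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0, keepdims=True)

    # One gradient descent step.
    lr = 0.1
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

Note that dH1 is reused to compute dW1 and db1: this reuse of intermediate derivatives is exactly the dynamic programming that makes backpropagation efficient.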
To summarize, training a neural network involves three steps.

1. Forward propagation of inputs.
2. Computing the cost.
3. Backpropagation and parameter updates.

2.2 Batching

When building neural networks, we have to carefully consider the data. In Homework 4, you coded both batch gradient descent and stochastic gradient descent for logistic regression. For the stochastic version, where only a single data point was used at a time, the form of the derivatives used in gradient descent differed from that of batch gradient descent.

Neural networks usually operate on mini-batches, or subsets of the data matrix. This is because iterating on all the data at once (batch gradient descent) is inefficient for large data sets, whereas iterating on just one training point at a time introduces excessive stochasticity (randomness) and makes poor use of your computer's caches and potential for parallelism. Thus, every step of your neural network should be defined to operate on mini-batches of data. During a single operation of mini-batch gradient descent, you take a matrix of shape (B, d), where B is the mini-batch size and d is the number of features, and do a forward pass on B training points at once, ideally using vector operations to obtain some parallelism in your computations (as every training point is processed the same way). The input to a convolutional neural network for image recognition might be a four-dimensional array of shape (B, H, W, C), where B is the mini-batch size, H is the height of the image, W is the width of the image, and C is the number of channels in the image (3 for RGB, that is, red, green, and blue intensities).

As you are writing the gradient descent algorithm to work on mini-batches, all of your derivations must work for mini-batches. For that reason, many of the derivations you do for this homework will differ from those you have seen in class. Thinking in terms of mini-batches often changes the shapes and operations involved. Your derivations must be for mini-batches and cannot use loops to iterate over individual data points. Be prepared to spend some time working out the tricky details of how to do this.
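For illustration, here is a small numpy sketch of a single fully-connected layer applied to a whole mini-batch at once. The names (X, W, b, relu) and sizes are assumptions for this example; the per-point loop is shown only to check the vectorized result, since your own derivations must not loop over data points.

    import numpy as np

    # Illustrative sketch only: names and sizes are assumptions.
    rng = np.random.default_rng(1)
    B, d, n = 50, 10, 32
    X = rng.standard_normal((B, d))        # mini-batch: one row per training point
    W = rng.standard_normal((d, n))
    b = np.zeros((1, n))

    def relu(z):
        return np.maximum(z, 0.0)

    # One matrix multiply handles all B points at once; the (1, n) bias
    # broadcasts across the batch dimension.
    H = relu(X @ W + b)                    # shape (B, n)

    # Equivalent per-point loop: what the matrix form replaces, and what
    # your derivations must avoid.
    H_loop = np.stack([relu(X[i] @ W + b[0]) for i in range(B)])
    assert np.allclose(H, H_loop)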
2.3 Feed-Forward, Fully-Connected Neural Networks

A feed-forward, fully-connected neural network consists of layers of units alternating with layers of edges. Each layer of edges performs an affine transformation of an input, followed by a nonlinear activation function. "Fully-connected" means that a layer of edges connects every unit in one layer of units to every unit in the next layer of units.

[Figure 1: A 3-layer fully-connected neural network.]
[Figure 2: A single fully-connected neuron.]

We will use the following notation when defining fully-connected layers, with superscripts surrounded by brackets indexing layers (both layers of units and layers of edges) and subscripts indexing the vector/matrix elements. In this notation, we will use row vectors (not column vectors) to represent unit layers so that we can apply successive matrices (edge layers) to them from left to right.

• x: A single data vector, of shape 1 × d, where d is the number of features.
• y: A single label vector, of shape 1 × k, where k is the number of output units/features. These could be regression values or they could symbolize classifications (and you can mix output units of both types).
• n[l]: The number of units (neurons) in unit layer l.
• W[l]: A matrix of weights connecting unit layer l − 1 with unit layer l, of shape n[l−1] × n[l]. This matrix represents the weights of the connections in edge layer l. At edge layer 1, the shape is d × n[1].
• b[l]: The bias vector for edge layer l, of shape 1 × n[l].
• h[l]: The output of edge layer l. This is a vector of shape 1 × n[l].
• σ[l](·): The nonlinear activation function applied at layer l.

A fully-connected layer l is a function

    ϕ[l](h[l−1]) = σ[l](h[l−1] W[l] + b[l]) = h[l].

The output h[l] of edge layer l is computed, then used as the input to edge layer l + 1. (At edge layer 1, h[0] is simply the input point x.) A neural network is thus a composition of functions. We want to find weights such that the network maps each input example x to its label y.
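The sketch below illustrates this composition in numpy for a single input point, following the notation above. The layer sizes and the choice of sigmoid for every σ[l] are arbitrary assumptions made for the example.

    import numpy as np

    # Illustrative sketch only: layer sizes and activations are assumptions.
    rng = np.random.default_rng(2)
    d, sizes = 4, [8, 8, 3]               # n[1], n[2], n[3] = k

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # W[l] has shape n[l-1] x n[l] (d x n[1] at the first edge layer);
    # b[l] has shape 1 x n[l].
    Ws, bs = [], []
    n_prev = d
    for n_l in sizes:
        Ws.append(rng.standard_normal((n_prev, n_l)))
        bs.append(np.zeros((1, n_l)))
        n_prev = n_l

    x = rng.standard_normal((1, d))       # a single data (row) vector
    h = x                                 # h[0] is the input point
    for W, b in zip(Ws, bs):
        h = sigmoid(h @ W + b)            # one fully-connected layer phi[l]
    print(h.shape)                        # (1, 3): the network output

Because unit layers are row vectors, each edge layer is applied by right-multiplication (h @ W), matching the left-to-right convention stated above.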