TEST BANK FOR STAT C100. COMPLETE GUIDE FOR FINAL EXAM PREPARATION. GRADED A.

Principal Component Analysis

In lecture we discussed how PCA can be used for dimensionality reduction. Specifically, given a high dimensional dataset, PCA allows us to:

1. Understand the rank of the data. If k principal components capture almost all of the variance, then the data is effectively rank k.
2. Create 2D scatterplots of the data. Such plots are a rank 2 representation of our data, and allow us to visually identify clusters of similar observations.
3. Create other low rank approximations of the data. Other than the 2D scatterplots mentioned above, this is something we won't really do in DS100, so we've left it as an optional exercise (question 4) at the end of this homework.

A solid geometric understanding of PCA will help you understand why PCA is able to do these three things. In this homework, we'll build that geometric intuition, and we will also look at PCA on two datasets: one where PCA works poorly, and another where it works pretty well.

Due Date

This assignment is due Thursday, October 24th at 11:59pm PST.

Collaboration Policy

Data science is a collaborative activity. While you may talk with others about the homework, we ask that you write your solutions individually. If you do discuss the assignments with others, please include their names in the cell below.

Collaborators: ...

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.express as px

# Note: If you're having problems with the 3D scatter plots, uncomment the two lines below,
# and you should see a version number that is at least 4.1.1.
# import plotly
# plotly.__version__

Question 1: PCA on 3D Data

In question 1, our goal is to see visually how PCA is simply the process of rotating the coordinate axes of our data.

The code below reads in a 3D dataset. We have named the variable surfboard because the data resembles a surfboard when plotted in 3D space.
In [2]:
surfboard = pd.read_csv("data3d.csv")
surfboard.head(5)

The cell below will allow you to view the data as a 3D scatterplot. Rotate the data around and zoom in and out using your trackpad or the controls at the top right of the figure.

You should see that the data is an ellipsoid that looks roughly like a surfboard or a hashbrown patty (https://www.google.com/search?q=hashbrown+patty&source=lnms&tbm=isch). That is, it is pretty long in one direction, pretty wide in another direction, and relatively thin along its third dimension. We can think of these as the "length", "width", and "thickness" of the surfboard data. Observe that the surfboard is not aligned with the x/y/z axes.

If you get an error that your browser does not support WebGL, you may need to restart your kernel and/or browser.
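To make the "PCA is a rotation of the coordinate axes" idea concrete, here is a minimal sketch that is not part of the original homework. Since data3d.csv is not available here, it assumes a synthetic surfboard-like point cloud; the names axis_aligned, tilt, data, and rotated are hypothetical. It computes PCA through the SVD of the centered data, reports how much variance each principal component captures, and expresses the points in the new axes. Strictly speaking, the change of axes is an orthogonal transformation, which may include a reflection as well as a rotation.

```python
import numpy as np

# Hypothetical stand-in for data3d.csv: a synthetic "surfboard" that is long,
# wide, and thin along three orthogonal directions.
rng = np.random.default_rng(0)
axis_aligned = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.3])

# Tilt the surfboard with a random orthogonal matrix (from a QR decomposition)
# so that it is no longer aligned with the x/y/z axes.
tilt, _ = np.linalg.qr(rng.normal(size=(3, 3)))
data = axis_aligned @ tilt.T

# PCA via the SVD of the centered data matrix.
centered = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Fraction of the total variance captured by each principal component.
variance_ratio = s**2 / np.sum(s**2)

# Coordinates of every point in the principal component axes.
# The rows of vt are the new axis directions.
rotated = centered @ vt.T

print("variance ratio per component:", np.round(variance_ratio, 3))
print("variance captured by first 2 components:",
      round(float(variance_ratio[:2].sum()), 3))
```

If the first two entries of variance_ratio sum to nearly 1, the data is effectively rank 2, and plotting the first two columns of rotated gives the 2D scatterplot described in item 2 of the list at the top of the homework. Because vt is orthogonal, the distance of each point from the center is the same in data and in rotated; only the axes have changed.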

Last updated: 2 years ago
