TEST BANK FOR STAT C100. COMPLETE GUIDE FOR FINAL EXAM PREPARATION. GRADED A.
Principal Component Analysis

In lecture we discussed how PCA can be used for dimensionality reduction. Specifically, given a high dimensional dataset, PCA allows us to:

1. Understand the rank of the data. If k principal components capture almost all of the variance, then the data is effectively rank k.
2. Create 2D scatterplots of the data. Such plots are a rank 2 representation of our data, and allow us to visually identify clusters of similar observations.
3. Create other low rank approximations of the data. Other than the 2D scatterplots mentioned above, this is something we won't really do in DS100, so we've left it as an optional exercise (question 4) at the end of this homework.

A solid geometric understanding of PCA will help you understand why PCA is able to do these three things. In this homework, we'll build that geometric intuition, and we will also look at PCA on two datasets: one where PCA works poorly, and another where it works pretty well.

Due Date

This assignment is due Thursday, October 24th at 11:59pm PST.

Collaboration Policy

Data science is a collaborative activity. While you may talk with others about the homework, we ask that you write your solutions individually. If you do discuss the assignments with others, please include their names in the cell below.

Collaborators: ...

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.express as px

# Note: If you're having problems with the 3d scatter plots, uncomment the two lines below,
# and you should see a version number that is at least 4.1.1.
# import plotly
# plotly.__version__

Question 1: PCA on 3D Data

In question 1, our goal is to see visually how PCA is simply the process of rotating the coordinate axes of our data. The code below reads in a 3D dataset. We have named the variable surfboard because the data resembles a surfboard when plotted in 3D space.
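Before loading the data, the claim that PCA only rotates the coordinate axes can be checked numerically. The sketch below is not part of the original assignment: it uses a made-up random 3D array (the mixing matrix is an arbitrary assumption) and NumPy's SVD to show that the principal directions form an orthogonal matrix, that total variance is unchanged by the change of axes, and that the columns become uncorrelated in the new axes.

```python
import numpy as np

# Hypothetical stand-in data: 200 points in 3D with correlated columns.
# The mixing matrix is arbitrary and only used for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 1.0, 0.5],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])

# Center each column, then take the SVD of the centered data.
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# The rows of Vt are the principal directions. Vt is orthogonal,
# so multiplying by Vt.T only rotates (or reflects) the axes.
is_orthogonal = np.allclose(Vt @ Vt.T, np.eye(3))

# Coordinates of each point in the rotated axes (the principal components).
pcs = X_centered @ Vt.T

# A rotation preserves total variance; it only redistributes it across axes.
total_var_before = X_centered.var(axis=0).sum()
total_var_after = pcs.var(axis=0).sum()

# In the rotated axes the columns are uncorrelated (diagonal covariance).
cov_after = np.cov(pcs, rowvar=False)

print(is_orthogonal)
print(np.isclose(total_var_before, total_var_after))
print(np.round(cov_after, 3))
```

The diagonal of `cov_after` lists the variance along each principal direction in decreasing order, which is the quantity the rank discussion above refers to.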
In [2]:
surfboard = pd.read_csv("data3d.csv")
surfboard.head(5)

The cell below will allow you to view the data as a 3D scatterplot. Rotate the data around and zoom in and out using your trackpad or the controls at the top right of the figure.

You should see that the data is an ellipsoid that looks roughly like a surfboard or a hashbrown patty (https://www.google.com/search?q=hashbrown+patty&source=lnms&tbm=isch). That is, it is pretty long in one direction, pretty wide in another direction, and relatively thin along its third dimension. We can think of these as the "length", "width", and "thickness" of the surfboard data. Observe that the surfboard is not aligned with the x/y/z axes.

If you get an error that your browser does not support WebGL, you may need to restart your kernel and/or browser.
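To make the "length", "width", and "thickness" idea concrete without data3d.csv, here is a minimal sketch on a synthetic ellipsoid. Everything in it is an assumption for illustration: the scales 5, 2, and 0.5, the tilt angles, and the variable names are made up and do not come from the actual surfboard dataset. It tilts an axis-aligned cloud, then uses the SVD of the centered data to recover the spread along each principal direction.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Hypothetical "surfboard": long, wide, and thin along its own axes.
# The standard deviations 5, 2, 0.5 are made up for illustration.
true_scales = np.array([5.0, 2.0, 0.5])
board = rng.normal(size=(n, 3)) * true_scales

# Tilt the board so it is not aligned with x/y/z:
# rotate 30 degrees about the z-axis, then 45 degrees about the x-axis.
a, b = np.deg2rad(30), np.deg2rad(45)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
Rx = np.array([[1.0, 0.0,        0.0],
               [0.0, np.cos(b), -np.sin(b)],
               [0.0, np.sin(b),  np.cos(b)]])
tilted = board @ (Rx @ Rz).T

# PCA via SVD of the centered data.
centered = tilted - tilted.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)

# Standard deviation along each principal direction
# (approximately the length, width, and thickness scales).
recovered_scales = s / np.sqrt(n - 1)

# Fraction of total variance captured by each component.
variance_fraction = s**2 / np.sum(s**2)

print(np.round(tilted.std(axis=0, ddof=1), 2))  # spread along x/y/z
print(np.round(recovered_scales, 2))            # spread along principal axes
print(np.round(variance_fraction, 3))
```

In this synthetic case the first two components carry nearly all of the variance, so the cloud is close to rank 2, which is why a 2D scatterplot of the first two principal components loses little information.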
About the document
Uploaded On
May 01, 2021
Number of pages
30