COGS 108 - Assignment 4: Data Analysis

Important Reminders

You must submit this file (A4_DataAnalysis.ipynb) to TritonED to finish the homework.

This assignment has hidden tests: tests that are not visible here, but that will be run on your submitted assignment for grading. This means that passing all the tests you can see in the notebook does not guarantee you have the right answer! In particular, many of the visible tests simply check that the right variable names exist. Hidden tests check the actual values. It is up to you to check the values and make sure they seem reasonable.

If things start to go weird, restart the kernel and re-run the code as a first-line check. For example, some cells can only be run once, because they overwrite a variable (for example, your dataframe) and change it in a way that makes a second execution fail. Running some cells out of order can also change the dataframe in ways that cause an error, which can be fixed by re-running.

Notes - Assignment Outline

Parts 1-6 of this assignment are modeled on a minimal example of a project notebook. This mimics, and gets you working with, something like what you will need for your final project.

Parts 7 & 8 break from the project narrative, and are OPTIONAL (UNGRADED). They serve instead as a couple of quick one-offs to get you working with some other methods that might be useful to incorporate into your project.

Setup

Data: the responses collected from a survey of the COGS 108 class. There are 417 observations in the data, covering 10 different 'features'.

Research Question: Do students in different majors have different heights?

Background: Physical height has previously been shown to correlate with career choice and career success. More recently it has been demonstrated that these correlations can actually be explained by height in high school, as opposed to height in adulthood (1).
It is currently unclear whether height correlates with choice of major in university.

Reference:
1) http://economics.sas.upenn.edu/~apostlew/paper/pdf/short.pdf

Hypothesis: We hypothesize that there will be a relation between height and chosen major.

Part 1: Load & Clean the Data

Fixing messy data makes up a large amount of the work of being a Data Scientist.

# Run this cell to ensure you have the correct version of patsy
# You only need to do the installation once
# Once you have run it you can comment these two lines so that the cell doesn't execute every time.
import sys
!conda install --yes --prefix {sys.prefix} patsy=0.5.1

Collecting package metadata: done
Solving environment: done

## Package Plan ##

  environment location: /Users/brianbarry/anaconda3

  added / updated specs:
    - patsy=0.5.1

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    conda-4.6.7                |           py36_0         1.6 MB
    openssl-1.1.1b             |       h1de35cc_0         3.4 MB
    patsy-0.5.1                |           py36_0         376 KB
    ------------------------------------------------------------
                                           Total:         5.5 MB

The following packages will be UPDATED:

  conda      4.6.4-py36_0 --> 4.6.7-py36_0
  openssl    1.1.1a-h1de35cc_0 --> 1.1.1b-h1de35cc_0
  patsy      0.5.0-py37_0 --> 0.5.1-py36_0

Downloading and Extracting Packages
conda-4.6.7          | 1.6 MB    | ##################################### | 100%
patsy-0.5.1          | 376 KB    | ##################################### | 100%
openssl-1.1.1b       | 3.4 MB    | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

# Imports - These are all you need for the assignment: do not import additional packages
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import patsy
import statsmodels.api as sm
import scipy.stats as stats
from scipy.stats import ttest_ind, chisquare, normaltest
# Note: the statsmodels import may print out a 'FutureWarning'.
# That's fine.
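The research question asks whether students in different majors differ in height. The following is only a minimal sketch of how such a comparison might be set up with `ttest_ind` from the imports listed above; it is not the assignment's solution. The data is synthetic, and the column names (`major`, `height`), the two major labels, the helper `compare_heights`, and the `alpha` threshold are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

# Synthetic stand-in data: NOT the class survey. Column names and the
# two major labels are hypothetical, chosen for this illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "major": ["COGSCI"] * 50 + ["COMPSCI"] * 50,
    "height": np.concatenate([
        rng.normal(66.0, 3.0, 50),  # simulated heights (inches), group 1
        rng.normal(68.0, 3.0, 50),  # simulated heights (inches), group 2
    ]),
})


def compare_heights(data, major_a, major_b, alpha=0.01):
    """Run a two-sample t-test on height between two majors.

    Returns the t statistic, the p-value, and whether p < alpha.
    """
    h_a = data.loc[data["major"] == major_a, "height"].to_numpy()
    h_b = data.loc[data["major"] == major_b, "height"].to_numpy()
    t_val, p_val = ttest_ind(h_a, h_b)
    return float(t_val), float(p_val), bool(p_val < alpha)


t_val, p_val, significant = compare_heights(df, "COGSCI", "COMPSCI")
print(f"t = {t_val:.3f}, p = {p_val:.4f}, significant: {significant}")
```

Because a t-test assumes roughly normal data within each group, a check such as `normaltest` on each group's heights would be a sensible step before relying on the result.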

Last updated: 2 years ago

About the document

Uploaded on: Jul 21, 2022
Number of pages: 20
Seller: CourseWorks,Inc
