ISYE Midterm 2 Notes:Week 8 Variable

It is important to limit the number of factors in the model for 2 reasons:
  o Overfitting: when the number of factors is close to or larger than the number of data points, the model might fit too closely... to random effects.
  o Simplicity: on aggregate, simple models are better than complex ones. Using fewer factors means that less data is required and there is a smaller chance of including insignificant factors. Interpretability is also crucial. Some factors are even illegal to use, such as race and gender, as are factors that are predictive of these attributes.
- Forward Selection: a variable selection method where we start with a model containing no factors. At each step, we find the best new factor to add to the model. When no remaining factor meets the quality threshold, or we reach a maximum number of factors, we stop iterating and arrive at the final model.
- Backward Elimination: the opposite of forward selection. We start with the full model and, at each step, remove an insignificant variable until we arrive at a satisfactory model.
- Stepwise Regression: a combination of forward selection and backward elimination. There are two types: backward, which starts with the full model, and forward, which starts with the null model. From either start it takes a hybrid approach, adding and removing variables iteratively until it returns a satisfactory model.
- All of these stepwise approaches are greedy algorithms: at each step they take the one action that looks best immediately, considering only the result of that step and not the global state or future steps. Future options are not considered.
- Lasso Approach: a more modern approach to variable selection that uses global optimization. Add a constraint to the standard regression problem that sets a budget on the sum of the absolute values of the model's coefficients.
  This constraint in effect limits the size of the coefficients, so the model allots more of the coefficient budget to the most important variables. Unimportant variables are allotted zero, which leaves them out of the selected model. Because the budget is global, it is important to use scaled data: the budget needs to treat all variables on the same scale, otherwise the magnitude of a variable would affect its share of the budget.
  o $\min \sum_{i=1}^{n} \left(y_i - (a_0 + a_1 x_{1i} + a_2 x_{2i} + \dots + a_j x_{ji})\right)^2$
  o $\text{s.t. } \sum_{k=1}^{j} |a_k| \le T$
- The lasso approach requires tuning the parameter T, which determines how many variables are kept and the quality of the resulting model.
- Elastic Net: takes the same general approach as lasso regression. However, instead of constraining just the absolute values of the coefficients, we constrain a combination of their absolute values and their squares. It is a hybrid of ridge and lasso regression, which brings the advantages of both as well as the disadvantages of both.
  o $\min \sum_{i=1}^{n} \left(y_i - (a_0 + a_1 x_{1i} + a_2 x_{2i} + \dots + a_j x_{ji})\right)^2$
  o $\text{s.t. } L \sum_{k=1}^{j} |a_k| + (1 - L) \sum_{k=1}^{j} a_k^2 \le T$
- Ridge Regression: a special case of the Elastic Net that results from removing the absolute-value term from the Elastic Net constraint, that is, setting L (lambda) to 0. Ridge regression is not a variable selection approach per se, but it can be used in model selection. In ridge, the coefficients shrink toward 0 to reduce the variance of the estimate instead of dropping all the way to zero as with lasso. However, this introduces some bias, and coefficients that become very small still remain in the model.
  o $\min \sum_{i=1}^{n} \left(y_i - (a_0 + a_1 x_{1i} + a_2 x_{2i} + \dots + a_j x_{ji})\right)^2$
  o $\text{s.t. } \sum_{k=1}^{j} a_k^2 \le T$
- Greedy methods like forward selection, backward elimination, and stepwise regression are easy to implement and good for initial data analysis, but they often don't perform well on other data. They tend to yield a set of variables that fits random effects more than is ideal, which leads to misleadingly high R-squared values. When these models are then tested on outside data, they perform poorly.
- Lasso and Elastic Net are usually slower and harder to compute than the stepwise methods. However, they tend to give far better predictive models.
- Elastic Net Advantages:
  o Variable selection benefits of Lasso
  o Predictive benefits of Ridge
- Elastic Net Disadvantages:
  o Arbitrarily rules out some correlated variables, like Lasso
  o Underestimates coefficients of very predictive variables, like Ridge Regression
- There is no true rule of thumb for choosing between them. If you can try one, you can probably try them all and then select the one that performs best.

Week 9 Design of Experiments:

- Design of experiments deals with data collection constraints: how to design an experiment so that data is collected quickly and in minimal quantity, but is still large and deep enough to model on.
- It also deals with practical constraints of data collection, such as surveying. If a survey is optimized to be demographically representative, how can we be certain that combinations of attributes (sub-combinations) are also properly represented in the data?
- All in all, there are two important concepts: comparison and control. To gain insight, some factors need to be compared, but on comparable terms.
- Blocking: a blocking factor accounts for variability that comes from a category of another feature. Think of how the price of cars varies by color, controlled for other factors. Blocking would then subset the data by type of car, e.g. sports car, family van, sedan, etc.
  This separates out that source of variation, with the goal that more of the overall variation is explained by the model rather than attributed to chance.
- A/B Testing: a design-of-experiments approach to choosing between 2 alternatives. Put out both alternatives on a smaller scale, test the performance of each, and determine whether either is statistically better or worse than the other. The hypothesis test can be run in real time, halting when the difference between the alternatives becomes statistically significant enough to determine which is better. The following must be true to use A/B testing:
  o We need to be able to collect a lot of data quickly
  o We need data that is representative of the population
  o The amount of data must be small compared to the whole population
- Factorial Design: answers the question of which factors within the alternatives are important. A full factorial design takes the alternatives, breaks them into combinations of factors, and tests them all. Using ANOVA, we can determine statistically which factors are important. However, this is only possible when the number of combinations is reasonably small. Otherwise we can test a subset of combinations, which is known as a fractional (or partial) factorial design. A balanced design tests each choice of feature the same number of times and each pair of choices the same number of times.
- Independent Factor approach: tests a subset of combinations and uses regression to estimate the effects of features. This is only possible if we believe that the factors are either independent or can be modeled with interaction terms.
- All these approaches can be effective when they are used before modeling and even before collecting data.
- Exploration vs. Exploitation: when faced with a choice among numerous alternatives, there is a risk that we have already arrived at the best one but continue to test, which wastes time, samples, and resources. At what point does it make mathematical sense to exploit a known solution, even if we are not certain it is the best, because continuing to explore would incur far more cost?
- Every time we have the opportunity to show an ad (for example), we have to balance the benefit of getting more information, and its cost, against the immediate value of our current best ad.
- Multi-armed Bandit Approach: the general approach to the exploration vs. exploitation problem. Suppose we test K alternatives. We start with no knowledge of any of them, so we generally assume each is equally likely to be the best. We then choose an alternative, test it, and update the probability of each alternative being the best. We repeat this test-and-update cycle until it is statistically likely that we know the best alternative and can abandon testing altogether. In this way we exploit what past tests have told us while exploring further only in proportion to what is still uncertain. Along the way we earn more reward by picking alternatives that are more likely to be better. Several parameters of the approach can be altered, such as:
  o Number of tests between recalculating probabilities
  o How to update probabilities
  o How to pick an alternative to test based on probabilities and/or expected values
- Overall, the multi-armed bandit approach has no simple rule for setting these parameters, but it is generally better than traditional approaches such as running a fixed large number of tests.
  The approach is worthwhile because it learns on the fly and creates more value with each iteration.
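
The test-and-update loop of the multi-armed bandit approach can be sketched in a few lines. The notes do not say how to update probabilities or how to pick the next alternative, so this sketch assumes one common choice, Thompson sampling with a Beta belief per alternative, and recalculates after every single test. The function name `thompson_bandit` and the success rates in the usage line are hypothetical values chosen only for illustration.

```python
import random


def thompson_bandit(true_rates, rounds=3000, seed=0):
    """Simulate K alternatives with yes/no rewards using Thompson sampling.

    true_rates: hypothetical success probabilities, unknown to the algorithm.
    Returns (number of times each alternative was tested, total reward).
    """
    rng = random.Random(seed)
    k = len(true_rates)
    # Beta(1, 1) belief for every alternative: no prior knowledge,
    # so all alternatives start out equally likely to be the best.
    wins = [1] * k
    losses = [1] * k
    pulls = [0] * k
    total_reward = 0

    for _ in range(rounds):
        # Draw one plausible success rate per alternative from its belief.
        samples = [rng.betavariate(wins[a], losses[a]) for a in range(k)]
        # Test the alternative whose sampled rate is highest. Uncertain
        # alternatives still get picked sometimes (exploration), while
        # alternatives that look good get picked most often (exploitation).
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_rates[arm] else 0
        # Update the belief for the alternative that was just tested.
        if reward:
            wins[arm] += 1
        else:
            losses[arm] += 1
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


# Hypothetical example: three ads with made-up click rates.
pulls, total_reward = thompson_bandit([0.05, 0.10, 0.30])
print(pulls, total_reward)
```

As the simulation runs, most tests should go to the alternative with the highest true rate, which is the sense in which the approach earns more reward than testing every alternative a fixed number of times.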

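The A/B testing step described in the notes, deciding whether one alternative is statistically better than the other, can be illustrated with a two-proportion z-test. The notes do not name a specific test, so this choice is an assumption, and the function name and the counts in the usage lines are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two success rates.

    Returns (z statistic, p-value). Assumes samples large enough for
    the normal approximation to hold.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 200 of 5000 conversions for A, 260 of 5000 for B.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(round(z, 2), p_value < 0.05)
```

A small p-value suggests the difference between the two alternatives is unlikely to be due to chance alone; the 0.05 cutoff in the usage line is a conventional threshold, not one taken from the notes.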
Uploaded On

Jul 11, 2022

Seller

charles
