when might overfitting occurs
- when the # of factors are close to or larger than the # of data points causing the model
to potentially fit too closely to random effects
Why are simple models better than complex ones
...
when might overfitting occurs
- when the # of factors are close to or larger than the # of data points causing the model
to potentially fit too closely to random effects
Why are simple models better than complex ones
- less data is required; less chance of insignificant factors and easier to interpret
what is forward selection
- we select the best new factor and see if it's good enough (R^2, AIC, or p-value) add it
to our model and fit the model with the current set of factors. Then at the end we
remove factors that are lower than a certain threshold
what is backward elimination
- we start with all factors and find the worst on a supplied threshold (p = 0.15). If it is
worse we remove it and start the process over. We do that until we have the number of
factors that we want and then we move the factors lower than a second threshold (p =
.05) and fit the model with all set of factors
what is stepwise regression
- it is a combination of forward selection and backward elimination. We can either start
with all factors or no factors and at each step we remove or add a factor. As we go
through the procedure after adding each new factor and at the end we eliminate right
away factors that no longer appear.
what type of algorithms are stepwise selection?
- Greedy algorithms - at each step they take one thing that looks best
what is LASSO
- a variable selection method where the coefficients are determined by both minimizing
the squared error and the sum of their absolute value not being over a certain threshold
How do you choose t in LASSO
- use the lasso approach with different values of t and see which gives the best trade off
why do we have to scale the data for LASSO
-if we don't the measure of the data will artificially affect how big the coefficients need to
be
[Show More]