Statistical Learning and Modeling: Supervised Learning
Fei Wu
College of Computer Science Zhejiang University
http://person.zju.edu.cn/wufei/
Outline
Linear model for classification
AdaBoost
Linear Model for Classification
Learning the parameters of Linear Discriminant Functions
• Three approaches:
– Least-squares approach: making the model predictions as close as possible to a set of target values
– Fisher's linear discriminant: maximizing class separation in the projected output space
– The perceptron algorithm of Rosenblatt: a generalized linear model
Linear Basis Function Models
Parameter optimization via Maximum likelihood
• Assume the target variable is given by a deterministic function with additive Gaussian noise of precision $\beta$:
$t = y(x, w) + \epsilon$, with $p(\epsilon) = \mathcal{N}(\epsilon \mid 0, \beta^{-1})$
• Thus:
$p(t \mid x, w, \beta) = \mathcal{N}(t \mid y(x, w), \beta^{-1})$
• For data set X = {x1, . . . , xN} and target vector t = (t1, . . . , tN)^T, the likelihood function:
$p(\mathbf{t} \mid X, w, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid w^T \phi(x_n), \beta^{-1})$
so the log-likelihood is $\ln p(\mathbf{t} \mid w, \beta) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \beta E_D(w)$, where
$E_D(w) = \frac{1}{2} \sum_{n=1}^{N} \big(t_n - w^T \phi(x_n)\big)^2$
is the SSE: sum-of-squares error function.
Parameter optimization via Maximum likelihood
• Solving w by Maximum likelihood:
? ? ? = (Φ ? Φ) − 1 Φ ? ?
N × M design matrix
Moore-Penrose pseudo-inverse
Φ † = (Φ ? Φ) − 1 Φ ?
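A minimal NumPy sketch of this closed-form solution (the polynomial basis, synthetic data, and noise level are illustrative assumptions, not taken from the slides):

```python
import numpy as np

# Illustrative setup: 1-D inputs, polynomial basis (the slides do not fix a basis)
rng = np.random.default_rng(0)
N, M = 50, 4                                          # N data points, M basis functions
x = rng.uniform(-1.0, 1.0, size=N)
t = np.sin(np.pi * x) + rng.normal(0.0, 0.1, size=N)  # noisy targets

# N x M design matrix: Phi[n, j] = phi_j(x_n), here phi_j(x) = x**j
Phi = np.vander(x, M, increasing=True)

# w_ml = (Phi^T Phi)^{-1} Phi^T t, computed via the Moore-Penrose pseudo-inverse
# (np.linalg.pinv is numerically safer than inverting Phi^T Phi directly)
w_ml = np.linalg.pinv(Phi) @ t
print(w_ml)
```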
Parameter optimization via Maximum likelihood
About bias parameter w0:
• Making the bias explicit and setting the derivative of $E_D$ with respect to $w_0$ to zero gives
$w_0 = \bar{t} - \sum_{j=1}^{M-1} w_j \bar{\phi}_j$, where $\bar{t} = \frac{1}{N}\sum_{n=1}^{N} t_n$ and $\bar{\phi}_j = \frac{1}{N}\sum_{n=1}^{N} \phi_j(x_n)$.
Thus the bias w0 compensates for the difference between the average (over the training set) of the target values and the weighted sum of the averages of the basis function values.
• Solving for the noise precision parameter β by ML (see the sketch after this list):
$\frac{1}{\beta_{ML}} = \frac{1}{N} \sum_{n=1}^{N} \big(t_n - w_{ML}^T \phi(x_n)\big)^2$
i.e. the inverse precision is the residual variance of the targets around the fitted regression function.
• Problem:
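Continuing the NumPy sketch above, $\beta_{ML}$ is simply the inverse of the mean squared residual:

```python
# 1/beta_ml = mean squared residual of the targets around the ML fit
residuals = t - Phi @ w_ml
beta_ml = 1.0 / np.mean(residuals ** 2)
print(beta_ml)  # estimated noise precision for the synthetic data above
```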
Parameter optimization via Least Square
• Each class $C_k$, $k = 1, \ldots, K$, is described by its own linear model:
$y_k(x) = w_k^T x + w_{k0}$
• Group together using vector notation:
$y(x) = \widetilde{W}^T \widetilde{x}$, where the $k$-th column of $\widetilde{W}$ is $\widetilde{w}_k = (w_{k0}, w_k^T)^T$ and $\widetilde{x} = (1, x^T)^T$.
• Learning with a training data set $\{x_n, \mathbf{t}_n\}$, $n = 1, \ldots, N$, by minimizing a sum-of-squares error function:
$E_D(\widetilde{W}) = \frac{1}{2} \operatorname{Tr}\{(\widetilde{X}\widetilde{W} - T)^T (\widetilde{X}\widetilde{W} - T)\}$, with solution $\widetilde{W} = (\widetilde{X}^T \widetilde{X})^{-1} \widetilde{X}^T T = \widetilde{X}^{\dagger} T$.
• Discriminant function:
$y(x) = \widetilde{W}^T \widetilde{x} = T^T (\widetilde{X}^{\dagger})^T \widetilde{x}$; assign x to the class with the largest output $y_k(x)$.
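A NumPy sketch of least-squares classification with 1-of-K target coding (the two-class synthetic data are an illustrative assumption):

```python
import numpy as np

# Two synthetic Gaussian classes in 2-D (illustrative, not from the slides)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.5, size=(30, 2)),
               rng.normal([2, 2], 0.5, size=(30, 2))])

# 1-of-K target matrix T: row n is the one-hot code of x_n's class
T = np.zeros((60, 2))
T[:30, 0] = 1.0
T[30:, 1] = 1.0

# Augmented design matrix: each row is x_tilde = (1, x^T)
X_tilde = np.hstack([np.ones((60, 1)), X])

# W_tilde = pseudo-inverse(X_tilde) @ T minimizes the sum-of-squares error
W_tilde = np.linalg.pinv(X_tilde) @ T

# Discriminant: pick the class with the largest output y_k(x)
pred = np.argmax(X_tilde @ W_tilde, axis=1)
print((pred == np.r_[np.zeros(30), np.ones(30)]).mean())  # training accuracy
```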
Maximum likelihood and least squares for linear regression and classification
Maximum likelihood estimation method (MLE)
The likelihood function indicates how likely the observed sample is as a function of the possible parameter values; maximizing it therefore selects the parameter values that are most likely to have produced the observed data. From a statistical point of view, MLE is usually recommended for large samples because it is versatile, applicable to most models and different types of data, and produces the most precise estimates.
Least squares estimation method (LSE)
Least squares estimates are obtained by fitting a regression line that minimizes the sum of squared deviations from the data points (the least-squares error). In reliability analysis, the line and the data are plotted on a probability plot.
In a linear model, if the errors follow a normal distribution, the least-squares estimators are also the maximum likelihood estimators.
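This equivalence is easy to check numerically; the sketch below (data, noise level, and model are assumptions) fits the same Gaussian-noise linear model by direct least squares and by minimizing the negative log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic linear data with Gaussian noise (illustrative assumption)
rng = np.random.default_rng(2)
X = np.hstack([np.ones((40, 1)), rng.normal(size=(40, 2))])  # bias column + 2 features
w_true = np.array([1.0, -2.0, 0.5])
t = X @ w_true + rng.normal(0.0, 0.3, size=40)

# Least squares estimate
w_lse, *_ = np.linalg.lstsq(X, t, rcond=None)

# ML estimate: for Gaussian noise the negative log-likelihood is, up to
# constants, proportional to the sum of squared residuals
nll = lambda w: np.sum((t - X @ w) ** 2)
w_mle = minimize(nll, x0=np.zeros(3)).x

print(np.allclose(w_lse, w_mle, atol=1e-4))  # True: the two estimators coincide
```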
Fisher's linear discriminant
• From the view of dimensionality reduction: project x down to one dimension, $y = w^T x$, and classify by thresholding y.
• The simplest measure of the separation of the classes is the separation of the projected class means:
$m_2 - m_1 = w^T(\mathbf{m}_2 - \mathbf{m}_1)$, where $\mathbf{m}_k = \frac{1}{N_k} \sum_{n \in C_k} x_n$ and $m_k = w^T \mathbf{m}_k$.
• Problem: we can increase the magnitude of w to make $m_2 - m_1$ arbitrarily large! The Fisher criterion below removes this scale dependence.
Fisher's linear discriminant
• The Fisher criterion: maximize the ratio of the separation between the projected class means to the total within-class variance of the projected data:
$J(w) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2} = \frac{w^T S_B w}{w^T S_W w}$ (a generalized Rayleigh quotient), where $s_k^2 = \sum_{n \in C_k} (y_n - m_k)^2$.
• Between-class covariance matrix: $S_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^T$
• Within-class covariance matrix: $S_W = \sum_{n \in C_1} (x_n - \mathbf{m}_1)(x_n - \mathbf{m}_1)^T + \sum_{n \in C_2} (x_n - \mathbf{m}_2)(x_n - \mathbf{m}_2)^T$
• Maximizing $J(w)$ yields the Fisher direction $w \propto S_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$.
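A short NumPy sketch of the Fisher direction (the two Gaussian classes are an illustrative assumption):

```python
import numpy as np

# Two elongated Gaussian classes in 2-D (illustrative, not from the slides)
rng = np.random.default_rng(3)
X1 = rng.normal([0, 0], [1.0, 0.4], size=(100, 2))
X2 = rng.normal([2, 1], [1.0, 0.4], size=(100, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)            # class means

# Within-class covariance S_W: summed scatter of each class about its mean
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Fisher direction: w proportional to S_W^{-1} (m2 - m1)
w = np.linalg.solve(S_W, m2 - m1)
w /= np.linalg.norm(w)

# Projected means separate well relative to the within-class spread
y1, y2 = X1 @ w, X2 @ w
print(y1.mean(), y2.mean(), y1.std(), y2.std())
```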