In this lecture we studied maximum likelihood inference for linear classifiers. We saw that ordinary least squares regression can be phrased as maximum likelihood inference under the assumption of additive Gaussian noise. We then derived the closed-form solution to the normal equations for small-scale problems and discussed alternative optimization methods, namely gradient descent, for larger-scale settings. We noted several potential issues in directly applying linear regression to classification problems and explored logistic regression as an alternative. Maximum likelihood inference for logistic regression led us to first- and second-order iterative optimization methods for parameter inference. See Chapter 3 of Bishop and Chapter 3 of Hastie for reference.
A few notes on linear and logistic regression:
- Ordinary least squares regression is, in principle, easily solved by inverting the normal equations:
$$
\hat{w} = (X^T X)^{-1} X^T y .
$$
In practice, however, it is often computationally expensive to invert \( X^T X \) for models with many features, even with specialized numerical methods for doing so.
- Gradient descent offers an alternative to solving the normal equations directly, replacing potentially expensive matrix inversion with an iterative method where we update parameters by moving in the direction of steepest increase of the likelihood landscape:
$$
\hat{w} \leftarrow \hat{w} + \eta X^T (y - X\hat{w}),
$$
where \( \eta \) is a tunable step size. Choosing \( \eta \) too small leads to slow convergence, whereas too large a step size may result in undesirable oscillations about the optimum. Intuitively, gradient descent updates each component of \( \hat{w} \) by a sum of the corresponding feature values over all examples, where each example is weighted by the error between its actual and predicted labels (see the NumPy sketch after this list).
- Two high-level issues arise when directly applying ordinary least squares regression to classification problems. First, our model predicts continuous outputs while our training labels are discrete. Second, squared loss is sensitive to outliers and penalizes “obviously correct” predictions for which the predicted value \( \hat{y} \) is much larger than the observed value \( y \) but of the correct sign.
- Logistic regression addresses these issues by modeling the class probabilities \( p(y \mid x) \) directly, using a logistic function to transform predictions to lie in the unit interval:
$$
p(y=1 \mid x, w) = \frac{1}{1 + e^{-w \cdot x}} .
$$
While maximum likelihood inference for logistic regression does not admit a closed-form solution, gradient descent yields the following update:
$$
\hat{w} \leftarrow \hat{w} + \eta X^T (y - p),
$$
where \( p \) denotes the vector of predicted probabilities, with \( p_i = p(y_i = 1 \mid x_i, \hat{w}) \). In smaller-scale settings one can improve on these updates by using second-order methods such as Newton-Raphson that leverage the local curvature of the likelihood landscape to rescale the step at each iteration (a sketch of both the first- and second-order updates appears after this list).
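To make the two approaches to least squares concrete, here is a minimal NumPy sketch (not the lecture code) that solves the normal equations directly and then runs the gradient update above; the synthetic data, step size \( \eta \), and iteration count are illustrative choices only.

```python
import numpy as np

# Synthetic regression data (illustrative; not the dataset used in class).
rng = np.random.RandomState(0)
n, d = 100, 3
X = rng.randn(n, d)
w_true = np.array([2.0, -1.0, 0.5])
y = X.dot(w_true) + 0.1 * rng.randn(n)

# Closed-form solution of the normal equations: w = (X^T X)^{-1} X^T y.
# np.linalg.solve avoids forming the explicit inverse.
w_closed = np.linalg.solve(X.T.dot(X), X.T.dot(y))

# Iterative alternative: w <- w + eta * X^T (y - X w).
eta = 0.005           # step size; too large and the iterates oscillate
w_gd = np.zeros(d)
for _ in range(1000):
    w_gd = w_gd + eta * X.T.dot(y - X.dot(w_gd))

print(w_closed)
print(w_gd)           # both estimates should be close to w_true
```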
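Similarly, here is a minimal sketch of the logistic regression updates, again on synthetic data with illustrative settings for the step size and iteration counts; the Newton-Raphson step uses the standard Hessian \( X^T R X \) with \( R = \mathrm{diag}(p_i(1 - p_i)) \).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic binary labels drawn from a logistic model (illustrative only).
rng = np.random.RandomState(1)
n, d = 200, 3
X = rng.randn(n, d)
w_true = np.array([1.5, -2.0, 0.5])
y = (rng.rand(n) < sigmoid(X.dot(w_true))).astype(float)

# First-order update from above: w <- w + eta * X^T (y - p).
eta = 0.01
w = np.zeros(d)
for _ in range(2000):
    p = sigmoid(X.dot(w))
    w = w + eta * X.T.dot(y - p)

# Second-order (Newton-Raphson) update: w <- w + (X^T R X)^{-1} X^T (y - p),
# where R = diag(p * (1 - p)) captures the local curvature of the log-likelihood.
w_nr = np.zeros(d)
for _ in range(10):
    p = sigmoid(X.dot(w_nr))
    H = X.T.dot(X * (p * (1 - p))[:, None])
    w_nr = w_nr + np.linalg.solve(H, X.T.dot(y - p))

print(w)
print(w_nr)           # both should roughly recover w_true
```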
Naive Bayes, linear regression, and logistic regression are all generalized linear models, meaning that predictions \( \hat{y} \) are simply transformations of a weighted sum of the features \( x \) for some weights \( w \), i.e. \( \hat{y}(x; w) = f(w \cdot x) \). Linear regression models the response directly, whereas Naive Bayes and logistic regression model the probability of categorical outcomes via the logistic link function. Model fitting amounts to inferring the “best” values for the weights \( w \) from training data; each of these methods quantifies the notion of “best” differently and thus results in different estimates for \( w \). In particular, Naive Bayes estimates each component of the weight vector independently, while linear and logistic regression account for correlations amongst features.
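A small sketch of this unified prediction rule, with the identity and logistic links standing in for the two cases above (the helper name `predict` is ours, not from the lecture code):

```python
import numpy as np

def predict(X, w, link="identity"):
    # Generalized linear prediction: y_hat = f(w . x) for a chosen link f.
    z = X.dot(w)
    if link == "identity":      # linear regression
        return z
    if link == "logistic":      # logistic regression / Naive Bayes posteriors
        return 1.0 / (1.0 + np.exp(-z))
    raise ValueError("unknown link: " + link)
```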
In the second half of class we discussed APIs for accessing data from web services. As an example, we used Python’s urllib2 and json modules to interact with the New York Times Developer API. See the GitHub repository for more details.
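As a rough sketch of the pattern we used (the endpoint and API key below are placeholders, not the actual New York Times URLs; see the repository for the real calls):

```python
import json
import urllib2

# Placeholder endpoint and key -- substitute the values from the course's
# GitHub repository or the NYT Developer documentation.
API_KEY = "YOUR_API_KEY"
url = "http://api.example.com/articles.json?q=statistics&api-key=" + API_KEY

response = urllib2.urlopen(url)       # issue the HTTP GET request
data = json.loads(response.read())    # parse the JSON payload into Python objects

print(data.keys())                    # inspect the top-level fields of the response
```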