Our previous discussion of naive Bayes led us to the problem of overfitting, specifically in dealing with rare words for text classification. We investigated this problem a bit more formally in the context of probabilistic modeling and discussed maximum likelihood, maximum a posteriori, and Bayesian methods for parameter estimation. With a Bernoulli model for word presence in mind, we looked at the toy problem of estimating the bias of a coin from observed flips. We saw that general principles from probabilistic modeling reproduced our intuitive parameter estimates from last week, and, furthermore, justified the hack of smoothing counts to avoid overfitting. See Chapter 2 of Bishop for more details.
Some notes on probabilistic inference:
- Under maximum likelihood inference we define the “best” parameter values as those for which the observed data are most probable:
$$
\hat{\theta}_{ML} = \mathrm{argmax}_{\theta} ~ p(D | \theta).
$$
Employing this framework for the coin flipping example reproduces the intuitive relative frequency estimate from last week: \(\hat{\theta}_{ML} = {n \over N} \). Unfortunately, as we saw for spam classification, maximum likelihood estimates can often lead to overfitting (e.g., when \( n = 0 \) or \( n = N \)). A short numerical sketch of each of the three estimates follows this list.
- In a subtle but important distinction, maximum a posteriori (MAP) inference formulates the problem as one of identifying the most probable parameters given the observed data:
$$
\hat{\theta}_{MAP} = \mathrm{argmax}_{\theta} ~ p(\theta | D) = \mathrm{argmax}_{\theta} ~ p(D | \theta) p(\theta),
$$
where the right-hand side follows from an application of Bayes’ rule. Using a beta prior \( \mathrm{Beta}(\theta; \alpha, \beta) \) for the coin flipping example reproduces the “smoothed” estimate: \(\hat{\theta}_{MAP} = {n + \alpha - 1 \over N + \alpha + \beta - 2} \), where \( \alpha \) and \( \beta \) act as pseudocounts for the number of heads and tails seen before any actual data are observed. This allows us to address the overfitting problem by specifying the shape of the prior distribution via \( \alpha \) and \( \beta \). Setting \( \alpha \) and \( \beta \) to 1 corresponds to a uniform prior distribution and yields the maximum likelihood estimate above, while larger values of \( \alpha \) and \( \beta \) bias our estimates more heavily towards our prior distribution.
- Bayesian inference dispenses with the idea of point estimates for the parameters altogether in favor of keeping full distributions over all unknown quantities:
$$
p(\theta | D) = {p(D | \theta) p(\theta) \over p(D)}.
$$
In contrast to MAP estimation, this requires calculation of the normalizing constant \( p(D) \), often referred to as the marginal likelihood or evidence. Fortunately, the choice of a conjugate prior distribution reduces the potentially difficult task of calculating the evidence to simple algebra. For example, the posterior distribution for the coin flipping example with a beta prior is simply a beta distribution with updated hyperparameters, \( \mathrm{Beta}(\theta; n + \alpha, N - n + \beta) \). Likewise, the predictive distribution \( p(x|D) \), which provides the probability of future outcomes given the observed data, amounts to a simple ratio of beta functions.
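For concreteness, here is a minimal sketch of the maximum likelihood estimate on the coin flipping example; the counts \( n = 7 \) and \( N = 10 \) are made up for illustration.

```python
# Maximum likelihood estimate of the coin bias: theta_ML = n / N.
# The counts below are made up for illustration.
n, N = 7, 10          # observed heads, total flips
theta_ml = n / N
print(theta_ml)       # 0.7

# The overfitting failure mode: with no observed heads, the ML
# estimate assigns zero probability to ever seeing heads.
print(0 / 10)         # 0.0
```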
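A similar sketch of the MAP estimate under a \( \mathrm{Beta}(\theta; \alpha, \beta) \) prior, using the same made-up counts; the pseudocount values are chosen only to show how the prior pulls the estimate around.

```python
# MAP estimate under a Beta(alpha, beta) prior:
#   theta_MAP = (n + alpha - 1) / (N + alpha + beta - 2)
def theta_map(n, N, alpha, beta):
    return (n + alpha - 1) / (N + alpha + beta - 2)

n, N = 7, 10
print(theta_map(n, N, alpha=1, beta=1))   # 0.7   -- uniform prior recovers the ML estimate
print(theta_map(n, N, alpha=5, beta=5))   # 0.611 -- pulled toward the prior mean of 0.5
print(theta_map(0, 10, alpha=2, beta=2))  # 0.083 -- no longer exactly zero when n = 0
```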
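Finally, a sketch of the full Bayesian treatment using the beta–Bernoulli conjugacy; `scipy.stats.beta` is used purely for convenience, and the counts and hyperparameters are again made up.

```python
from scipy import stats

n, N = 7, 10            # made-up counts again
alpha, beta = 2.0, 2.0  # prior hyperparameters

# Conjugacy: the posterior is Beta(n + alpha, N - n + beta).
posterior = stats.beta(n + alpha, N - n + beta)
print(posterior.mean())          # posterior mean of theta
print(posterior.interval(0.95))  # a 95% credible interval for theta

# Predictive probability of heads on the next flip:
#   p(x = heads | D) = (n + alpha) / (N + alpha + beta)
print((n + alpha) / (N + alpha + beta))
```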
Returning to naive Bayes for text classification, any of the above methods may be used to estimate the many (independent) parameter values in our model. Maximum likelihood estimation has no tunable hyperparameters, whereas MAP estimation and Bayesian inference require the specification of a prior distribution that, in practice, is often tuned to minimize generalization error. Regardless of which method we choose, however, naive Bayes still models features as conditionally independent given the class label.
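To tie the pieces together, here is a minimal sketch of pseudocount-smoothed (MAP-style) parameter estimation for a Bernoulli naive Bayes spam model; the tiny word-presence matrix, labels, and pseudocount values are made up for illustration and this is not the course's actual implementation.

```python
import numpy as np

# Made-up word-presence data: rows are documents, columns are vocabulary words.
X = np.array([[1, 1, 0],    # spam
              [1, 0, 1],    # spam
              [0, 1, 1],    # ham
              [0, 0, 1]])   # ham
y = np.array([1, 1, 0, 0])  # 1 = spam, 0 = ham

alpha = beta = 2            # pseudocounts, as in the Beta(2, 2) MAP estimate above

theta = {}
for c in (0, 1):
    Xc = X[y == c]
    n_c = Xc.shape[0]
    # Smoothed per-word presence probabilities for class c.
    theta[c] = (Xc.sum(axis=0) + alpha - 1) / (n_c + alpha + beta - 2)

print(theta[1])  # estimated p(word present | spam); no zeros despite rare words
```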
During the second part of class we looked at a Python script to fetch email data from an IMAP server as well as some shell scripting to parse raw email data, and concluded with an introduction to Python.
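For reference, a minimal sketch of fetching messages over IMAP with Python's standard `imaplib` module; the server name and credentials below are placeholders, and this is not the script shown in class.

```python
import email
import imaplib

# Placeholder server and credentials -- substitute your own.
HOST, USER, PASSWORD = "imap.example.com", "user@example.com", "password"

conn = imaplib.IMAP4_SSL(HOST)
conn.login(USER, PASSWORD)
conn.select("INBOX", readonly=True)

# Fetch each message in the inbox and print its subject line.
_, data = conn.search(None, "ALL")
for num in data[0].split():
    _, msg_data = conn.fetch(num, "(RFC822)")
    msg = email.message_from_bytes(msg_data[0][1])
    print(msg["Subject"])

conn.logout()
```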