Logistic Regression

Logistic regression is a binary classifier that outputs probabilities, not just hard decisions. It shares the same prediction rule as perceptron ($\hat{y} = \operatorname{sign}(w^\top x)$) but optimizes a convex log-likelihood that works on non-separable data and gives calibrated confidences.

Intuition

Labels are in $\{0, 1\}$ (not $\{-1, +1\}$ like perceptron). Model the label as Bernoulli:

$$y \mid x \sim \operatorname{Bernoulli}(p(x)), \qquad p(x) = \Pr(y = 1 \mid x)$$

Parameterization via the Logit Transform

Naive attempt: $\Pr(y = 1 \mid x) = w^\top x$ fails because LHS $\in [0, 1]$ while RHS $\in (-\infty, \infty)$.

The fix: equate the linear score with the log-odds (“logit”):

$$w^\top x = \log \frac{\Pr(y = 1 \mid x)}{1 - \Pr(y = 1 \mid x)} = \log \frac{p}{1 - p}$$

This works because as $p$ ranges over $(0, 1)$, the logit ranges over $(-\infty, \infty)$. Solving for $p$:

$$p = \sigma(w^\top x) = \frac{1}{1 + e^{-w^\top x}}$$

The choice of sigmoid is somewhat arbitrary: any monotone function mapping $(-\infty, \infty)$ onto $(0, 1)$ works. Taking the hard threshold $p = \mathbb{1}[w^\top x \geq 0]$ recovers the perceptron.

Intuition

Sigmoid squashes an unbounded linear score into a probability. Far from the boundary ($|w^\top x|$ large) the output saturates near 0 or 1 (high confidence), and near the boundary it is close to 0.5 (uncertain). The linear score is the log-odds: doubling $w^\top x$ squares the odds $p/(1-p)$, not the probability.
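
A quick numerical check of these claims (a minimal sketch using NumPy; the variable names are ours, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
p = sigmoid(z)
print(p)  # [0.0025 0.1192 0.5 0.8808 0.9975] -- saturates away from 0

# The score is the log-odds: logit(sigmoid(z)) recovers z.
print(np.log(p / (1 - p)))

# Doubling the score squares the odds, not the probability.
odds = np.exp(z)
print(np.allclose(np.exp(2 * z), odds ** 2))  # True
```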

Prediction

Predict $\hat{y} = 1$ if $\sigma(w^\top x) \geq 0.5$, else $\hat{y} = 0$. Since $\sigma(z) \geq 0.5 \iff z \geq 0$, this is equivalent to $w^\top x \geq 0$, same decision boundary as perceptron.

What’s different from perceptron:

  1. Objective is convex and defined on non-separable data
  2. Magnitude of $w^\top x$ is a confidence, hence “regression”

MLE Derivation

Under the Bernoulli model, writing $p_i = \sigma(w^\top x_i)$, the likelihood of data $\{(x_i, y_i)\}_{i=1}^n$ with $y_i \in \{0, 1\}$ is

$$L(w) = \prod_{i=1}^n p_i^{y_i} (1 - p_i)^{1 - y_i}$$

Take log, negate, and this is the cross-entropy loss:

$$\ell(w) = -\sum_{i=1}^n \big[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \big]$$

Equivalently, with labels recoded as $\tilde{y}_i = +1$ if $y_i = 1$ and $\tilde{y}_i = -1$ if $y_i = 0$:

$$\ell(w) = \sum_{i=1}^n \log\!\left(1 + e^{-\tilde{y}_i\, w^\top x_i}\right)$$

This is the logistic loss, a smooth, convex surrogate for 0-1 loss.
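
A small sketch checking that the two forms agree numerically (our own variable names; `y01` is the $\{0,1\}$ encoding, `ypm` the $\{\pm 1\}$ one):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=3)
X = rng.normal(size=(5, 3))
y01 = rng.integers(0, 2, size=5)   # labels in {0, 1}
ypm = 2 * y01 - 1                  # same labels in {-1, +1}

z = X @ w
p = sigmoid(z)

# Cross-entropy form (labels in {0, 1}).
ce = -np.sum(y01 * np.log(p) + (1 - y01) * np.log(1 - p))

# Logistic-loss form (labels in {-1, +1}).
ll = np.sum(np.log(1 + np.exp(-ypm * z)))

print(np.isclose(ce, ll))  # True
```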

Optimization

No closed form (unlike linear regression). The loss is convex so we find the point where $\nabla \ell(w) = 0$.

Gradient: $\nabla \ell(w) = \sum_{i=1}^n \big(\sigma(w^\top x_i) - y_i\big)\, x_i$.

The gradient has a clean reading: each example contributes $(\sigma(w^\top x_i) - y_i)$ (prediction error) times $x_i$ (feature vector). A confident-and-right prediction contributes nothing, a confident-and-wrong prediction contributes a big $x_i$-aligned push. Same structure as perceptron’s update, but weighted smoothly by how wrong you are instead of all-or-nothing.
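
A sketch verifying the gradient formula against finite differences (assumes the cross-entropy loss above; names are ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y):
    p = sigmoid(X @ w)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad(w, X, y):
    # (prediction error) times (feature vector), summed over examples
    return X.T @ (sigmoid(X @ w) - y)

rng = np.random.default_rng(1)
w = rng.normal(size=3)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20)

# Central finite-difference check of each coordinate.
eps = 1e-6
num = np.array([(loss(w + eps * e, X, y) - loss(w - eps * e, X, y)) / (2 * eps)
                for e in np.eye(3)])
print(np.allclose(num, grad(w, X, y)))  # True
```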

Methods (a training-loop sketch follows the list):

  • Gradient Descent: $w \leftarrow w - \eta\, \nabla \ell(w)$
  • Stochastic Gradient Descent: subsample a single example (or mini-batch) to estimate the gradient: $w \leftarrow w - \eta\, \big(\sigma(w^\top x_i) - y_i\big)\, x_i$
  • Newton’s method: $w \leftarrow w - \big(\nabla^2 \ell(w)\big)^{-1} \nabla \ell(w)$; uses Hessian, fewer steps but costly per step
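
A minimal gradient-descent trainer on synthetic data (a sketch; the step size, iteration count, and data-generating weights are illustrative choices, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, iters=500):
    """Full-batch gradient descent on the (mean) cross-entropy loss."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

# Synthetic data: label depends on the sign of a noisy linear score.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -1.0]) + 0.3 * rng.normal(size=200) > 0).astype(float)

w = fit_logistic(X, y)
acc = np.mean((sigmoid(X @ w) >= 0.5) == y)
print(f"train accuracy: {acc:.2f}")  # typically well above 0.9
```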

Multiclass Logistic Regression (Softmax)

With $K$ classes, give each class its own weight vector $w_k$ and set

$$\Pr(y = k \mid x) = \frac{e^{w_k^\top x}}{\sum_{j=1}^{K} e^{w_j^\top x}}$$

This is the softmax function. Training with cross-entropy loss generalizes directly.
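
A sketch of the multiclass predictor (a numerically stable softmax; `W` holding one weight vector per class is our naming):

```python
import numpy as np

def softmax(scores):
    # Subtract the max score for numerical stability; ratios are unchanged.
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])     # K=3 classes, d=2 features
x = np.array([0.5, 2.0])

probs = softmax(W @ x)
print(probs, probs.sum())        # a distribution over the 3 classes, sums to 1
print(probs.argmax())            # predicted class: the largest score wins
```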

From CS480 lec4.