Logistic Regression
Resources
Derivation
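We want to use a feature vector $x$ to compute $P(y = 1 \mid x)$. We will use a Bernoulli model: parameterize the probability $p$ of label $y \in \{0, 1\}$ by feature vector $x$ and parameter vector $\theta$:
$$P(y \mid x; \theta) = p^{y} (1 - p)^{1 - y}, \qquad p = P(y = 1 \mid x; \theta)$$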
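The parameterization is derived from the logit transform, which equates the log of the odds to the linear score $\theta^\top x$.
This ensures mathematical consistency because both sides range over $(-\infty, \infty)$ (all real numbers):
$$\log \frac{p}{1 - p} = \theta^\top x$$
Here, $\frac{p}{1 - p}$ is the odds of the positive label, where $p = P(y = 1 \mid x; \theta)$.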
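Rearranging this relationship yields the logistic regression equation for $p$:
$$p = \frac{e^{\theta^\top x}}{1 + e^{\theta^\top x}}$$
This equation is most commonly written in the equivalent form, which is defined as the sigmoid function $\sigma$ of the linear score $\theta^\top x$:
$$p = \sigma(\theta^\top x) = \frac{1}{1 + e^{-\theta^\top x}}$$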
This probability is constrained to the range $(0, 1)$. The model makes a prediction $\hat{y} = 1$ if $p \geq 0.5$ (equivalently, if $\theta^\top x \geq 0$); otherwise, $\hat{y} = 0$.
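The sigmoid, its inverse (the logit), and the thresholded prediction can be sketched in a few lines of NumPy. This is only an illustration: the function names, the toy matrix `X`, the parameter values in `theta`, and the 0.5 default threshold are assumptions chosen for the example, not something fixed by these notes.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real score to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    # Log-odds: the inverse of the sigmoid.
    return np.log(p / (1.0 - p))

def predict_proba(X, theta):
    # P(y = 1 | x) for each row x of X, using the linear score X @ theta.
    return sigmoid(X @ theta)

def predict(X, theta, threshold=0.5):
    # Label 1 when the probability reaches the threshold, else 0.
    return (predict_proba(X, theta) >= threshold).astype(int)

# Hypothetical toy data: the first column is a constant 1 acting as an intercept.
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.0]])
theta = np.array([0.5, 1.0])

p = predict_proba(X, theta)
print(p)                                  # probabilities, one per row
print(predict(X, theta))                  # hard 0/1 labels
print(np.allclose(logit(p), X @ theta))   # logit undoes the sigmoid
```

The last line checks the relationship used in the derivation: applying the logit to the predicted probabilities recovers the linear scores.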
Maximum Likelihood Estimation (MLE)
We need to find $\theta$ to fit our data. See the pdf for the equation.
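The pdf is not reproduced here. For orientation, the standard Bernoulli log-likelihood and its gradient are sketched below, assuming $n$ independent training examples $(x_i, y_i)$ with $y_i \in \{0, 1\}$ and $p_i = \sigma(\theta^\top x_i)$. The indexing and symbol names are assumptions and may differ from the notation used in the pdf.

```latex
\ell(\theta) = \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right],
\qquad p_i = \sigma(\theta^\top x_i) = \frac{1}{1 + e^{-\theta^\top x_i}}

\nabla_\theta \ell(\theta) = \sum_{i=1}^{n} (y_i - p_i) \, x_i
```

Maximizing $\ell(\theta)$ has no closed-form solution, which is why an iterative update is needed.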
We can then update $\theta$ via either gradient descent or root finding with Newton’s method.
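Both update rules can be sketched as follows, assuming the standard log-likelihood gradient $X^\top (y - p)$ and Hessian $-X^\top \mathrm{diag}(p(1 - p)) X$. The function names, learning rate, iteration counts, and toy data are hypothetical choices for illustration. Gradient ascent on the log-likelihood is used here, which is the same as gradient descent on the negative log-likelihood.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(X, y, theta):
    # Gradient of the log-likelihood: X^T (y - p).
    return X.T @ (y - sigmoid(X @ theta))

def fit_gradient_ascent(X, y, lr=0.1, n_iter=5000):
    # Ascent on the mean log-likelihood (descent on its negative).
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        theta += lr * gradient(X, y, theta) / len(y)
    return theta

def fit_newton(X, y, n_iter=25):
    # Newton's method: find the root of the gradient using the Hessian.
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ theta)
        hessian = -(X.T * (p * (1.0 - p))) @ X  # -X^T diag(p(1-p)) X
        theta -= np.linalg.solve(hessian, gradient(X, y, theta))
    return theta

# Hypothetical toy data (not linearly separable, so the MLE is finite).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # intercept column plus one feature
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

theta_gd = fit_gradient_ascent(X, y)
theta_nt = fit_newton(X, y)
print(theta_gd, theta_nt)
```

On well-conditioned problems like this one, Newton’s method typically needs far fewer iterations than gradient-based updates, at the cost of forming and solving with the Hessian at every step.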