Maximum Likelihood Estimation

Likelihood Function

The likelihood function (often simply called the likelihood) is the Joint Probability of the observed data viewed as a function of the parameters of the chosen statistical model.

“Probability of what you see given your model”

So likelihood is just Joint Probability?

When we use the term likelihood, we’re not treating the parameters $\theta$ as fixed; the data $x$ is what’s held fixed.

Key idea for MLE:

  • $x$ is your data
  • $\theta$ are your model parameters

The other way around, $p(\theta \mid x)$ instead of $p(x \mid \theta)$, is the Posterior.
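Bayes’ rule ties the two directions together:

$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)} \;\propto\; \underbrace{p(x \mid \theta)}_{\text{likelihood}} \, \underbrace{p(\theta)}_{\text{prior}}.$$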

“It can be shown that minimizing the KL Divergence is equivalent to minimizing the Negative Log Likelihood, which is what we usually do when training a classifier, for example.”

  • YES I think I finally get that
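A short sketch of why that holds: for the data distribution $p_{\text{data}}$ and model $p_\theta$,

$$D_{\mathrm{KL}}(p_{\text{data}} \,\|\, p_\theta) = \underbrace{\mathbb{E}_{x \sim p_{\text{data}}}[\log p_{\text{data}}(x)]}_{\text{constant in } \theta} \;-\; \mathbb{E}_{x \sim p_{\text{data}}}[\log p_\theta(x)],$$

so minimizing the KL divergence over $\theta$ is the same as minimizing the expected negative log likelihood $-\mathbb{E}_{x \sim p_{\text{data}}}[\log p_\theta(x)]$, which in practice is the average NLL over the training data.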

Likelihood Function (Definition)

If $X = (X_1, \dots, X_n)$, where the $X_i$ are i.i.d. RVs with observations $x = (x_1, \dots, x_n)$, then

$$\mathcal{L}(\theta \mid x) = f(x \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta).$$
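MLE then picks the $\theta$ that makes this product (or its log) largest. A minimal numerical sketch, assuming i.i.d. Gaussian data and a grid search over the mean; the data values and the use of scipy.stats.norm are just an illustrative example:

```python
import numpy as np
from scipy.stats import norm

# Observed i.i.d. data (assumed Gaussian for this example)
x = np.array([2.1, 1.9, 2.4, 2.0, 2.6])

def log_likelihood(mu, sigma, data):
    """Log of the product of densities = sum of log densities."""
    return np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Evaluate the log likelihood on a grid of candidate means (sigma fixed at 1.0)
candidate_mus = np.linspace(1.0, 3.0, 201)
lls = [log_likelihood(mu, 1.0, x) for mu in candidate_mus]

mle_mu = candidate_mus[np.argmax(lls)]
print(f"MLE for mu over the grid: {mle_mu:.3f}")   # close to the sample mean
print(f"Sample mean:              {x.mean():.3f}")
```

For a Gaussian with known variance, the MLE of the mean is exactly the sample mean, which is what the grid search recovers up to the grid resolution.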

Probability vs. Likelihood

  • Probability assigns a probability to a data value given a fixed distribution, i.e. $P(x \mid \theta)$
  • Likelihood is the “probability” of a distribution given observed data values; it measures how well a distribution explains the data, i.e. $\mathcal{L}(\theta \mid x)$, which is numerically $P(x \mid \theta)$ viewed as a function of $\theta$

Probability treats data as variable, and likelihood treats parameters as variable.
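The same density function, read two different ways; a sketch using scipy.stats.norm, with made-up numbers for illustration:

```python
import numpy as np
from scipy.stats import norm

# Probability (density): fix the distribution (mu=0, sigma=1), vary the data x
xs = np.linspace(-3, 3, 7)
densities = norm.pdf(xs, loc=0.0, scale=1.0)       # a function of x

# Likelihood: fix the observed data point x=1.5, vary the parameter mu
mus = np.linspace(-3, 3, 7)
likelihoods = norm.pdf(1.5, loc=mus, scale=1.0)    # a function of mu

print(densities)     # how probable different data values are under one model
print(likelihoods)   # how well different models explain one observed value
```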

https://www.youtube.com/watch?v=pYxNSUDSFH4&ab_channel=StatQuestwithJoshStarmer