Maximum Likelihood Estimation

Likelihood Function

The likelihood function (often simply called the likelihood) is the Joint Probability of the observed data viewed as a function of the parameters of the chosen statistical model.

Key idea for MLE: pick the parameters that make the observed data most probable, i.e. $\hat{\theta} = \arg\max_{\theta} \mathcal{L}(\theta \mid x)$, where

  • $x = (x_1, \dots, x_n)$ is your data
  • $\theta$ are your model parameters

Likelihood Function (Definition)

If $X = (X_1, \dots, X_n)$, where the $X_i$ are i.i.d. RVs with observations $x_1, \dots, x_n$, then

$$\mathcal{L}(\theta \mid x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta)$$
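A minimal sketch of evaluating this product, assuming (for illustration, not from the notes) that the data are modeled as i.i.d. samples from a Gaussian with unknown mean and known standard deviation:

```python
import numpy as np
from scipy.stats import norm

# Made-up observations for illustration
x = np.array([4.8, 5.1, 5.3, 4.9, 5.0])

def likelihood(mu, x, sigma=1.0):
    """L(mu | x): product of the per-point Gaussian densities f(x_i | mu)."""
    return np.prod(norm.pdf(x, loc=mu, scale=sigma))

print(likelihood(5.0, x))  # mu near the data -> relatively large likelihood
print(likelihood(2.0, x))  # mu far from the data -> likelihood close to zero
```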

Probability vs. Likelihood

  • Probability assigns a probability to a data value given a fixed distribution, i.e. $P(x \mid \theta)$: $\theta$ is fixed and $x$ varies
  • Likelihood measures how well a distribution (its parameters) explains fixed data values, i.e. $\mathcal{L}(\theta \mid x) = P(x \mid \theta)$: $x$ is fixed and $\theta$ varies (see the sketch below)
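A small sketch of the distinction, reusing the Gaussian assumption from above: the same density function is read as a probability (density) of $x$ when the parameter is fixed, and as a likelihood of the parameter when the data point is fixed.

```python
from scipy.stats import norm

# Probability (density): fix the distribution N(0, 1), vary the data value x
for x in [0.0, 1.0, 2.0]:
    print(f"p(x={x} | mu=0) = {norm.pdf(x, loc=0.0, scale=1.0):.4f}")

# Likelihood: fix the observed data point x = 1.0, vary the parameter mu
for mu in [0.0, 1.0, 2.0]:
    print(f"L(mu={mu} | x=1.0) = {norm.pdf(1.0, loc=mu, scale=1.0):.4f}")
```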

https://www.youtube.com/watch?v=pYxNSUDSFH4&ab_channel=StatQuestwithJoshStarmer

Negative Log Likelihood

First heard from Andrej Karpathy.

Log likelihood:

$$\log \mathcal{L}(\theta \mid x) = \sum_{i=1}^{n} \log f(x_i \mid \theta)$$

We use the log because the individual probabilities (and especially their product) can be very small, so we work with the Log Function of the likelihood instead.

We negate it because each probability lies in the domain $(0, 1]$, so its log is $\le 0$; the negative log likelihood $-\sum_{i=1}^{n} \log f(x_i \mid \theta)$ is therefore a non-negative quantity, and maximizing the likelihood turns into minimizing a loss.

One super neat trick from the Log Rules is that instead of multiplying everything, we can just add all the logs, i.e. log(a*b*c) = log(a) + log(b) + log(c)

We do this because a*b*c might be an extremely small number that underflows to zero in floating point, whereas the sum of the logs stays in a comfortable numerical range (see the sketch below).
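A minimal sketch of the underflow problem and of MLE as "smallest negative log likelihood", again assuming i.i.d. Gaussian data with known standard deviation (an illustration, not the notes' setup):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=2000)  # simulated data with true mean 3.0

# Naive product of 2000 small densities underflows to 0.0
print(np.prod(norm.pdf(x, loc=3.0, scale=1.0)))

# Sum of log densities is perfectly well behaved
def nll(mu):
    """Negative log likelihood of the data under N(mu, 1)."""
    return -np.sum(norm.logpdf(x, loc=mu, scale=1.0))

for mu in [1.0, 2.0, 3.0, 4.0]:
    print(mu, nll(mu))  # smallest NLL is at mu closest to the true mean 3.0
```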

How likelihood is used

I’m still trying to wrap my head around this, but essentially you apply a series of Bayesian updates, and the likelihood is the ingredient that does the updating.

Your prior is your belief distribution over the unknown state. Then you get new observations.

The belief distribution refers to a probability distribution over possible outcomes or states, typically representing subjective probabilities based on a person’s knowledge or judgment.

Based on the likelihood of each new observation, you update your prior into the posterior: posterior $\propto$ likelihood $\times$ prior (Bayes' rule).

  • I’m still confused

Our goal is to find the posterior.

Example:

  • Based on what you observe with the measurements, update the position (state) of the dog
  • the position distribution is the prior (before a measurement) and the posterior (after it)
  • the measurement enters through the likelihood: $P(\text{measurement} \mid \text{position})$ tells you how probable that measurement is for each candidate position, and it is what updates the prior (see the sketch after this list)
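A tiny discrete-Bayes sketch of the dog example, with made-up numbers and a made-up sensor model (not taken from the linked chapter): the likelihood is the per-position probability of the sensor reading, and multiplying it into the prior and normalizing gives the posterior.

```python
import numpy as np

# Hallway with 5 positions; prior belief about where the dog is (uniform)
prior = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

# The sensor reports "door", and doors are at positions 0 and 3.
# Likelihood = P(measurement = "door" | position), with a noisy sensor
# (hypothetical numbers: 0.75 where there is a door, 0.15 elsewhere).
likelihood = np.array([0.75, 0.15, 0.15, 0.75, 0.15])

# Bayes' rule: posterior is proportional to likelihood * prior
unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()

print(posterior)  # belief concentrates on the door positions 0 and 3
```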

This chapter is it: https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python/blob/master/02-Discrete-Bayes.ipynb

  • Likel