Maximum Likelihood Estimation

Likelihood Function

The likelihood function (often simply called the likelihood) is the Joint Probability of the observed data, viewed as a function of the parameters of the chosen statistical model.

Key idea for MLE: choose the parameter value that makes the observed data most probable, i.e. $\hat{\theta} = \arg\max_{\theta} L(\theta \mid x)$.

Likelihood Function (Definition)

If $X = (X_1, \ldots, X_n)$, where $X_i$ are i.i.d. RVs with observations $x = (x_1, \ldots, x_n)$, then

$$L(\theta \mid x) = \prod_{i=1}^{n} f(x_i \mid \theta)$$

where $f(\cdot \mid \theta)$ is the density (or mass) function of each $X_i$.
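To make the definition concrete, here is a minimal sketch with made-up numbers, assuming the data are i.i.d. Normal with known $\sigma = 1$ and unknown mean $\mu$. It evaluates $L(\mu)$ as the product of the individual densities and then picks the $\mu$ with the highest likelihood, which is exactly the MLE key idea above.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical i.i.d. observations, assumed to come from Normal(mu, sigma=1)
x = np.array([4.2, 5.1, 4.8, 5.5, 4.9])

def likelihood(mu, x, sigma=1.0):
    # L(mu) = product of f(x_i | mu): the joint probability (density)
    # of the data, viewed as a function of the parameter mu
    return np.prod(norm.pdf(x, loc=mu, scale=sigma))

# MLE key idea: pick the parameter value that maximizes the likelihood
mus = np.linspace(3, 7, 401)
L = np.array([likelihood(m, x) for m in mus])
mu_hat = mus[np.argmax(L)]
print(mu_hat)  # ≈ the sample mean, the closed-form MLE for a Normal mean
```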

Probability vs. Likelihood

  • Probability is the chance of observing a data value given a fixed distribution, i.e. $P(x \mid \theta)$
  • Likelihood is how plausible a distribution (its parameters) is given observed data values, i.e. $L(\theta \mid x)$

https://www.youtube.com/watch?v=pYxNSUDSFH4&ab_channel=StatQuestwithJoshStarmer
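A tiny sketch of that distinction, using an assumed Normal(0, 1) example: probability fixes the distribution and asks about the data, while likelihood fixes the observed data and asks how well different parameter values explain it.

```python
import numpy as np
from scipy.stats import norm

# Probability: the distribution is fixed (mean 0, sd 1), the data varies.
# P(X <= 1) for X ~ Normal(0, 1):
print(norm.cdf(1.0, loc=0.0, scale=1.0))          # ≈ 0.841

# Likelihood: the observed data point is fixed (x = 1.0), the parameter varies.
# Evaluate the same density at x = 1.0 under different candidate means:
for mu in [0.0, 0.5, 1.0]:
    print(mu, norm.pdf(1.0, loc=mu, scale=1.0))   # largest at mu = 1.0
```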

Negative Log Likelihood

First heard from Andrej Karpathy.

We use the log because the probabilities can be very small, so we work with the Log Function (in log space) instead.

We negate it because the log of a probability on the domain (0, 1] is zero or negative, so the negative log-likelihood comes out non-negative (and minimizing it is the same as maximizing the likelihood).

One super neat trick from the Log Rules is that instead of multiplying everything, we can just add all the logs, i.e. log(a*b*c) = log(a) + log(b) + log(c)

We do this because a*b*c might be an extremely small number that underflows in floating point, so we perform addition of the logs instead.
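A small sketch of why this matters numerically, with made-up per-example probabilities: the raw product underflows to 0.0 in float64, while the sum of logs stays perfectly usable.

```python
import numpy as np

# 1000 made-up per-example probabilities, each fairly small
probs = np.full(1000, 1e-4)

product = np.prod(probs)              # underflows: 1e-4000 is below float64 range
nll_from_product = -np.log(product)   # -log(0.0) = inf, useless

nll_from_sum = -np.sum(np.log(probs)) # sum of logs: ≈ 9210.3, well-behaved

print(product, nll_from_product, nll_from_sum)
```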

How likelihood is used

I’m still trying to wrap my head around this. But essentially, you use a series of likelihood updates.

Your prior is your belief distribution. Then, you have new observations. NO, you are getting confused.

The belief distribution refers to a probability distribution over possible outcomes or states, typically representing subjective probabilities based on a person’s knowledge or judgment.

Based on the likelihood of the new observations, you update your prior (to get the posterior).

  • I’m still confused

Our goal is to find the posterior.

Example:

  • Based on what you observe with the measurements, update the position (state) of the dog
  • the position is the prior (before the measurement) and the posterior (after incorporating it)
  • measurement is used to update the prior. But where’s the likelihood in all of this?
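The likelihood answers "how probable is this measurement at each candidate position?"; multiplying it elementwise by the prior and normalizing gives the posterior. Below is a minimal sketch with assumed numbers, in the spirit of the linked chapter's dog-in-a-hallway example (not the chapter's actual code).

```python
import numpy as np

# One discrete Bayes update for the dog's position in a 10-spot hallway.
# All numbers here are assumptions for illustration.

prior = np.full(10, 1.0 / 10)   # belief over positions before the measurement

# The sensor reports "door". Suppose doors are at positions 0, 1, and 8, and
# the sensor is 3x more likely to say "door" when one is actually there.
# This vector is the likelihood: P(measurement | position) for each position.
likelihood = np.array([3., 3., 1., 1., 1., 1., 1., 1., 3., 1.])

posterior = prior * likelihood   # Bayes' rule: posterior ∝ likelihood × prior
posterior /= posterior.sum()     # normalize so the belief sums to 1

print(posterior)  # positions with doors are now about 3x more believable
```

So the prior is the belief before the measurement, the likelihood comes from the measurement itself, and the normalized product is the posterior, which becomes the prior for the next update.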

This chapter is it: https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python/blob/master/02-Discrete-Bayes.ipynb

  • Likel