Maximum Likelihood Estimation

Likelihood Function

The likelihood function (often simply called the likelihood) is the Joint Probability of the observed data, viewed as a function of the parameters of the chosen statistical model.

Key idea for MLE: choose the parameter value that makes the observed data most probable, i.e. $\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} L(\theta \mid x)$.

Likelihood Function (Definition)

If $X = (X_1, \dots, X_n)$, where $X_i$ are i.i.d. RVs with observations $x_1, \dots, x_n$, then

$$L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$$
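As a sketch of this definition, here is the likelihood of some i.i.d. coin flips under a Bernoulli model, evaluated on a grid of candidate parameters. The data and the grid are made up for illustration.

```python
# Sketch: likelihood of i.i.d. Bernoulli data as a function of the parameter p.
# Data and parameter grid are hypothetical, chosen only for illustration.

def bernoulli_likelihood(p, data):
    """L(p) = product over observations of p^x * (1-p)^(1-x)."""
    like = 1.0
    for x in data:
        like *= p ** x * (1 - p) ** (1 - x)
    return like

data = [1, 1, 0, 1, 0, 1, 1, 0]  # 5 heads, 3 tails

# Evaluate L(p) on a grid and pick the maximizer: a brute-force MLE.
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=lambda p: bernoulli_likelihood(p, data))
print(best_p)  # lands near the sample mean 5/8 = 0.625
```

For a Bernoulli model the grid search agrees with the closed-form MLE, which is simply the fraction of heads in the data.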

Probability vs. Likelihood

• Probability is assigning the probability of a data value given a distribution, i.e. $P(x \mid \theta)$
• Likelihood is the probability of a distribution's parameters given data values, i.e. $L(\theta \mid x) = P(x \mid \theta)$, viewed as a function of $\theta$
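The distinction is just which argument you hold fixed. A small sketch with a normal density (the specific numbers are arbitrary):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Probability view: fix the distribution (mu=0, sigma=1), vary the data x.
p = normal_pdf(0.5, mu=0.0, sigma=1.0)

# Likelihood view: fix the observed data (x=0.5), vary the parameter mu.
l_at_0 = normal_pdf(0.5, mu=0.0, sigma=1.0)
l_at_half = normal_pdf(0.5, mu=0.5, sigma=1.0)

# Same function both times; the likelihood of mu peaks when mu matches the data.
print(l_at_half > l_at_0)  # True
```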

Negative Log Likelihood

First heard from Andrej Karpathy.

We use the log because the probabilities can be very small, so we work with the Log Function instead.

We negate it so the value is positive: each probability lies in (0, 1], so its log is at most 0, and the negated sum is a nonnegative loss. Minimizing the negative log likelihood is the same as maximizing the likelihood.

One super neat trick from the Log Rules is that instead of multiplying everything, we can just add all the logs, i.e. log(a·b·c) = log(a) + log(b) + log(c)

We do this because a·b·c might be an extremely small number that underflows to zero in floating point, so we perform addition of logs instead.
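The underflow problem is easy to see directly. A toy setup (the counts and probabilities here are made up): multiply 1000 probabilities of 1e-4 each, then do the same in log space.

```python
import math

# Hypothetical toy data: 1000 i.i.d. observations, each with probability 1e-4.
probs = [1e-4] * 1000

# Direct product underflows to exactly 0.0 in float64:
# the true value, 1e-4000, is far below the smallest representable float.
product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0

# Summing logs stays in a comfortable numeric range.
log_likelihood = sum(math.log(p) for p in probs)
print(log_likelihood)  # about -9210.34

# Negating gives the negative log likelihood, a positive loss to minimize.
nll = -log_likelihood
```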

How likelihood is used

I’m still trying to wrap my head around this. But essentially, you use a series of likelihood updates.

Your prior is your belief distribution. Then, for each new observation, you multiply the prior by the likelihood of that observation and renormalize; the resulting posterior becomes the prior for the next observation.

The belief distribution refers to a probability distribution over possible outcomes or states, typically representing subjective probabilities based on a person’s knowledge or judgment.
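A minimal sketch of that update loop, assuming a made-up discrete belief over three candidate values for a coin's heads probability:

```python
# Hypothetical three-hypothesis belief over a coin's heads probability.
hypotheses = [0.25, 0.5, 0.75]
prior = {h: 1 / 3 for h in hypotheses}  # uniform belief before seeing data

def update(belief, observation):
    """Bayes' rule: posterior(h) is proportional to likelihood(obs | h) * belief(h)."""
    unnormalized = {
        h: (h if observation == 1 else 1 - h) * p for h, p in belief.items()
    }
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}

belief = prior
for obs in [1, 1, 0, 1]:  # mostly heads, so belief should shift toward 0.75
    belief = update(belief, obs)

print(max(belief, key=belief.get))  # 0.75
```

Each pass through the loop is one "likelihood update": the posterior after an observation is the prior for the next one.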