# Likelihood Function

The likelihood function (often simply called the likelihood) is the joint probability (or probability density) of the observed data, viewed as a function of the parameters of the chosen statistical model.

Key idea for MLE: $\text{Likelihood}=P(\text{observing your data, as a function of } θ)$

Likelihood Function (Definition)

If $Y_{i}∼f(y_{i},θ),\ i=1,2,…,n$, where $Y_{i}$ are i.i.d. RVs with observations $\{y_{1},y_{2},…,y_{n}\}$, then $L(θ;y_{1},y_{2},…,y_{n})=P(Y_{1}=y_{1},Y_{2}=y_{2},…,Y_{n}=y_{n})=∏_{i=1}^{n}f(y_{i},θ)$

- $f$ is the Probability Density Function
- $θ$ is the set of parameters we are trying to estimate, which depends on the distribution. Ex: for a Gaussian distribution, $θ=(μ,σ^{2})$
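To make the product formula concrete, here is a minimal sketch for the Gaussian case, using only the standard library. The function names (`gaussian_pdf`, `likelihood`) and the observations in `ys` are made up for illustration and are not from any source.

```python
import math

def gaussian_pdf(y, mu, sigma2):
    """Density f(y; theta) of a Gaussian with theta = (mu, sigma^2)."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def likelihood(ys, mu, sigma2):
    """L(theta; y_1..y_n): product of f(y_i; theta) over i.i.d. observations."""
    result = 1.0
    for y in ys:
        result *= gaussian_pdf(y, mu, sigma2)
    return result

# Hypothetical observations, made up for illustration.
ys = [4.8, 5.1, 5.3, 4.9]

# Same data, two candidate parameter settings.
near = likelihood(ys, mu=5.0, sigma2=0.1)  # mean close to the data
far = likelihood(ys, mu=3.0, sigma2=0.1)   # mean far from the data

print(near > far)  # True: the data is more likely under mu=5.0
```

The data stays fixed and only $θ$ changes, which is exactly the "function of the parameters" view.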

Probability vs. Likelihood

- Probability is the probability of a data value given a fixed distribution, i.e. $P(\text{data}∣\text{distribution})$
- Likelihood is how well a distribution explains fixed data values, i.e. $L(\text{distribution}∣\text{data})$

https://www.youtube.com/watch?v=pYxNSUDSFH4&ab_channel=StatQuestwithJoshStarmer
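A small sketch of the distinction, assuming a normal distribution and using `statistics.NormalDist` from the Python standard library. All numbers here are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

# Probability: the distribution is fixed, and we ask about data.
dist = NormalDist(mu=32.0, sigma=2.5)        # hypothetical distribution
p_between = dist.cdf(34.0) - dist.cdf(32.0)  # P(32 < X < 34 | distribution)

# Likelihood: the data is fixed, and we compare distributions.
observed = 34.0  # hypothetical single observation
like_a = NormalDist(mu=32.0, sigma=2.5).pdf(observed)  # L(mu=32 | data)
like_b = NormalDist(mu=34.0, sigma=2.5).pdf(observed)  # L(mu=34 | data)

print(p_between)        # an area under the fixed curve
print(like_b > like_a)  # True: mu=34 explains the observation better
```

Probability is an area under one curve; likelihood is the height of different curves at the same data point.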

### Negative Log Likelihood

First heard from Andrej Karpathy.

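We use the log because a product of many probabilities can be very small, so we work with the log function instead.

We negate it because the log of a probability in (0, 1] is at most 0, so the negative log is non-negative and can be minimized as a loss.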

One super neat trick from the Log Rules is that instead of multiplying everything, we can just add all the logs, i.e. `log(a*b*c) = log(a) + log(b) + log(c)`

We do this because `a*b*c` might be an extremely small number, so we perform addition instead.
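Here is a quick sketch of why adding logs matters in practice. The probabilities are made up and deliberately chosen so that the plain product underflows to zero in floating point.

```python
import math

# 2000 hypothetical probabilities of 0.01 each (made up to force underflow).
probs = [0.01] * 2000

# Multiplying directly: the true value is 1e-4000, far below what a float can hold.
product = 1.0
for p in probs:
    product *= p

# Negative log likelihood: sum the logs instead, then negate.
nll = -sum(math.log(p) for p in probs)

print(product)  # 0.0, all information is lost
print(nll)      # a perfectly usable positive number
```

Since log is monotonic, maximizing the likelihood and minimizing the negative log likelihood pick the same parameters.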

### How likelihood is used

I’m still trying to wrap my head around this. But essentially, you use a series of likelihood updates.

Your prior is your belief distribution. Then, you have new observations. NO, you are getting confused.

The belief distribution refers to a probability distribution over possible outcomes or states, typically representing subjective probabilities based on a person’s knowledge or judgment.

Based on new observations, you update your prior (your current belief distribution) into the posterior.

- I’m still confused

Our goal is to find the posterior.

Example:

- Based on what you observe with the measurements, update the position (state) of the dog
- position is the prior and posterior
- measurement is used to update the prior. But where’s the likelihood in all of this?

This chapter is it: https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python/blob/master/02-Discrete-Bayes.ipynb
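As far as I can tell, the measurement enters through the likelihood: by Bayes' theorem the posterior is proportional to likelihood times prior. Below is a minimal sketch of one discrete update under that assumption. The function name `bayes_update`, the 4-position hallway, and the sensor numbers are all hypothetical, not taken from the linked chapter.

```python
def bayes_update(prior, likelihood):
    """Posterior is proportional to likelihood * prior, then normalized to sum to 1."""
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical hallway with 4 positions; the dog is equally likely to be anywhere.
prior = [0.25, 0.25, 0.25, 0.25]

# Hypothetical sensor reading: 3x more likely if the dog is at position 0 or 1.
likelihood = [3.0, 3.0, 1.0, 1.0]

posterior = bayes_update(prior, likelihood)
print(posterior)  # [0.375, 0.375, 0.125, 0.125]
```

The posterior from one measurement becomes the prior for the next one, which is the "series of likelihood updates" idea.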

- Likel