# Maximum Likelihood Estimation

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data.

https://www.youtube.com/watch?v=XepXtl9YKwc&ab_channel=StatQuestwithJoshStarmer

### General Template for Deriving MLE

We always work with the log-likelihood since it is much easier to differentiate. See Logarithm Rules, but the key fact is $\ln(abc) = \ln a + \ln b + \ln c$, which turns a product of densities into a sum.

You then take the derivative with respect to the parameter and set it to 0, since you want to maximize.

For the binomial we write a single observation, $X \sim \mathrm{Bin}(n, p)$, since the $n$ Bernoulli trials are already bundled into the distribution.

But for the other distributions we state an i.i.d. sample, e.g. $X_{1}, \dots, X_{n} \sim \mathrm{Exp}(\theta)$. So I was a little confused on this — but the difference is just whether the repetition lives inside the distribution or in the sample.
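To make the "why logs" point concrete, here is a minimal sketch (my own example, not from the notes): with many data points, the raw product of densities underflows to zero in floating point, while the sum of logs stays finite — on top of being the thing that is easy to differentiate.

```python
import math
import random

random.seed(0)

# ln(abc) = ln a + ln b + ln c: the product of densities becomes a sum
# of log-densities. The rate theta and sample size are my own choices.
theta = 2.0
data = [random.expovariate(theta) for _ in range(5000)]  # X_i ~ Exp(theta)

def density(y):
    return theta * math.exp(-theta * y)  # Exp(theta) p.d.f.

raw_product = math.prod(density(y) for y in data)  # underflows to 0.0
log_sum = sum(math.log(density(y)) for y in data)  # finite and usable

print(raw_product, log_sum)
```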

#### Binomial MLE

Suppose $Y \sim \mathrm{Bin}(n, \theta)$, with $y$ observed successes. Then what is the MLE $\hat{\theta}(y)$?

Likelihood: $L(y; \theta) = P(Y = y) = \binom{n}{y}\,\theta^{y}(1-\theta)^{n-y}$

Let's derive $\hat{\theta}(y)$ using the log-likelihood. Maximizing the log-likelihood is the same as maximizing the likelihood:

$\ell(\theta) = \ln L(\theta)$, and $\hat{\theta}$ maximizes $L(\theta) \iff \hat{\theta}$ maximizes $\ell(\theta)$.

$\ell(\theta) = \ln\binom{n}{y} + y\ln\theta + (n-y)\ln(1-\theta)$

$\frac{d\ell(\theta)}{d\theta} = 0 \implies \frac{y}{\theta} - \frac{n-y}{1-\theta} = 0 \implies \hat{\theta} = \frac{y}{n}$

- For the Binomial Distribution, the estimate is simply the sample proportion of successes, $\hat{p} = \frac{\#\text{observed successes}}{\#\text{total trials}}$, which intuitively should make sense.
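A quick numerical sanity check of the derivation above (the function names and numbers are my own): evaluate the binomial log-likelihood on a grid of $\theta$ values and confirm the maximizer is $y/n$.

```python
import math

# Binomial log-likelihood: l(theta) = ln C(n,y) + y ln(theta) + (n-y) ln(1-theta)
def binom_loglik(theta, n, y):
    return (math.log(math.comb(n, y))
            + y * math.log(theta)
            + (n - y) * math.log(1 - theta))

n, y = 100, 37                              # 37 successes in 100 trials
grid = [i / 1000 for i in range(1, 1000)]   # theta values in (0, 1)
theta_hat = max(grid, key=lambda t: binom_loglik(t, n, y))
print(theta_hat)  # y/n = 0.37
```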

#### Poisson MLE

Let $Y_{1}, Y_{2}, \dots, Y_{n} \sim \mathrm{Poi}(\theta)$ with observations $\{y_{1}, \dots, y_{n}\}$. What is the MLE of $\theta$?

$\hat{\theta} = \frac{1}{n}\sum y_{i} = \bar{y}$

- Remember that for the Poisson Distribution the parameter is $\lambda$, and $E(Y) = \lambda$, so the estimate is simply the sample mean.

I got some practice deriving this.
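A small check of the Poisson result above (my own sketch, with made-up counts): the sample mean should beat nearby values of $\theta$ in log-likelihood.

```python
import math

# Poisson log-likelihood: l(theta) = sum_i [ y_i ln(theta) - theta - ln(y_i!) ]
def poisson_loglik(theta, ys):
    return sum(y * math.log(theta) - theta - math.lgamma(y + 1) for y in ys)

ys = [3, 0, 2, 5, 1, 4, 2, 3]       # made-up counts
theta_hat = sum(ys) / len(ys)       # claimed MLE: the sample mean
better_than_neighbors = all(
    poisson_loglik(theta_hat, ys) > poisson_loglik(t, ys)
    for t in (theta_hat - 0.1, theta_hat + 0.1)
)
print(theta_hat, better_than_neighbors)  # 2.5 True
```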

#### Exponential MLE

$\hat{\lambda} = \frac{1}{\bar{y}}$

- Remember that if $X \sim \mathrm{Exp}(\lambda)$, then $E(X) = \frac{1}{\lambda} = \mu$, so the parameter $\lambda = \frac{1}{\mu}$.
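A hedged simulation check of this one (the true rate and sample size are my own choices): generate exponential data with a known $\lambda$ and confirm $\hat{\lambda} = 1/\bar{y}$ recovers it.

```python
import random

random.seed(1)

true_lambda = 3.0  # known rate, chosen for the demo
ys = [random.expovariate(true_lambda) for _ in range(100_000)]
ybar = sum(ys) / len(ys)
lambda_hat = 1 / ybar  # MLE: reciprocal of the sample mean
print(lambda_hat)      # close to 3.0
```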

#### Normal MLE

Suppose $Y_{1}, \dots, Y_{n} \sim N(\mu, \sigma^{2})$ with observations/data $\{y_{1}, \dots, y_{n}\}$.

What is the MLE of $\mu$ and $\sigma^{2}$?

$\hat{\mu} = \bar{y}$

$\hat{\sigma}^{2} = \frac{1}{n}\sum (y_{i} - \bar{y})^{2}$

Compare with the sample variance $s^{2} = \frac{1}{n-1}\sum (y_{i} - \bar{y})^{2}$. Am I supposed to use the $n-1$ or the $n$?? The MLE uses $n$; dividing by $n-1$ gives the unbiased sample variance instead. So $\hat{\sigma}^{2}$ is slightly biased, but the two agree as $n$ gets large.
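The $n$ vs $n-1$ question can be seen numerically (a sketch with simulated data; the mean, variance, and sample size are my own choices): both estimates land near the true $\sigma^{2}$, with the $n-1$ version slightly larger.

```python
import random

random.seed(2)

# Simulate normal data with known mu = 10, sigma = 2 (true sigma^2 = 4).
ys = [random.gauss(10.0, 2.0) for _ in range(1000)]
n = len(ys)
ybar = sum(ys) / n
ss = sum((y - ybar) ** 2 for y in ys)

sigma2_mle = ss / n   # MLE: divide by n (slightly biased)
s2 = ss / (n - 1)     # sample variance: divide by n - 1 (unbiased)
print(sigma2_mle, s2)  # both near 4; s2 is slightly larger
```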

### Properties of the MLE

For discrete distributions → $L(\theta)$ is the probability of observing the data, given $\theta$ (the p.m.f. gives probabilities directly). For continuous distributions → the likelihood is built from the p.d.f., and $f(y_{i})$ is a density, not a probability.

- Consistency
  - As $n \to \infty$, $\hat{\theta} \to \theta$ (our estimate converges to the true value)
- Efficiency
  - We want minimum variance when estimating $\theta$
- Invariance
  - If $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$
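Invariance can be illustrated with the exponential example from earlier (my own setup): once $\hat{\lambda} = 1/\bar{y}$ is the MLE of the rate, $g(\hat{\lambda}) = 1/\hat{\lambda}$ is automatically the MLE of the mean $\mu = 1/\lambda$ — no re-derivation needed.

```python
import random

random.seed(3)

true_lambda = 2.0  # known rate, chosen for the demo
ys = [random.expovariate(true_lambda) for _ in range(50_000)]
ybar = sum(ys) / len(ys)

lambda_hat = 1 / ybar    # MLE of the rate lambda
mu_hat = 1 / lambda_hat  # MLE of the mean mu = 1/lambda, by invariance (= ybar)
print(lambda_hat, mu_hat)  # near 2.0 and 0.5
```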

#### Other Notes

- We assume that the class of the distribution has been properly identified
- We assume that the data are i.i.d.