# Likelihood Estimation

## Maximum Likelihood Estimation (MLE)

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data.

https://www.youtube.com/watch?v=XepXtl9YKwc&ab_channel=StatQuestwithJoshStarmer

Definition (The Maximum Likelihood Estimate (MLE)): $\widehat{\theta}$ is the MLE if $\widehat{\theta}$ maximizes $L(\theta; y_1, y_2, \dots, y_n)$, where $L$ is the [[Likelihood Function]].

### General Template for Deriving MLE

We always use the log likelihood since it makes the derivation much easier. See Logarithm Rules, but basically the log turns the product over observations into a sum: $\log \prod_{i=1}^{n} f(y_i; \theta) = \sum_{i=1}^{n} \log f(y_i; \theta)$.

You then take the derivative with respect to $\theta$, set it to 0, and solve, since you want to maximize $l(\theta)$.

We can model the Binomial as $n$ i.i.d. Bernoulli trials, so the Binomial is a special case: writing the likelihood as the product of the $n$ Bernoulli terms, $L(\theta) = \theta^{y}(1-\theta)^{n-y}$, gives the same MLE as using the Binomial p.m.f. directly, because the $\binom{n}{y}$ factor does not depend on $\theta$. For the other distributions we instead state, for example, that $Y_1, \dots, Y_n$ are i.i.d. with observations $y_1, \dots, y_n$, and the likelihood is a product over all $n$ observations.

We define
- $L(\theta) = \prod_{i=1}^{n} f(y_i; \theta)$ as the likelihood
- $l(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(y_i; \theta)$ as the log likelihood
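As a sanity check on this template, here is a minimal symbolic sketch, assuming Python with SymPy; it uses the Poisson log likelihood with the constant $-\log y_i!$ term dropped and writes $s = \sum y_i$ (these names are illustrative, not from the notes above).

```python
import sympy as sp

lam, n, s = sp.symbols("lambda n s", positive=True)  # s stands for sum(y_i)

# Poisson log likelihood for n i.i.d. observations (constant term dropped):
# l(lambda) = s*log(lambda) - n*lambda
l = s * sp.log(lam) - n * lam

# "Take the derivative and set it to 0"
lam_hat = sp.solve(sp.Eq(sp.diff(l, lam), 0), lam)
print(lam_hat)  # [s/n], i.e. the sample mean, matching the Poisson MLE below
```

The same pattern works for any one-parameter model where $l'(\theta) = 0$ has a closed-form solution.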

### Binomial MLE

Suppose $Y \sim \text{Binomial}(n, \theta)$, with $y$ observed successes. Then what is $\widehat{\theta}$?

Likelihood: $L(\theta) = \binom{n}{y}\,\theta^{y}(1-\theta)^{n-y}$. Let's derive $\widehat{\theta}$ by using the log likelihood; maximizing the log likelihood is the same as maximizing the likelihood, i.e. $\widehat{\theta}$ maximizes $l(\theta)$ if and only if $\widehat{\theta}$ maximizes $L(\theta)$. Here

$$l(\theta) = \log\binom{n}{y} + y \log \theta + (n - y)\log(1 - \theta)$$

Setting $l'(\theta) = \frac{y}{\theta} - \frac{n-y}{1-\theta} = 0$ gives $\widehat{\theta} = \frac{y}{n}$.

- For the Binomial Distribution, the MLE of the parameter is simply the sample proportion of successes, $\widehat{\theta} = \frac{y}{n}$, which intuitively should make sense (a quick numeric check follows below).
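A hedged numerical cross-check, assuming SciPy is available ($n = 20$ trials and $y = 7$ successes are made-up numbers): maximizing the binomial log likelihood numerically should land on the sample proportion $7/20 = 0.35$.

```python
from scipy.stats import binom
from scipy.optimize import minimize_scalar

n_trials, y_obs = 20, 7  # hypothetical: 7 successes observed in 20 trials

# Negative log likelihood of theta for one Binomial(n, theta) observation
neg_log_lik = lambda theta: -binom.logpmf(y_obs, n_trials, theta)

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, y_obs / n_trials)  # both ~ 0.35
```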

### Poisson MLE

Let $Y_1, Y_2, \dots, Y_n \overset{iid}{\sim} \text{Poisson}(\lambda)$ with observations $y_1, y_2, \dots, y_n$. What is the MLE of $\lambda$?

- Remember that for the Poisson Distribution the parameter is $\lambda$, and $E[Y] = \lambda$, so the MLE of the parameter is simply the sample mean, $\widehat{\lambda} = \bar{y}$.

I got practice deriving this, and it seems that with $l(\lambda) = \sum_{i=1}^{n}\left(y_i \log \lambda - \lambda - \log y_i!\right)$, setting $l'(\lambda) = \frac{1}{\lambda}\sum_{i=1}^{n} y_i - n = 0$ gives $\widehat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}$.
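A small grid check in the same spirit (assuming NumPy and SciPy; the counts in `y` are made up): evaluating the Poisson log likelihood over a grid of $\lambda$ values shows the peak sits at the sample mean.

```python
import numpy as np
from scipy.stats import poisson

y = np.array([2, 0, 3, 1, 4, 2, 2, 5])  # hypothetical Poisson counts

lam_grid = np.linspace(0.1, 10, 1000)
# l(lambda) = sum_i log f(y_i; lambda), evaluated at each grid point
log_lik = np.array([poisson.logpmf(y, mu=lam).sum() for lam in lam_grid])

print(lam_grid[np.argmax(log_lik)], y.mean())  # both ~ 2.38
```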

### Exponential MLE

- Remember that if $Y \sim \text{Exponential}(\theta)$ with the mean parameterization, then $E[Y] = \theta$, so the MLE of the parameter is the sample mean, $\widehat{\theta} = \bar{y}$ (see the numeric check below).
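A quick check of this, assuming SciPy and the mean parameterization $E[Y] = \theta$ (under the rate parameterization the MLE would instead be $1/\bar{y}$); the data below are made up. `expon.fit` with the location fixed at 0 is a maximum likelihood fit, and the scale it returns is exactly the sample mean.

```python
import numpy as np
from scipy.stats import expon

y = np.array([0.8, 2.1, 0.3, 1.7, 4.2, 0.9])  # hypothetical waiting times

# Fix loc=0 so only the scale (the mean theta) is estimated by maximum likelihood
loc, scale = expon.fit(y, floc=0)
print(scale, y.mean())  # both equal the sample mean (~1.667)
```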

### Normal MLE

Suppose $Y_1, Y_2, \dots, Y_n \overset{iid}{\sim} N(\mu, \sigma^2)$ with observations/data $y_1, y_2, \dots, y_n$.

What is the MLE of $\mu$ and $\sigma^2$? Am I supposed to use the $\frac{1}{n}$ or the $\frac{1}{n-1}$ denominator? We are estimating the variance of a population, and the MLE turns out to be the $\frac{1}{n}$ version: $\widehat{\mu} = \bar{y}$ and $\widehat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$. The $\frac{1}{n-1}$ sample variance is the unbiased adjustment, not the MLE.
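To make the $n$ versus $n-1$ question concrete, here is a minimal check (assuming NumPy and SciPy; the data are made up): `norm.fit` is a maximum likelihood fit, and the scale it returns matches NumPy's default `ddof=0` variance (divide by $n$), not the $n-1$ sample variance.

```python
import numpy as np
from scipy.stats import norm

y = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9])  # hypothetical observations

mu_hat, sigma_hat = norm.fit(y)   # maximum likelihood fit
print(mu_hat, y.mean())           # MLE of mu is the sample mean
print(sigma_hat**2, np.var(y))    # MLE of sigma^2 divides by n (ddof=0)
print(np.var(y, ddof=1))          # the n-1 sample variance is larger
```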

#### Derivation for Normal MLE

We use the definition of Likelihood:

$$\begin{align} L(\mu, \sigma^2) &= \prod_{i=1}^{n} \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(y_i - \mu)^2}{2\sigma^2}} \\ &={\frac {1}{\sigma^n (2\pi)^\frac{n}{2} }}e^{-{\frac {1}{2\sigma^2}}\sum \left({y_i-\mu }\right)^{2}} \\ &={\frac {1}{\sigma^n} \cdot \frac{1}{(2\pi)^\frac{n}{2} }} \cdot e^{-{\frac {1}{2\sigma^2}}\sum \left({y_i-\mu }\right)^{2}} \\ l(\mu, \sigma^2) &=- n \log \sigma - \frac{n}{2} \log 2\pi - \frac{1}{2\sigma^2} \sum(y_i - \mu)^2 \end{align}$$

This likelihood is maximized when the derivatives are 0 (similar ideas in [[notes/Least Squares|Least Squares]]): setting $\frac{\partial l}{\partial \mu} = 0$ and $\frac{\partial l}{\partial \sigma} = 0$ gives $\widehat{\mu} = \bar{y}$ and $\widehat{\sigma}^2 = \frac{1}{n}\sum(y_i - \bar{y})^2$.

### Properties of the [[notes/Maximum Likelihood Estimation|MLE]]

For discrete distributions, $L(\theta)$ is the probability of observing the data $y_1, \dots, y_n$ when the parameter is $\theta$. For continuous distributions, recall that a [[notes/Probability Mass Function|p.m.f.]] gives probabilities directly, while a [[notes/Probability Density Function|p.d.f.]] does not: $f(y_i)$ is not a probability.

1. Consistency - As $n \rightarrow \infty$, $\widehat{\theta} \rightarrow \theta$ (our estimate converges to the true value)
2. Efficiency - We want a minimum variance when finding $\widehat{\theta}_i$
3. [[notes/Invariance|Invariance]] - If $\widehat{\theta}$ is the MLE of $\theta$, then $g(\widehat{\theta})$ is the MLE of $g(\theta)$

Other Notes
- We assume that the class of the distribution has been properly identified
- We assume that we have [[notes/independent and Identically Distributed|i.i.d]] datasets

### Related
- [[notes/Likelihood Function|Likelihood Function]]
- [[notes/Relative Likelihood Function|Relative Likelihood Function]]