# Maximum Likelihood Estimation

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data.

https://www.youtube.com/watch?v=XepXtl9YKwc&ab_channel=StatQuestwithJoshStarmer

### General Template for Deriving MLE

We always work with the log-likelihood since it is much easier to differentiate. See Logarithm Rules, but the key fact is $\ln(abc) = \ln a + \ln b + \ln c$, which turns a product of densities into a sum.

You then take the derivative with respect to the parameter and set it to 0, since you want to maximize.

For the binomial we write a single observation, $X \sim \mathrm{Bin}(n, p)$, since the $n$ Bernoulli trials are already bundled into the distribution.

But for the other distributions we state an i.i.d. sample, e.g. $X_{1}, \dots, X_{n} \sim \mathrm{Exp}(\theta)$. So I was a little confused on this — but the difference is just whether the repetition lives inside the distribution or in the sample.
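To make the "why logs" point concrete, here is a minimal sketch (my own example, not from the notes): with many data points, the raw product of densities underflows to zero in floating point, while the sum of logs stays finite — on top of being the thing that is easy to differentiate.

```python
import math
import random

random.seed(0)

# ln(abc) = ln a + ln b + ln c: the product of densities becomes a sum
# of log-densities. The rate theta and sample size are my own choices.
theta = 2.0
data = [random.expovariate(theta) for _ in range(5000)]  # X_i ~ Exp(theta)

def density(y):
    return theta * math.exp(-theta * y)  # Exp(theta) p.d.f.

raw_product = math.prod(density(y) for y in data)  # underflows to 0.0
log_sum = sum(math.log(density(y)) for y in data)  # finite and usable

print(raw_product, log_sum)
```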

#### Binomial MLE

Suppose $Y \sim \mathrm{Bin}(n, \theta)$, with $y$ observed successes. Then what is the MLE $\hat{\theta}(y)$?

Likelihood: $L(y; \theta) = P(Y = y) = \binom{n}{y}\,\theta^{y}(1-\theta)^{n-y}$

Let's derive $\hat{\theta}(y)$ using the log-likelihood. Maximizing the log-likelihood is the same as maximizing the likelihood:

$\ell(\theta) = \ln L(\theta)$, and $\hat{\theta}$ maximizes $L(\theta) \iff \hat{\theta}$ maximizes $\ell(\theta)$.

$\ell(\theta) = \ln\binom{n}{y} + y\ln\theta + (n-y)\ln(1-\theta)$

$\frac{d\ell(\theta)}{d\theta} = 0 \implies \frac{y}{\theta} - \frac{n-y}{1-\theta} = 0 \implies \hat{\theta} = \frac{y}{n}$

- For the Binomial Distribution, the estimate is simply the sample proportion of successes, $\hat{p} = \frac{\#\text{observed successes}}{\#\text{total trials}}$, which intuitively should make sense.
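A quick numerical sanity check of the derivation above (the function names and numbers are my own): evaluate the binomial log-likelihood on a grid of $\theta$ values and confirm the maximizer is $y/n$.

```python
import math

# Binomial log-likelihood: l(theta) = ln C(n,y) + y ln(theta) + (n-y) ln(1-theta)
def binom_loglik(theta, n, y):
    return (math.log(math.comb(n, y))
            + y * math.log(theta)
            + (n - y) * math.log(1 - theta))

n, y = 100, 37                              # 37 successes in 100 trials
grid = [i / 1000 for i in range(1, 1000)]   # theta values in (0, 1)
theta_hat = max(grid, key=lambda t: binom_loglik(t, n, y))
print(theta_hat)  # y/n = 0.37
```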

#### Poisson MLE

Let $Y_{1}, Y_{2}, \dots, Y_{n} \sim \mathrm{Poi}(\theta)$ with observations $\{y_{1}, \dots, y_{n}\}$. What is the MLE of $\theta$?

$\hat{\theta} = \frac{1}{n}\sum y_{i} = \bar{y}$

- Remember that for the Poisson Distribution the parameter is $\lambda$, and $E(Y) = \lambda$, so the estimate is simply the sample mean.

I got some practice deriving this.
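A small check of the Poisson result above (my own sketch, with made-up counts): the sample mean should beat nearby values of $\theta$ in log-likelihood.

```python
import math

# Poisson log-likelihood: l(theta) = sum_i [ y_i ln(theta) - theta - ln(y_i!) ]
def poisson_loglik(theta, ys):
    return sum(y * math.log(theta) - theta - math.lgamma(y + 1) for y in ys)

ys = [3, 0, 2, 5, 1, 4, 2, 3]       # made-up counts
theta_hat = sum(ys) / len(ys)       # claimed MLE: the sample mean
better_than_neighbors = all(
    poisson_loglik(theta_hat, ys) > poisson_loglik(t, ys)
    for t in (theta_hat - 0.1, theta_hat + 0.1)
)
print(theta_hat, better_than_neighbors)  # 2.5 True
```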

#### Exponential MLE

$\hat{\lambda} = \frac{1}{\bar{y}}$

- Remember that if $X \sim \mathrm{Exp}(\lambda)$, then $E(X) = \frac{1}{\lambda} = \mu$, so the parameter $\lambda = \frac{1}{\mu}$.
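A hedged simulation check of this one (the true rate and sample size are my own choices): generate exponential data with a known $\lambda$ and confirm $\hat{\lambda} = 1/\bar{y}$ recovers it.

```python
import random

random.seed(1)

true_lambda = 3.0  # known rate, chosen for the demo
ys = [random.expovariate(true_lambda) for _ in range(100_000)]
ybar = sum(ys) / len(ys)
lambda_hat = 1 / ybar  # MLE: reciprocal of the sample mean
print(lambda_hat)      # close to 3.0
```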

#### Normal MLE

Suppose $Y_{1}, \dots, Y_{n} \sim N(\mu, \sigma^{2})$ with observations/data $\{y_{1}, \dots, y_{n}\}$.

What is the MLE of $\mu$ and $\sigma^{2}$?

$\hat{\mu} = \bar{y}$

$\hat{\sigma}^{2} = \frac{1}{n}\sum (y_{i} - \bar{y})^{2}$

Compare with the sample variance $s^{2} = \frac{1}{n-1}\sum (y_{i} - \bar{y})^{2}$. Am I supposed to use the $n-1$ or the $n$?? The MLE uses $n$; dividing by $n-1$ gives the unbiased sample variance instead. So $\hat{\sigma}^{2}$ is slightly biased, but the two agree as $n$ gets large.
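The $n$ vs $n-1$ question can be seen numerically (a sketch with simulated data; the mean, variance, and sample size are my own choices): both estimates land near the true $\sigma^{2}$, with the $n-1$ version slightly larger.

```python
import random

random.seed(2)

# Simulate normal data with known mu = 10, sigma = 2 (true sigma^2 = 4).
ys = [random.gauss(10.0, 2.0) for _ in range(1000)]
n = len(ys)
ybar = sum(ys) / n
ss = sum((y - ybar) ** 2 for y in ys)

sigma2_mle = ss / n   # MLE: divide by n (slightly biased)
s2 = ss / (n - 1)     # sample variance: divide by n - 1 (unbiased)
print(sigma2_mle, s2)  # both near 4; s2 is slightly larger
```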

### Properties of the MLE

For discrete distributions → $L(\theta)$ is the probability of observing the data, given $\theta$ (the p.m.f. gives probabilities directly). For continuous distributions → the likelihood is built from the p.d.f., and $f(y_{i})$ is a density, not a probability.

- Consistency
  - As $n \to \infty$, $\hat{\theta} \to \theta$ (our estimate converges to the true value)
- Efficiency
  - We want minimum variance when estimating $\theta$
- Invariance
  - If $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$
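Invariance can be illustrated with the exponential example from earlier (my own setup): once $\hat{\lambda} = 1/\bar{y}$ is the MLE of the rate, $g(\hat{\lambda}) = 1/\hat{\lambda}$ is automatically the MLE of the mean $\mu = 1/\lambda$ — no re-derivation needed.

```python
import random

random.seed(3)

true_lambda = 2.0  # known rate, chosen for the demo
ys = [random.expovariate(true_lambda) for _ in range(50_000)]
ybar = sum(ys) / len(ys)

lambda_hat = 1 / ybar    # MLE of the rate lambda
mu_hat = 1 / lambda_hat  # MLE of the mean mu = 1/lambda, by invariance (= ybar)
print(lambda_hat, mu_hat)  # near 2.0 and 0.5
```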

#### Other Notes

- We assume that the class of the distribution has been properly identified
- We assume that the data are i.i.d.