# Maximum Likelihood Estimation

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data.

https://www.youtube.com/watch?v=XepXtl9YKwc&ab_channel=StatQuestwithJoshStarmer

### General Template for Deriving MLE

We always work with the log-likelihood since it is much easier to differentiate. See Logarithm Rules, but the key fact is $\ln(abc) = \ln a + \ln b + \ln c$, which turns a product of densities into a sum.

You then take the derivative with respect to the parameter and set it to 0, since you want to maximize.

For the binomial we write a single observation, $X \sim \mathrm{Bin}(n, p)$, since the $n$ Bernoulli trials are already bundled into the distribution.

But for the other distributions we state an i.i.d. sample, e.g. $X_{1}, \dots, X_{n} \sim \mathrm{Exp}(\theta)$. So I was a little confused on this — but the difference is just whether the repetition lives inside the distribution or in the sample.
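To make the "why logs" point concrete, here is a minimal sketch (my own example, not from the notes): with many data points, the raw product of densities underflows to zero in floating point, while the sum of logs stays finite — on top of being the thing that is easy to differentiate.

```python
import math
import random

random.seed(0)

# ln(abc) = ln a + ln b + ln c: the product of densities becomes a sum
# of log-densities. The rate theta and sample size are my own choices.
theta = 2.0
data = [random.expovariate(theta) for _ in range(5000)]  # X_i ~ Exp(theta)

def density(y):
    return theta * math.exp(-theta * y)  # Exp(theta) p.d.f.

raw_product = math.prod(density(y) for y in data)  # underflows to 0.0
log_sum = sum(math.log(density(y)) for y in data)  # finite and usable

print(raw_product, log_sum)
```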

#### Binomial MLE

Suppose $Y \sim \mathrm{Bin}(n, \theta)$, with $y$ observed successes. Then what is the MLE $\hat{\theta}(y)$?

Likelihood: $L(y; \theta) = P(Y = y) = \binom{n}{y}\,\theta^{y}(1-\theta)^{n-y}$

Let's derive $\hat{\theta}(y)$ using the log-likelihood. Maximizing the log-likelihood is the same as maximizing the likelihood:

$\ell(\theta) = \ln L(\theta)$, and $\hat{\theta}$ maximizes $L(\theta) \iff \hat{\theta}$ maximizes $\ell(\theta)$.

$\ell(\theta) = \ln\binom{n}{y} + y\ln\theta + (n-y)\ln(1-\theta)$

$\frac{d\ell(\theta)}{d\theta} = 0 \implies \frac{y}{\theta} - \frac{n-y}{1-\theta} = 0 \implies \hat{\theta} = \frac{y}{n}$

- For the Binomial Distribution, the estimate is simply the sample proportion of successes, $\hat{p} = \frac{\#\text{observed successes}}{\#\text{total trials}}$, which intuitively should make sense.
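A quick numerical sanity check of the derivation above (the function names and numbers are my own): evaluate the binomial log-likelihood on a grid of $\theta$ values and confirm the maximizer is $y/n$.

```python
import math

# Binomial log-likelihood: l(theta) = ln C(n,y) + y ln(theta) + (n-y) ln(1-theta)
def binom_loglik(theta, n, y):
    return (math.log(math.comb(n, y))
            + y * math.log(theta)
            + (n - y) * math.log(1 - theta))

n, y = 100, 37                              # 37 successes in 100 trials
grid = [i / 1000 for i in range(1, 1000)]   # theta values in (0, 1)
theta_hat = max(grid, key=lambda t: binom_loglik(t, n, y))
print(theta_hat)  # y/n = 0.37
```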

#### Poisson MLE

Let $Y_{1}, Y_{2}, \dots, Y_{n} \sim \mathrm{Poi}(\theta)$ with observations $\{y_{1}, \dots, y_{n}\}$. What is the MLE of $\theta$?

$\hat{\theta} = \frac{1}{n}\sum y_{i} = \bar{y}$

- Remember that for the Poisson Distribution the parameter is $\lambda$, and $E(Y) = \lambda$, so the estimate is simply the sample mean.

I got some practice deriving this.
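A small check of the Poisson result above (my own sketch, with made-up counts): the sample mean should beat nearby values of $\theta$ in log-likelihood.

```python
import math

# Poisson log-likelihood: l(theta) = sum_i [ y_i ln(theta) - theta - ln(y_i!) ]
def poisson_loglik(theta, ys):
    return sum(y * math.log(theta) - theta - math.lgamma(y + 1) for y in ys)

ys = [3, 0, 2, 5, 1, 4, 2, 3]       # made-up counts
theta_hat = sum(ys) / len(ys)       # claimed MLE: the sample mean
better_than_neighbors = all(
    poisson_loglik(theta_hat, ys) > poisson_loglik(t, ys)
    for t in (theta_hat - 0.1, theta_hat + 0.1)
)
print(theta_hat, better_than_neighbors)  # 2.5 True
```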

#### Exponential MLE

$\hat{\lambda} = \frac{1}{\bar{y}}$

- Remember that if $X \sim \mathrm{Exp}(\lambda)$, then $E(X) = \frac{1}{\lambda} = \mu$, so the parameter $\lambda = \frac{1}{\mu}$.
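A hedged simulation check of this one (the true rate and sample size are my own choices): generate exponential data with a known $\lambda$ and confirm $\hat{\lambda} = 1/\bar{y}$ recovers it.

```python
import random

random.seed(1)

true_lambda = 3.0  # known rate, chosen for the demo
ys = [random.expovariate(true_lambda) for _ in range(100_000)]
ybar = sum(ys) / len(ys)
lambda_hat = 1 / ybar  # MLE: reciprocal of the sample mean
print(lambda_hat)      # close to 3.0
```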

#### Normal MLE

Suppose $Y_{1}, \dots, Y_{n} \sim N(\mu, \sigma^{2})$ with observations/data $\{y_{1}, \dots, y_{n}\}$.

What is the MLE of $\mu$ and $\sigma^{2}$?

$\hat{\mu} = \bar{y}$

$\hat{\sigma}^{2} = \frac{1}{n}\sum (y_{i} - \bar{y})^{2}$

Compare with the sample variance $s^{2} = \frac{1}{n-1}\sum (y_{i} - \bar{y})^{2}$. Am I supposed to use the $n-1$ or the $n$?? The MLE uses $n$; dividing by $n-1$ gives the unbiased sample variance instead. So $\hat{\sigma}^{2}$ is slightly biased, but the two agree as $n$ gets large.
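The $n$ vs $n-1$ question can be seen numerically (a sketch with simulated data; the mean, variance, and sample size are my own choices): both estimates land near the true $\sigma^{2}$, with the $n-1$ version slightly larger.

```python
import random

random.seed(2)

# Simulate normal data with known mu = 10, sigma = 2 (true sigma^2 = 4).
ys = [random.gauss(10.0, 2.0) for _ in range(1000)]
n = len(ys)
ybar = sum(ys) / n
ss = sum((y - ybar) ** 2 for y in ys)

sigma2_mle = ss / n   # MLE: divide by n (slightly biased)
s2 = ss / (n - 1)     # sample variance: divide by n - 1 (unbiased)
print(sigma2_mle, s2)  # both near 4; s2 is slightly larger
```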

### Properties of the MLE

For discrete distributions → $L(\theta)$ is the probability of observing the data, given $\theta$ (the p.m.f. gives probabilities directly). For continuous distributions → the likelihood is built from the p.d.f., and $f(y_{i})$ is a density, not a probability.

- Consistency
  - As $n \to \infty$, $\hat{\theta} \to \theta$ (our estimate converges to the true value)
- Efficiency
  - We want minimum variance when estimating $\theta$
- Invariance
  - If $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$
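Invariance can be illustrated with the exponential example from earlier (my own setup): once $\hat{\lambda} = 1/\bar{y}$ is the MLE of the rate, $g(\hat{\lambda}) = 1/\hat{\lambda}$ is automatically the MLE of the mean $\mu = 1/\lambda$ — no re-derivation needed.

```python
import random

random.seed(3)

true_lambda = 2.0  # known rate, chosen for the demo
ys = [random.expovariate(true_lambda) for _ in range(50_000)]
ybar = sum(ys) / len(ys)

lambda_hat = 1 / ybar    # MLE of the rate lambda
mu_hat = 1 / lambda_hat  # MLE of the mean mu = 1/lambda, by invariance (= ybar)
print(lambda_hat, mu_hat)  # near 2.0 and 0.5
```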

#### Other Notes

- We assume that the class of the distribution has been properly identified
- We assume that the data are i.i.d.