Maximum Likelihood Estimation

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data.

title: Definition (The Maximum Likelihood Estimate (MLE))  
$\widehat{θ}$ is the MLE if $\widehat{θ}$ maximizes $L(θ; y_1, y_2, \dots y_n)$ where $L$ is the [[Likelihood Function]].

General Template for Deriving MLE

We always use the log likelihood since it makes it much easier to derive. See Logarithm Rules, but basically you have that

You then take derivative, and set that to 0, since you want to maximize.

We model binomial with bernouilli, so binomial is a special case. We just do

But for the other distributions, we state for example. So I am a little confused on this?

Binomial MLE

Suppose , with observed successes. Then what is ?

Likelihood Let’s derive by using the log likelihood. Maximizing log likelihood is the same as maximizing likelihood. maximizes maximizes

  • For the Binomial Distribution, the parameter is simply the sample proportion of success , which intuitively should make sense.

Poisson MLE

Let with observations . What is the MLE of ?

  • Remember for Poisson Distribution the parameter is , and , so the parameter is simply the mean.

I got practice deriving this, and it seems that

Exponential MLE

  • Remember that if , then , so the parameter

Normal MLE

Suppose with observations/data of

What is the MLE of and ? Am I supposed to use the or the ?? I supposed because that we are estimating the variance of a sample, or a population? We are estimating the variance of a population.

Properties of the MLE

For discrete the probability of observing For continuous - > recall p.m.f. = gives probability directly p.d.f. is not a probability

  1. Consistency
    • As , (our estimate converges to the true value)
  2. Efficiency
    • We want a minimum variance when finding
  3. Invariance
    • If is the MLE of , then is the MLE of

Other Notes

  • We assume that the class of the distribution has been properly identified
  • We assume that we have i.i.d datasets