What is maximum likelihood estimation (MLE) and how is it used in risk modeling?
I'm studying FRM Part I Quantitative Analysis and MLE keeps coming up. I understand OLS regression, but MLE seems like a completely different approach to estimation. Can someone explain the intuition, when you'd use it over OLS, and walk through a simple example?
Maximum Likelihood Estimation (MLE) is a parameter estimation method that picks the parameter values under which the observed data are most probable. Unlike OLS, which minimizes the sum of squared errors, MLE starts from an assumed distribution for the data and maximizes the probability (or density) that this distribution assigns to the sample actually observed.
The Core Intuition:
Imagine you flip a coin 100 times and get 65 heads. What value of p, the probability of heads, best explains that result? MLE says: for each candidate p between 0 and 1, calculate the probability of getting exactly 65 heads in 100 flips, which is proportional to p^65 (1 - p)^35, and pick the p that makes it largest. The answer is p = 0.65, the observed proportion of heads.
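The "try every p" idea can be sketched as a brute-force grid search. This is only an illustration: the function name `binomial_likelihood` and the 0.01 grid spacing are assumptions made for this example, and in practice the maximum is found analytically or with a numerical optimizer.

```python
from math import comb

def binomial_likelihood(p, heads=65, flips=100):
    """Probability of exactly `heads` heads in `flips` flips when P(heads) = p."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)

# Candidate values of p on a grid from 0.00 to 1.00 in steps of 0.01.
grid = [i / 100 for i in range(101)]

# The MLE on this grid is the candidate with the highest likelihood.
p_hat = max(grid, key=binomial_likelihood)
print(p_hat)  # 0.65
```

The binomial coefficient does not depend on p, so dropping it would leave the maximizer unchanged.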
Formal Setup:
Given observations x1, x2, ..., xn from a distribution f(x|theta), the likelihood function is:
L(theta) = Product of f(xi|theta) for all i
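We typically maximize the log-likelihood instead. The logarithm is strictly increasing, so it has the same maximizer as L(theta), and it turns the product into a sum that is easier to differentiate: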
ln L(theta) = Sum of ln f(xi|theta)
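There is also a practical, numerical reason to work with the sum of logs. The sketch below uses simulated standard normal data (an assumption for illustration, as are the helper names `normal_pdf` and `normal_logpdf`) to show that the raw product of 2,000 densities underflows to zero in floating point, while the log-likelihood stays finite.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_logpdf(x, mu, sigma):
    """Log-density of Normal(mu, sigma^2) at x, computed without exponentiating."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]  # simulated observations

# L(theta): a product of 2,000 numbers that are each below 0.4.
likelihood = math.prod(normal_pdf(x, 0.0, 1.0) for x in data)

# ln L(theta): the same information expressed as a sum.
log_likelihood = sum(normal_logpdf(x, 0.0, 1.0) for x in data)

print(likelihood)                      # 0.0 (floating-point underflow)
print(math.isfinite(log_likelihood))   # True
```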
Risk Modeling Example:
Suppose Ashford Risk Analytics models daily portfolio losses as independent draws from Normal(mu, sigma^2). With n = 250 observations, the log-likelihood is:
ln L(mu, sigma) = -(250/2) ln(2 pi) - (250/2) ln(sigma^2) - [1 / (2 sigma^2)] Sum of (xi - mu)^2
Taking derivatives and setting to zero:
- MLE of mu = sample mean
- MLE of sigma^2 = (1/n) x Sum of (xi - mu-hat)^2, where mu-hat is the sample mean (note: divides by n, not n-1, so it is slightly biased downward in small samples)
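The closed-form results above can be checked numerically. The following is a minimal sketch on simulated data: the loss series, its mean of 0.1 and its standard deviation of 1.5 are invented for illustration and are not Ashford's figures.

```python
import math
import random
import statistics

random.seed(42)
n = 250
losses = [random.gauss(0.1, 1.5) for _ in range(n)]  # hypothetical daily losses

# Closed-form MLEs from setting the derivatives of ln L to zero.
mu_hat = statistics.fmean(losses)
sum_sq = sum((x - mu_hat) ** 2 for x in losses)
sigma2_mle = sum_sq / n             # divides by n
sigma2_unbiased = sum_sq / (n - 1)  # usual sample variance, divides by n - 1

def log_likelihood(mu, sigma2):
    """Normal log-likelihood of the loss sample at (mu, sigma2)."""
    return (-n / 2 * math.log(2 * math.pi)
            - n / 2 * math.log(sigma2)
            - sum((x - mu) ** 2 for x in losses) / (2 * sigma2))

# The MLE pair should score at least as high as any other candidate pair.
best = log_likelihood(mu_hat, sigma2_mle)
print(best >= log_likelihood(mu_hat, sigma2_unbiased))   # True
print(best >= log_likelihood(mu_hat + 0.05, sigma2_mle)) # True
```

With n = 250 the two variance estimates differ by a factor of 249/250, so the choice matters little here; the gap is more noticeable in short samples.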
When MLE Beats OLS:
- Non-normal distributions: For fat-tailed models (Student-t, GEV), MLE naturally handles the distributional shape
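- Binary outcomes: Logistic regression (used in credit scoring) treats the 0/1 outcome as a Bernoulli variable, so its coefficients are chosen to maximize the Bernoulli likelihood rather than to minimize squared errors
- GARCH models: The conditional variance is not directly observed and depends on the parameters recursively, so volatility clustering models are estimated by numerically maximizing the likelihood; there is no closed-form OLS solution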
- Censored/truncated data: MLE can properly handle incomplete observations
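To make the binary-outcome case from the list above concrete, here is a minimal sketch of logistic regression fitted by maximum likelihood using Newton's method. Everything in it is an assumption for illustration: the single explanatory variable (think of it as a standardized leverage score), the "true" coefficients of -1 and 2 used to simulate defaults, and the function name `fit_logistic`. A production model would use an established statistics library.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, iterations=25):
    """Maximize the Bernoulli log-likelihood of P(y=1|x) = sigmoid(b0 + b1*x)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            # Gradient of the log-likelihood.
            g0 += y - p
            g1 += (y - p) * x
            # Negative Hessian (information matrix) entries.
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Simulated borrowers: x is a hypothetical standardized risk score, y = 1 is default.
random.seed(7)
xs = [random.gauss(0.0, 1.0) for _ in range(5000)]
ys = [1 if random.random() < sigmoid(-1.0 + 2.0 * x) else 0 for x in xs]

b0_hat, b1_hat = fit_logistic(xs, ys)
print(b0_hat, b1_hat)  # estimates should land near the simulated values -1 and 2
```

OLS has no role here because the model never produces a residual to square; each observation contributes ln(p) if a default occurred and ln(1 - p) otherwise.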
Key MLE Properties for FRM:
- Consistent: Converges to the true parameter as sample size grows
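- Asymptotically efficient: In large samples its variance attains the Cramer-Rao lower bound, so no other consistent estimator has a smaller asymptotic variance
- Asymptotically normal: The sampling distribution of the estimator approaches a normal distribution centered on the true parameter as the sample grows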
- Invariant: If theta-hat is MLE of theta, then g(theta-hat) is MLE of g(theta)
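Invariance is what lets a risk manager plug MLEs straight into a risk measure. As a sketch, assuming normally distributed losses and using simulated data (the series and its parameters are invented for illustration), the MLE of the 99% VaR is obtained by applying the quantile formula to the MLEs of mu and sigma:

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)
losses = [random.gauss(0.0, 2.0) for _ in range(250)]  # hypothetical daily losses

# MLEs of the normal parameters (variance divides by n).
mu_hat = fmean(losses)
sigma2_hat = sum((x - mu_hat) ** 2 for x in losses) / len(losses)

# Invariance: the MLE of sigma is the square root of the MLE of sigma^2.
sigma_hat = math.sqrt(sigma2_hat)

# Invariance again: the MLE of the 99% loss quantile is g(mu_hat, sigma_hat)
# with g(mu, sigma) = mu + z_0.99 * sigma.
z99 = NormalDist().inv_cdf(0.99)
var_99 = mu_hat + z99 * sigma_hat
print(var_99)
```

No separate estimation step is needed for the VaR itself; it inherits the large-sample properties listed above from mu-hat and sigma-hat.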