How do AIC and BIC work for model selection, and when would they disagree?
For FRM Part I, I need to understand information criteria for choosing between competing models. I know AIC and BIC both penalize model complexity, but I'm unclear on the mechanics. Can someone explain the formulas and when they'd pick different models?
AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are model selection tools that balance goodness of fit against model complexity. Both guard against overfitting by adding a penalty for each estimated parameter.
The Formulas:
AIC = -2 x ln(L) + 2k
BIC = -2 x ln(L) + k x ln(n)
Where:
- L = maximized likelihood of the model
- k = number of estimated parameters
- n = number of observations
- Lower values are better for both
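As a minimal sketch, the two formulas translate directly into code. The function names `aic` and `bic` and the inputs below are assumptions made for illustration; they take the maximized log-likelihood ln(L), not the likelihood itself.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """AIC = -2 ln(L) + 2k, where log_likelihood is ln(L)."""
    return -2.0 * log_likelihood + 2.0 * k


def bic(log_likelihood: float, k: int, n: int) -> float:
    """BIC = -2 ln(L) + k ln(n), where log_likelihood is ln(L)."""
    return -2.0 * log_likelihood + k * math.log(n)


# Hypothetical inputs, chosen only to exercise the two functions.
example_aic = aic(log_likelihood=-100.0, k=2)
example_bic = bic(log_likelihood=-100.0, k=2, n=100)
print(f"AIC = {example_aic:.2f}, BIC = {example_bic:.2f}")
```

The first term is identical in both functions, so any difference between the two criteria comes entirely from the penalty term.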
Key Difference — Penalty Strength:
BIC's penalty is k x ln(n), while AIC's is 2k. Because ln(n) exceeds 2 once n ≥ 8 (the crossover is at e^2 ≈ 7.39), BIC penalizes complexity more heavily than AIC for any realistic sample size. For typical financial datasets (n = 250+ daily observations), BIC is far stricter:
| n | AIC Penalty per Param | BIC Penalty per Param |
|---|---|---|
| 50 | 2.0 | 3.91 |
| 250 | 2.0 | 5.52 |
| 1000 | 2.0 | 6.91 |
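The per-parameter penalties in the table can be reproduced in a few lines. This is an illustrative sketch; the helper name `bic_penalty_per_param` is an assumption, not part of any library.

```python
import math

AIC_PENALTY_PER_PARAM = 2.0  # constant, independent of sample size


def bic_penalty_per_param(n: int) -> float:
    """BIC charges ln(n) per estimated parameter."""
    return math.log(n)


# Sample size at which the two penalties are equal: ln(n) = 2, so n = e^2.
crossover_n = math.exp(AIC_PENALTY_PER_PARAM)
print(f"Penalties are equal at n = {crossover_n:.2f}")

for n in (7, 8, 50, 250, 1000):
    print(f"n = {n:>4}: AIC {AIC_PENALTY_PER_PARAM:.2f}, "
          f"BIC {bic_penalty_per_param(n):.2f}")
```

Including n = 7 and n = 8 shows where the BIC penalty first overtakes the AIC penalty.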
Example — Ridgeport Quant Research:
Compare two GARCH models for S&P 500 volatility (n = 1,000 daily returns):
| Model | Parameters (k) | Log-Likelihood | AIC | BIC |
|---|---|---|---|---|
| GARCH(1,1) | 3 | -1,425.3 | 2,856.6 | 2,871.3 |
| GARCH(2,2) | 5 | -1,423.1 | 2,856.2 | 2,880.7 |
- AIC prefers GARCH(2,2) (2,856.2 < 2,856.6): the small fit improvement just outweighs the penalty for two extra parameters
- BIC prefers GARCH(1,1) (2,871.3 < 2,880.7): the heavier penalty of ln(1,000) ≈ 6.91 per parameter rejects the extra complexity
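To check the comparison, the criteria can be recomputed from the parameter counts and log-likelihoods given in the table. This sketch takes those inputs as given and does not fit any GARCH model; the dictionary layout and variable names are assumptions for illustration.

```python
import math

n = 1000  # daily returns, as stated in the example

# Inputs copied from the table: parameter count and maximized log-likelihood.
models = {
    "GARCH(1,1)": {"k": 3, "loglik": -1425.3},
    "GARCH(2,2)": {"k": 5, "loglik": -1423.1},
}

scores = {}
for name, m in models.items():
    aic = -2.0 * m["loglik"] + 2.0 * m["k"]
    bic = -2.0 * m["loglik"] + m["k"] * math.log(n)
    scores[name] = (aic, bic)
    print(f"{name}: AIC = {aic:.1f}, BIC = {bic:.1f}")

# Lower is better for both criteria.
best_aic = min(scores, key=lambda name: scores[name][0])
best_bic = min(scores, key=lambda name: scores[name][1])
print(f"AIC selects {best_aic}; BIC selects {best_bic}")
```

With fitted models in hand, most estimation libraries report both criteria directly, so a hand calculation like this is mainly useful as a sanity check.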
When They Disagree:
This happens precisely when a more complex model offers a modest improvement in fit:
- AIC (lighter penalty) accepts the extra parameters
- BIC (heavier penalty) rejects them
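The disagreement region can be written out explicitly. As a sketch, assume two models fitted to the same n observations, and let Δk be the number of extra parameters in the larger model and Δln L its gain in maximized log-likelihood (both symbols are introduced here for illustration):

```latex
\begin{aligned}
\text{AIC prefers the larger model} &\iff 2\,\Delta\ln L > 2\,\Delta k \\
\text{BIC prefers the larger model} &\iff 2\,\Delta\ln L > \Delta k\,\ln n \\
\text{The criteria disagree} &\iff 2\,\Delta k < 2\,\Delta\ln L < \Delta k\,\ln n
\qquad (\text{requires } \ln n > 2)
\end{aligned}
```

In the GARCH example, Δk = 2 and Δln L = 1,425.3 - 1,423.1 = 2.2, so the middle term is 4.4. That lies between 2Δk = 4 and Δk x ln(1,000) ≈ 13.8, which is why the two criteria split.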
In terms of theoretical properties, BIC is consistent: as the sample grows, it selects the true model with probability approaching one, provided that model is in the candidate set. AIC is not consistent, but it tends to select models with better out-of-sample prediction accuracy. For risk management, BIC is often preferred because overfitting is dangerous: an overfit VaR model will fail precisely when you need it most.
FRM Exam Tips:
- Both criteria: lower = better
- BIC penalizes more harshly for large samples
- If asked which favors parsimony: BIC
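- Adjusted R-squared applies only to linear regressions with the same dependent variable; AIC/BIC apply to any model estimated by maximum likelihood, provided the candidates are fitted to the same data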