How do AIC and BIC work for model selection, and when would they disagree?

AcadiFi · Accepted Answer

AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are model selection tools that balance goodness-of-fit against model complexity. Both guard against overfitting by charging a penalty for each estimated parameter.

**The Formulas:**

AIC = -2 × ln(L) + 2k

BIC = -2 × ln(L) + k × ln(n)

Where:
- **L** = maximized likelihood of the model
- **k** = number of estimated parameters
- **n** = number of observations
- **Lower values are better** for both
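The two formulas translate directly into code. Below is a minimal Python sketch, assuming the maximized log-likelihood ln(L) is already available from whatever estimation routine you use. The function names `aic` and `bic` are illustrative, not taken from any particular library.

```python
import math

def aic(log_likelihood: float, k: int) -> float:
    """AIC = -2 ln(L) + 2k, where log_likelihood is the maximized ln(L)."""
    return -2.0 * log_likelihood + 2 * k

def bic(log_likelihood: float, k: int, n: int) -> float:
    """BIC = -2 ln(L) + k ln(n), where n is the number of observations."""
    return -2.0 * log_likelihood + k * math.log(n)

# Hypothetical inputs: ln(L) = -100.0, k = 2 parameters, n = 50 observations
print(round(aic(-100.0, 2), 2))      # 204.0
print(round(bic(-100.0, 2, 50), 2))  # 207.82
```

Only differences in AIC or BIC between models fitted to the same data are meaningful; the absolute level carries no information on its own.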

**Key Difference — Penalty Strength:**

BIC's penalty is k × ln(n), while AIC's is 2k, so BIC penalizes each parameter more heavily whenever ln(n) > 2, i.e. n > e² ≈ 7.39. In practice that means n ≥ 8, since ln(7) ≈ 1.95 and ln(8) ≈ 2.08. For typical financial datasets (n = 250+ daily observations), BIC is far stricter:

| n | AIC penalty per parameter | BIC penalty per parameter (ln n) |
|---|---------------------------|----------------------------------|
| 50 | 2.0 | 3.91 |
| 250 | 2.0 | 5.52 |
| 1000 | 2.0 | 6.91 |
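The per-parameter penalties in the table, and the smallest sample size at which BIC's penalty overtakes AIC's, can be checked with a few lines of standard-library Python (a quick sketch, nothing model-specific):

```python
import math

# Per-parameter penalty: AIC always charges 2, BIC charges ln(n)
for n in (50, 250, 1000):
    print(f"n = {n:>4}: AIC 2.00, BIC {math.log(n):.2f}")

# Smallest integer n at which BIC's per-parameter penalty exceeds AIC's
crossover = next(n for n in range(1, 100) if math.log(n) > 2)
print(crossover)  # 8, because ln(n) > 2 requires n > e**2 (about 7.39)
```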

**Example — Ridgeport Quant Research:**
Compare two GARCH models for S&P 500 volatility (n = 1,000 daily returns):

| Model | Parameters (k) | Log-Likelihood | AIC | BIC |
|-------|---------------|---------------|-----|-----|
| GARCH(1,1) | 3 | -1,425.3 | 2,856.6 | 2,871.3 |
| GARCH(2,2) | 5 | -1,423.1 | 2,856.2 | 2,880.7 |

- AIC prefers GARCH(2,2) (2,856.2 < 2,856.6): the fit improvement (-2 × ln(L) falls by 4.4) just exceeds AIC's extra penalty of 2 × 2 = 4
- BIC prefers GARCH(1,1) (2,871.3 < 2,880.7): BIC's extra penalty of 2 × ln(1000) ≈ 13.8 swamps the same 4.4-point improvement
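As a sanity check, the sketch below recomputes both criteria from the k and log-likelihood values in the example (n = 1,000) and reports which model each criterion selects. It assumes the log-likelihoods are the maximized values reported by the GARCH fits; the variable names are illustrative.

```python
import math

n = 1000  # daily returns
# (model name, k, maximized log-likelihood) from the example table
models = [("GARCH(1,1)", 3, -1425.3), ("GARCH(2,2)", 5, -1423.1)]

results = {}
for name, k, ll in models:
    aic = -2 * ll + 2 * k
    bic = -2 * ll + k * math.log(n)
    results[name] = (aic, bic)
    print(f"{name}: AIC = {aic:,.1f}, BIC = {bic:,.1f}")

# Lower is better for both criteria
best_aic = min(results, key=lambda m: results[m][0])
best_bic = min(results, key=lambda m: results[m][1])
print("AIC picks", best_aic, "| BIC picks", best_bic)
```

This prints BIC values of about 2,871.3 for GARCH(1,1) and 2,880.7 for GARCH(2,2); AIC selects GARCH(2,2) and BIC selects GARCH(1,1).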

**When They Disagree:**

This happens when a more complex model improves the fit by enough to clear AIC's penalty but not BIC's:
- AIC (lighter penalty) accepts the extra parameters
- BIC (heavier penalty) rejects them
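Setting the two criteria side by side gives the exact disagreement zone. If the larger model has Δk extra parameters and raises ln(L) by Δℓ, AIC prefers it when Δℓ > Δk, while BIC prefers it only when Δℓ > Δk × ln(n) / 2. Any gain strictly between those bounds produces a split verdict. A minimal sketch, where the helper name `disagreement_band` is hypothetical, applied to the GARCH example:

```python
import math

def disagreement_band(extra_params: int, n: int) -> tuple[float, float]:
    """Log-likelihood gains for which AIC picks the larger model but BIC the smaller.

    AIC prefers the larger model when the gain exceeds extra_params;
    BIC prefers it only when the gain exceeds extra_params * ln(n) / 2.
    """
    return float(extra_params), extra_params * math.log(n) / 2

low, high = disagreement_band(extra_params=2, n=1000)
gain = -1423.1 - (-1425.3)  # ln(L) gain of GARCH(2,2) over GARCH(1,1)
print(f"disagree if {low:.2f} < gain < {high:.2f}; gain = {gain:.2f}")
print(low < gain < high)  # True
```

Because the upper bound grows with ln(n), the zone of disagreement widens as the sample gets larger.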

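Asymptotically, BIC is **consistent**: if the true model is in the candidate set, the probability that BIC selects it approaches 1 as n grows. AIC is not consistent (it keeps a non-vanishing chance of choosing an overparameterized model, even in large samples), but it tends to select models with better out-of-sample **prediction** accuracy. For risk management, BIC is often preferred because overfitting is dangerous: an overfit VaR model will fail precisely when you need it most.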
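The consistency point can be illustrated with a small simulation. This is a sketch under simplified assumptions rather than a GARCH study: the data are i.i.d. standard normal, so the true model has a zero mean, and the competing model also estimates the mean. Since that extra parameter is useless, every time a criterion picks it counts as an overfit. AIC's overfit rate should stay near P(χ²₁ > 2) ≈ 16% at any n, while BIC's should shrink as ln(n) grows. The function name `overfit_rates` is hypothetical.

```python
import math
import random

def overfit_rates(n: int, reps: int, seed: int = 0) -> tuple[float, float]:
    """Share of simulations in which AIC and BIC choose an unnecessary parameter.

    Data are i.i.d. N(0, 1). Candidate models (Gaussian, fitted by MLE):
      M0: mean fixed at 0, variance estimated (k = 1)  <- true model
      M1: mean and variance both estimated   (k = 2)
    For these fits, -2 ln(L) = n * (ln(2 * pi * sigma2_hat) + 1).
    """
    rng = random.Random(seed)
    aic_picks_m1 = bic_picks_m1 = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(y) / n
        s2_m0 = sum(v * v for v in y) / n            # MLE variance, mean fixed at 0
        s2_m1 = sum((v - mean) ** 2 for v in y) / n  # MLE variance, mean estimated
        # Fit improvement of M1 over M0: (-2 ln L0) - (-2 ln L1)
        improvement = n * math.log(s2_m0 / s2_m1)
        # M1 has one extra parameter, so it wins when the improvement beats that penalty
        aic_picks_m1 += int(improvement > 2.0)
        bic_picks_m1 += int(improvement > math.log(n))
    return aic_picks_m1 / reps, bic_picks_m1 / reps

rates = {}
for n in (50, 500):
    rates[n] = overfit_rates(n, reps=2000)
    aic_rate, bic_rate = rates[n]
    print(f"n = {n}: AIC overfits {aic_rate:.1%}, BIC overfits {bic_rate:.1%}")
```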

**FRM Exam Tips:**
- Both criteria: lower = better
- BIC penalizes more harshly whenever n ≥ 8, and the gap widens as n grows
- If asked which favors parsimony: BIC
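- Adjusted R-squared is limited to linear regressions with the same dependent variable; AIC/BIC work for any model estimated by maximum likelihood, provided the models are fitted to the same data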
