What does R-squared really tell you, and what are its limitations?
CFA Level I regression section. I know R² measures 'goodness of fit' and ranges from 0 to 1, but when is a high R² meaningful vs. misleading? Can a model with R² = 0.95 still be useless?
R² (coefficient of determination) is the most commonly reported regression statistic, but it's widely misunderstood. Let's get precise about what it does and doesn't tell you.
What R² Measures:
R² = 1 - (SSE / SST)
Where:
- SST (Total Sum of Squares) = Total variation in Y
- SSE (Sum of Squared Errors) = Unexplained variation
- SSR (Sum of Squares Regression) = Explained variation
- SST = SSR + SSE
R² = SSR / SST = Proportion of Y's variation explained by X
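To make the decomposition concrete, here is a minimal sketch that fits a one-variable regression with NumPy and computes R² both ways. The five data points are made up purely for illustration; they are not from any CFA material.

```python
import numpy as np

# Hypothetical illustrative data (not from the curriculum).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Ordinary least squares fit of y = b0 + b1 * x.
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation in Y
sse = np.sum((y - y_hat) ** 2)         # unexplained variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation

r2_from_sse = 1 - sse / sst
r2_from_ssr = ssr / sst

print(f"SST={sst:.2f}  SSR={ssr:.2f}  SSE={sse:.2f}")
print(f"R² = {r2_from_sse:.2f}")  # 0.60 for these numbers
```

Both formulas give the same answer because SST = SSR + SSE holds for an OLS fit that includes an intercept.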
Example:
R² = 0.72 means 72% of the variation in the dependent variable is explained by the independent variable(s). The remaining 28% is unexplained.
When high R² is meaningful:
- Cross-sectional models with genuine economic relationships (e.g., company size explaining analyst coverage)
- The slope coefficient is statistically significant
- The model passes residual diagnostics
When high R² is misleading:
1. Spurious correlation:
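Regressing US GDP on world population can give an R² near 0.99: both series trend upward over time, but neither drives the other. Two variables that share a time trend will often produce a high R² regardless of any economic link, so check whether the relationship survives once the trend is removed (for example, by regressing changes rather than levels).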
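A small simulation can illustrate the point. This is a sketch under stated assumptions: the two series below are synthetic (a linear trend plus independent random noise), not real GDP or population data, and the trend slopes and noise level are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.arange(n)

# Two synthetic series that share nothing except an upward trend.
series_a = 0.5 * t + rng.normal(0, 5, size=n)
series_b = 2.0 * t + rng.normal(0, 5, size=n)

def simple_r2(u, v):
    """R² of a one-variable regression, using R² = r²."""
    return np.corrcoef(u, v)[0, 1] ** 2

# Regression in levels: the common trend dominates.
r2_levels = simple_r2(series_a, series_b)

# Regression in period-to-period changes: the trend is differenced away.
r2_changes = simple_r2(np.diff(series_a), np.diff(series_b))

print(f"R² in levels:  {r2_levels:.3f}")
print(f"R² in changes: {r2_changes:.3f}")
```

In this setup the levels regression should show a very high R² while the regression on changes should show almost none, which is the signature of a trend-driven fit.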
2. Overfitting:
Adding more variables to a regression never decreases R² and almost always increases it, even when the added variables are random noise. That's why we also check Adjusted R², which penalizes additional variables:
Adj R² = 1 - [(1 - R²)(n - 1) / (n - k - 1)]
Where:
- n = Number of observations
- k = Number of independent variables
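The effect can be sketched as follows. The data are simulated, and the helper names `r_squared` and `adjusted_r_squared` are hypothetical, introduced only for this illustration: one genuine regressor plus five pure-noise regressors.

```python
import numpy as np

def r_squared(X, y):
    """R² from an OLS fit of y on X; an intercept column is added here."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def adjusted_r_squared(r2, n, k):
    """Adj R² = 1 - (1 - R²)(n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(42)
n = 30
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(size=n)  # y depends on x only
noise = rng.normal(size=(n, 5))         # five regressors unrelated to y

r2_one = r_squared(x.reshape(-1, 1), y)
r2_six = r_squared(np.column_stack([x, noise]), y)

print(f"k=1: R² = {r2_one:.3f}, Adj R² = {adjusted_r_squared(r2_one, n, 1):.3f}")
print(f"k=6: R² = {r2_six:.3f}, Adj R² = {adjusted_r_squared(r2_six, n, 6):.3f}")
```

R² with six regressors cannot be lower than R² with one, since the larger model can always set the extra coefficients to zero. Adjusted R² rises only if the added variables reduce SSE by enough to offset the lost degrees of freedom.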
3. Non-linear relationships:
If Y and X have a U-shaped relationship, a linear regression may have low R² even though X strongly predicts Y.
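A minimal sketch of the U-shaped case, using a made-up example where Y is exactly X² over a range symmetric around zero:

```python
import numpy as np

# Y is fully determined by X, but the relationship is U-shaped.
x = np.linspace(-3, 3, 61)
y = x ** 2

# Linear regression of y on x (R² = r² for one regressor).
r2_linear = np.corrcoef(x, y)[0, 1] ** 2

# Regressing y on x² instead captures the relationship.
r2_quadratic = np.corrcoef(x ** 2, y)[0, 1] ** 2

print(f"R² of y on x:  {r2_linear:.4f}")
print(f"R² of y on x²: {r2_quadratic:.4f}")
```

The linear fit reports an R² of essentially zero even though X determines Y completely, so a low R² can point to the wrong functional form, not to a weak relationship.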
For simple regression (one X variable):
R² = r² (the square of the correlation coefficient)
If r = 0.85, then R² = 0.7225 = 72.25%
If r = -0.90, then R² = 0.81 = 81% (R² is never negative, so it does not reveal the direction of the relationship)
Practical interpretation guide:
| R² Value | Context | Interpretation |
|---|---|---|
| 0.95+ | Time series macro | Possibly spurious (check for trends) |
| 0.70-0.90 | Stock factor model | Strong explanatory power |
| 0.30-0.50 | Cross-sectional stock returns | Good for noisy financial data |
| 0.05-0.15 | Daily return prediction | Typical — returns are hard to predict |
Exam tip: Don't evaluate a model on R² alone. The CFA exam may present a model with high R² but insignificant coefficients, residual patterns, or obvious spurious correlation — you need to recognize these red flags.