What does R-squared really tell you, and what are its limitations?


AcadiFi · Accepted Answer

R² (coefficient of determination) is the most commonly reported regression statistic, but it's widely misunderstood. Let's get precise about what it does and doesn't tell you.

**What R² Measures:**

R² = 1 - (SSE / SST)

Where:
- **SST** (Total Sum of Squares) = Total variation in Y
- **SSE** (Sum of Squared Errors) = Unexplained variation
- **SSR** (Sum of Squares Regression) = Explained variation
- **SST = SSR + SSE**

R² = SSR / SST = Proportion of Y's variation explained by X
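As a quick check that the two formulas agree, here is a minimal sketch in plain Python. The data points are made up for illustration; the fit is ordinary least squares with an intercept, which is the setting in which SST = SSR + SSE holds.

```python
# Hypothetical data for a simple regression of y on x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept for one independent variable.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation in y
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained variation

r2_from_sse = 1 - sse / sst
r2_from_ssr = ssr / sst

print(f"SST={sst:.4f}  SSR={ssr:.4f}  SSE={sse:.4f}")
print(f"1 - SSE/SST = {r2_from_sse:.4f}, SSR/SST = {r2_from_ssr:.4f}")
```

Both expressions give the same number because the explained and unexplained pieces add up to the total.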

**Example:**
R² = 0.72 means 72% of the variation in the dependent variable is explained by the independent variable(s). The remaining 28% is unexplained.

**When high R² is meaningful:**
- Cross-sectional models with genuine economic relationships (e.g., company size explaining analyst coverage)
- The slope coefficient is statistically significant
- The model passes residual diagnostics

**When high R² is misleading:**

**1. Spurious correlation:**
Regressing US GDP on world population gives an R² near 0.99. Both series trend upward over time, but there's no causal link. Two time-trending variables will often produce a high R² regardless of whether any real relationship exists between them.
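The effect is easy to reproduce with simulated data. The sketch below is an illustration, not real GDP or population data: `series_a` and `series_b` are hypothetical, independently generated series that share nothing except an upward trend, and `simple_r2` is a helper written for this example. Regressing one on the other in levels gives a high R², while regressing period-to-period changes removes most of it.

```python
import random

def simple_r2(x, y):
    """R² from an OLS fit of y on a single x, with intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

def first_differences(series):
    """Period-to-period changes, which strip out a linear trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

random.seed(42)  # fixed seed so the run is repeatable
periods = range(200)

# Two unrelated series: each is a linear trend plus its own independent noise.
series_a = [0.5 * t + random.gauss(0, 3) for t in periods]
series_b = [2.0 * t + random.gauss(0, 10) for t in periods]

r2_levels = simple_r2(series_a, series_b)
r2_changes = simple_r2(first_differences(series_a), first_differences(series_b))

print(f"R² in levels:  {r2_levels:.3f}")
print(f"R² in changes: {r2_changes:.3f}")
```

Differencing is only one way to remove a trend; it is used here because it needs no extra machinery.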

**2. Overfitting:**
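Adding more variables to a regression never decreases R² and almost always increases it, even when the added variables are random noise. That's why we also check **Adjusted R²**, which penalizes for additional variables:

Adj R² = 1 - [(1 - R²)(n - 1) / (n - k - 1)]

where n is the number of observations and k is the number of independent variables.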
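To see the penalty at work, here is a small sketch of the formula. The function name `adjusted_r2` and the input values (30 observations, R² rising from 0.72 to 0.721 when a fourth variable is added) are assumptions chosen for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for n observations and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical model: 3 variables, 30 observations, R² = 0.72.
base = adjusted_r2(0.72, n=30, k=3)

# Add a fourth variable that nudges R² up only slightly.
padded = adjusted_r2(0.721, n=30, k=4)

print(f"Adjusted R², 3 variables: {base:.4f}")
print(f"Adjusted R², 4 variables: {padded:.4f}")
```

In this example R² rises but Adjusted R² falls, which is the signal that the extra variable is not earning its place.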

**3. Non-linear relationships:**
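If Y and X have a U-shaped relationship, a linear regression may have a low R² even though X strongly predicts Y. Here the misleading signal runs the other way: the low R² reflects the wrong functional form, not a weak relationship.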
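A stylized sketch of the U-shaped case, using made-up data where y is exactly x² and x is symmetric around zero. The `simple_r2` helper is written for this example. The straight-line fit explains essentially nothing, while regressing y on the transformed variable x² explains everything.

```python
def simple_r2(x, y):
    """R² from an OLS fit of y on a single x, with intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Hypothetical U-shaped data: x runs from -3 to 3, y depends only on x.
x = [v / 2 for v in range(-6, 7)]
y = [xi ** 2 for xi in x]

# Straight-line fit of y on x.
r2_linear = simple_r2(x, y)

# Same data, but using x squared as the independent variable.
x_squared = [xi ** 2 for xi in x]
r2_quadratic = simple_r2(x_squared, y)

print(f"R² with x as the regressor:  {r2_linear:.3f}")
print(f"R² with x² as the regressor: {r2_quadratic:.3f}")
```

Real data will be noisier than this, but the pattern is the same: a residual plot with a clear curve is the cue to rethink the functional form.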

**For simple regression (one X variable):**
R² = r² (the square of the correlation coefficient)

If r = 0.85, then R² = 0.7225 = 72.25%
If r = -0.90, then R² = 0.81 = 81% (R² is never negative, whatever the sign of r)
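The identity can be checked numerically. This sketch uses a small made-up data set with a negative relationship, computes the correlation coefficient r directly, and compares r² with the R² from the fitted regression.

```python
import math

# Hypothetical data where y falls as x rises.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 8.2, 6.1, 3.9, 2.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)  # same quantity as SST
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Pearson correlation coefficient.
r = sxy / math.sqrt(sxx * syy)

# R² from the OLS fit of y on x.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
r2 = 1 - sse / syy

print(f"r = {r:.4f}, r squared = {r * r:.4f}, regression R² = {r2:.4f}")
```

The sign of r is lost when squaring, so R² alone cannot tell you the direction of the relationship; check the slope for that.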

**Practical interpretation guide:**

| R² Value | Context | Interpretation |
|----------|---------|---------------|
| 0.95+ | Time series macro | Possibly spurious (check for trends) |
| 0.70-0.90 | Stock factor model | Strong explanatory power |
| 0.30-0.50 | Cross-sectional stock returns | Good for noisy financial data |
| 0.05-0.15 | Daily return prediction | Typical — returns are hard to predict |

**Exam tip:** Don't evaluate a model on R² alone. The CFA exam may present a model with high R² but insignificant coefficients, residual patterns, or obvious spurious correlation — you need to recognize these red flags.

