How is K-means clustering used to group assets for portfolio construction, and what are its limitations with financial return data?

AcadiFi · Accepted Answer

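K-means clustering partitions assets into K groups by minimizing within-cluster variance, that is, the sum of squared distances between each asset and the centroid of its cluster. Instead of relying only on sector labels, which can group companies with very different risk and return profiles, clustering uses statistical return and fundamental characteristics to reveal data-driven groupings.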
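
Written out, the objective looks roughly like this. The notation is an assumption introduced here, not part of the original answer: $x_i$ is the feature vector of asset $i$, $C_k$ is the set of assets in cluster $k$, and $\mu_k$ is that cluster's centroid.

```latex
\min_{C_1,\dots,C_K} \; \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i
```

The double sum is the within-cluster sum of squares (WCSS) that the elbow method tracks as K changes.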

**Algorithm Steps:**
1. Choose K (number of clusters)
2. Initialize K centroids randomly
3. Assign each asset to the nearest centroid (Euclidean distance in feature space)
4. Recompute centroids as the mean of assigned assets
5. Repeat steps 3-4 until assignments stabilize
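
The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function names, the seed, and the toy data are all hypothetical, and the centroids are initialized by sampling existing data points, which is one common reading of "randomly".

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (labels, centroids) for a list of equal-length feature tuples."""
    rng = random.Random(seed)
    # Step 2: initialize centroids by sampling k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each asset to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 5: stop once assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical standardized features: (volatility, dividend yield).
points = [(1.8, -1.0), (2.1, -0.8), (1.9, -1.2),
          (-1.0, 1.1), (-1.2, 0.9), (-0.9, 1.3)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

On this toy data the first three assets land in one cluster and the last three in the other. Which cluster gets which number depends on the random initialization, so cluster numbers carry no meaning of their own.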

**Worked Example:**
Lakefront Asset Management wants to diversify a 50-stock portfolio beyond traditional GICS sectors. They compute 8 features for each stock: 12-month return, 60-day volatility, beta, dividend yield, P/E ratio, debt-to-equity, revenue growth, and earnings stability.
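
Because assignment uses Euclidean distance, features measured on large scales (a P/E ratio in the tens) would swamp features measured on small scales (a dividend yield expressed as a decimal). The answer does not say how Lakefront preprocessed its features. A common approach, assumed here purely for illustration, is to z-score each feature column first. The function name and sample values below are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stdevs = [pstdev(c) for c in columns]
    return [
        # A constant column (stdev 0) carries no information; map it to 0.0.
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


# Hypothetical raw values per stock: (P/E ratio, dividend yield).
raw = [(35.0, 0.005), (12.0, 0.040), (18.0, 0.025), (28.0, 0.010)]
scaled = zscore_columns(raw)
for row in scaled:
    print([round(v, 2) for v in row])
```

After scaling, each of the features contributes on a comparable footing to the distance calculation.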

Running K-means with K=6 produces:

| Cluster | Profile | Stocks | Traditional Sectors Mixed |
|---|---|---|---|
| 1 | High-growth, high-vol | 8 | Tech + Biotech + Consumer Discretionary |
| 2 | Stable dividend payers | 11 | Utilities + REITs + Consumer Staples |
| 3 | Cyclical value | 9 | Industrials + Materials + Energy |
| 4 | Defensive low-beta | 7 | Healthcare + Telecom + Utilities |
| 5 | Leveraged growth | 8 | Financials + Tech + Real Estate |
| 6 | Quality compounders | 7 | Tech + Healthcare + Consumer |

Clusters 2 and 4 both contain Utilities stocks, but cluster 2 groups some of them with REITs and Consumer Staples based on yield characteristics, while cluster 4 groups others with Healthcare and Telecom based on low-beta behavior. This is an economically meaningful distinction that sector labels alone would miss.

**Choosing K (Elbow Method):**
Plot within-cluster sum of squares (WCSS) against K and look for the point where the curve flattens. Lakefront tested K=3 through K=12; selected results:

- K=3: WCSS=284 (too few, heterogeneous clusters)
- K=6: WCSS=121 (elbow point, clear inflection)
- K=10: WCSS=89 (marginal improvement, overly granular)

K=6 provided the best balance: beyond it, additional clusters reduce WCSS only marginally while making the groups too granular to remain statistically stable.
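
One simple way to make the elbow concrete is to compare how much WCSS falls per added cluster on either side of the candidate K. The sketch below uses only the three WCSS values quoted above. The helper name is hypothetical, and a real analysis would use the full K=3 to K=12 curve.

```python
# WCSS values reported in the example above.
wcss = {3: 284, 6: 121, 10: 89}


def wcss_drop_per_cluster(wcss_by_k):
    """Average WCSS reduction per added cluster between consecutive tested Ks."""
    ks = sorted(wcss_by_k)
    return {
        (k_lo, k_hi): (wcss_by_k[k_lo] - wcss_by_k[k_hi]) / (k_hi - k_lo)
        for k_lo, k_hi in zip(ks, ks[1:])
    }


drops = wcss_drop_per_cluster(wcss)
for (k_lo, k_hi), drop in drops.items():
    print(f"K={k_lo} -> K={k_hi}: WCSS falls by {drop:.1f} per added cluster")
```

Between K=3 and K=6 each extra cluster removes about 54 units of WCSS; between K=6 and K=10 it removes only 8. That sharp change in slope is what marks K=6 as the elbow.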

**Limitations with Financial Data:**
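- K-means implicitly favors roughly spherical, similarly sized clusters, whereas groupings in financial return and feature data are often elongated or asymmetric
- Sensitive to outliers: because centroids are means, a few extreme returns can pull a centroid away from the bulk of its cluster
- Clusters may be unstable across time periods; quarterly reclustering often reassigns 20-30% of assets
- Euclidean distance suffers from the curse of dimensionality: as the number of features grows, distances between assets become increasingly similar, which weakens cluster separation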
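
To monitor the instability across periods, two sets of cluster labels have to be compared in a way that ignores cluster numbering, since K-means numbers its clusters arbitrarily on each run. One option, which is an assumption here and not something the answer prescribes, is the Rand index: the share of asset pairs that both clusterings treat the same way, either together or apart. The label lists below are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of asset pairs on which two clusterings agree (1.0 = identical)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same assets")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical labels for six assets in two consecutive quarters.
q1 = [0, 0, 0, 1, 1, 1]
q2_renumbered = [1, 1, 1, 0, 0, 0]  # same groups, different cluster numbers
q2_one_moved = [0, 0, 1, 1, 1, 1]   # the third asset switched groups

print(rand_index(q1, q2_renumbered))
print(round(rand_index(q1, q2_one_moved), 3))
```

The renumbered case scores 1.0 because the groupings are identical. Moving a single asset out of six drops the score to about 0.667, so the index is not the same quantity as the percentage of assets reassigned.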

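**Alternatives:** Hierarchical clustering does not require K up front and, with a suitable linkage (for example, single linkage), can capture non-spherical shapes. DBSCAN infers the number of clusters from the density of the data and labels outliers as noise instead of forcing them into a cluster, although it requires choosing a neighborhood radius and a minimum point count.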
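
To show how DBSCAN treats outliers, here is a small, unoptimized sketch of the algorithm in plain Python. It is for illustration only. The parameter names `eps` and `min_samples` follow scikit-learn's naming convention, and the data and parameter values are hypothetical. A point counts itself among its own neighbors.

```python
def dbscan(points, eps, min_samples):
    """Return one label per point; -1 marks noise. O(n^2) teaching sketch."""
    def neighbors(i):
        return [
            j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2
        ]

    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


# Two tight hypothetical groups plus one extreme asset.
points = [(1.8, -1.0), (2.1, -0.8), (1.9, -1.2),
          (-1.0, 1.1), (-1.2, 0.9), (-0.9, 1.3),
          (6.0, 6.0)]
print(dbscan(points, eps=0.6, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

No K is supplied: two clusters emerge from the density of the data, and the extreme asset is flagged as noise (-1) instead of distorting a centroid. The trade-off is that results depend on `eps` and `min_samples`, which still have to be tuned.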
