How do decision trees work for financial classification problems?
CFA Level II covers decision trees as a machine learning technique. I understand they split data into branches, but how exactly does the algorithm decide where to split? And what are the limitations?
Decision trees are intuitive classification (or regression) models that recursively partition data into increasingly homogeneous subsets. Think of a tree as a series of yes/no questions that narrow down to a prediction.
How splitting works:
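At each node, the algorithm tests every candidate split on every feature and chooses the one that produces the most homogeneous (pure) child nodes. Common measures:
- Entropy: H = -Σ pᵢ log₂(pᵢ), where pᵢ is the share of class i in the node. Measures disorder. Lower = purer.
- Gini impurity: G = 1 - Σ pᵢ². The probability of mislabeling a randomly drawn observation if its label is assigned at random according to the node's class proportions. Lower = purer.
- Information gain: the reduction in entropy from a split (parent entropy minus the size-weighted average entropy of the child nodes). Higher = better split.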
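The three measures above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the function names and the 10-applicant node are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits: H = -sum(p_i * log2(p_i))."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: G = 1 - sum(p_i ** 2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Hypothetical node: 5 approvals and 5 rejections, split into two children.
parent = ["approve"] * 5 + ["reject"] * 5
left = ["approve"] * 4 + ["reject"] * 1
right = ["approve"] * 1 + ["reject"] * 4

print(f"parent entropy:   {entropy(parent):.3f}")  # 1.000
print(f"parent Gini:      {gini(parent):.3f}")     # 0.500
print(f"information gain: {information_gain(parent, left, right):.3f}")  # 0.278
```

A 50/50 node is as impure as a two-class node can be (entropy 1 bit, Gini 0.5). The split leaves each child at 80/20, so entropy falls to about 0.722 and the gain is about 0.278.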
Example — Credit Approval Tree:
The tree first splits on income (most informative), then on debt-to-income, then on credit score — at each step choosing the variable that best separates approvals from rejections.
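To make the "choose the variable that best separates" step concrete, here is a rough sketch of the search at a single node. It assumes numeric features, midpoint thresholds, and Gini impurity as the criterion; the eight applicants, the feature names, and the helper `best_split` are invented for illustration and are not taken from the example tree.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_names):
    """Try every midpoint threshold on every feature and return the split
    with the lowest size-weighted Gini impurity across the two children."""
    best = None  # (weighted_gini, feature_name, threshold)
    n = len(rows)
    for j, name in enumerate(feature_names):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            score = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
            if best is None or score < best[0]:
                best = (score, name, t)
    return best

# Hypothetical applicants: (income in $ thousands, debt-to-income, credit score)
feature_names = ["income", "debt_to_income", "credit_score"]
rows = [
    (30, 0.45, 640), (35, 0.20, 700), (40, 0.50, 610), (45, 0.30, 690),
    (70, 0.35, 650), (80, 0.25, 720), (90, 0.40, 630), (120, 0.15, 760),
]
labels = ["reject"] * 4 + ["approve"] * 4

score, feature, threshold = best_split(rows, labels, feature_names)
print(feature, threshold, score)  # income 57.5 0.0
```

In this toy data, income alone separates the two classes, so the search picks it first with a weighted impurity of zero. On real data no split is perfect, and the same search is repeated inside each child node to pick the next variable.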
Advantages:
- Highly interpretable — you can explain the decision path
- Handles non-linear relationships naturally
- Works with both numerical and categorical features
- No need to scale or normalize data
Disadvantages:
- Prone to overfitting — deep trees memorize training data noise
- Unstable — small data changes can produce completely different trees
- Biased toward features with many levels — features with more unique values get more splitting opportunities
- A single tree typically has lower out-of-sample accuracy than ensemble methods such as random forests
Financial applications:
- Credit approval/denial decisions
- Fraud detection (transaction flagging)
- Customer churn prediction (which clients will leave?)
- Stock classification (buy/hold/sell based on fundamentals)
Controlling overfitting:
- Set a maximum tree depth
- Require a minimum number of samples per leaf
- Prune branches that don't improve out-of-sample accuracy
- Use ensemble methods (random forests) instead
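The first two controls can be sketched with a small recursive tree builder. This is a simplified illustration, assuming numeric features and Gini impurity; `build_tree`, `tree_depth`, and the ten-point dataset (with two deliberately noisy labels) are hypothetical. Pruning and random forests are not shown.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, depth=0, max_depth=2, min_samples_leaf=2):
    """Grow a tree recursively; a leaf is just its majority class label."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or the depth limit is reached.
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    best = None  # (weighted_gini, feature_index, threshold)
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            # Skip splits that would leave a child below the leaf-size floor.
            if min(len(left), len(right)) < min_samples_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # no split satisfies min_samples_leaf
        return majority
    _, j, t = best
    left_pairs = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right_pairs = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left_pairs], [y for _, y in left_pairs],
                           depth + 1, max_depth, min_samples_leaf),
        "right": build_tree([r for r, _ in right_pairs], [y for _, y in right_pairs],
                            depth + 1, max_depth, min_samples_leaf),
    }

def tree_depth(node):
    """Number of split levels below this node (0 for a leaf)."""
    if not isinstance(node, dict):
        return 0
    return 1 + max(tree_depth(node["left"]), tree_depth(node["right"]))

# Hypothetical one-feature data: low values reject, high values approve,
# except for two noisy labels at x = 3 and x = 8.
rows = [(x,) for x in range(1, 11)]
labels = ["reject", "reject", "approve", "reject", "reject",
          "approve", "approve", "reject", "approve", "approve"]

deep = build_tree(rows, labels, max_depth=10, min_samples_leaf=1)
shallow = build_tree(rows, labels, max_depth=1, min_samples_leaf=3)
print("unconstrained depth:", tree_depth(deep))
print("constrained depth:  ", tree_depth(shallow))  # 1
```

With loose limits the tree keeps splitting until the two noisy points are isolated in their own leaves, which is the memorization described above. With `max_depth=1` and `min_samples_leaf=3` it stops at a single split at 5.5 and ignores the noise.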
Exam tip: CFA Level II tests conceptual understanding of how trees split, why they overfit, and when ensembles are preferred. You won't need to calculate entropy by hand, but you should understand the concept of information gain.