What are the main credit scoring model approaches and how does logistic regression compare to machine learning methods?
I'm studying credit risk for FRM Part II and trying to understand the different credit scoring methodologies. The curriculum mentions traditional statistical models and newer ML approaches. When would you choose one over the other, and what are the regulatory implications?
Credit scoring models assign each borrower a numerical score that ranks credit quality and is usually calibrated to a probability of default (PD). There are several major approaches, each with distinct trade-offs:
1. Logistic Regression (Traditional Workhorse)
This remains the most widely used scoring model in banking. It estimates the probability of default (PD) as:
PD = 1 / (1 + e^(-z))
where z = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ
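Advantages: Transparent, easy to explain to regulators, and the coefficients have a direct interpretation (e.g., a coefficient of -0.15 on income measured in $10,000 units means each additional $10,000 of income lowers the log-odds of default by 0.15, which multiplies the odds of default by e^(-0.15) ≈ 0.86).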
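To make the coefficient interpretation concrete, here is a minimal sketch of the logistic PD formula. The intercept, the debt-to-income variable, and its coefficient are hypothetical values chosen for illustration; only the -0.15 income effect comes from the example above.

```python
import math

# Hypothetical coefficients for illustration only (not from a fitted model).
INTERCEPT = -1.0              # beta_0
BETA_INCOME_PER_10K = -0.15   # log-odds change per $10,000 of income
BETA_DEBT_TO_INCOME = 2.0     # log-odds change per unit of debt-to-income

def probability_of_default(income_usd, debt_to_income):
    """PD = 1 / (1 + e^(-z)), where z is the linear score."""
    z = (INTERCEPT
         + BETA_INCOME_PER_10K * (income_usd / 10_000)
         + BETA_DEBT_TO_INCOME * debt_to_income)
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability into odds p / (1 - p)."""
    return p / (1.0 - p)

# Two borrowers who differ only by $10,000 of income.
pd_base = probability_of_default(50_000, 0.4)
pd_richer = probability_of_default(60_000, 0.4)

# The odds ratio equals e^(-0.15) regardless of the other inputs.
odds_ratio = odds(pd_richer) / odds(pd_base)
print(f"PD at $50k: {pd_base:.4f}, PD at $60k: {pd_richer:.4f}")
print(f"Odds ratio: {odds_ratio:.4f}")
```

Note that the change in PD itself depends on where the borrower sits on the curve. Only the change in log-odds (and hence the odds ratio) is constant.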
2. Decision Trees / Random Forests
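A decision tree partitions the feature space into regions and assigns each region its own default rate. A random forest averages hundreds of trees, each trained on a bootstrap sample of the data, to reduce overfitting.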
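The partition-and-average idea can be sketched with a toy example. The three "trees" here are hand-written with hypothetical features, split points, and leaf PDs. A real random forest learns its splits from bootstrap samples of the data, so this shows only how regions map to PDs and how a forest averages them.

```python
# Each function is a hypothetical, hand-written tree: nested splits define
# regions of the feature space, and each region returns a leaf PD.
def tree_a(borrower):
    if borrower["debt_to_income"] > 0.5:
        return 0.20 if borrower["years_in_business"] < 3 else 0.10
    return 0.03

def tree_b(borrower):
    if borrower["late_payments"] >= 2:
        return 0.25
    return 0.08 if borrower["debt_to_income"] > 0.4 else 0.02

def tree_c(borrower):
    if borrower["years_in_business"] < 2:
        return 0.15
    return 0.12 if borrower["late_payments"] >= 1 else 0.04

FOREST = [tree_a, tree_b, tree_c]

def forest_pd(borrower):
    """Average the leaf PDs across all trees in the forest."""
    return sum(tree(borrower) for tree in FOREST) / len(FOREST)

borrower = {"debt_to_income": 0.6, "years_in_business": 5, "late_payments": 0}
print(f"Forest PD: {forest_pd(borrower):.4f}")
```

Averaging smooths out the idiosyncratic splits of any single tree, which is why the ensemble tends to overfit less than one deep tree.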
3. Neural Networks / Gradient Boosting
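These capture complex nonlinear interactions that a linear score misses, but their individual predictions are hard to explain, so they are often described as "black boxes."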
Practical Example: Pinnacle Bank builds two PD models for its SME portfolio. The logistic regression achieves an AUC (area under the ROC curve) of 0.72, while a gradient boosting model achieves an AUC of 0.81. However, the bank's regulator expects models to be explainable, in line with the emphasis on conceptual soundness, documentation, and independent validation in the SR 11-7 model risk management guidance. Pinnacle therefore uses the logistic model for regulatory capital and the ML model internally for screening.
Key Exam Takeaway: For FRM, understand that discrimination metrics such as AUC and the Gini coefficient (also called the accuracy ratio, where Gini = 2 × AUC - 1) are critical for comparing models, and that regulators generally prefer interpretable models for capital calculations.
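As a rough illustration of how these metrics relate, the sketch below computes AUC as the share of (defaulter, non-defaulter) pairs that the model ranks correctly, then converts AUC to Gini. The six scores and outcomes are made-up data. The last two lines apply the conversion to the Pinnacle Bank AUC figures.

```python
def auc(scores, defaulted):
    """Fraction of (defaulter, non-defaulter) pairs in which the defaulter
    has the higher predicted PD; ties count as half a correct pair."""
    bad = [s for s, d in zip(scores, defaulted) if d == 1]
    good = [s for s, d in zip(scores, defaulted) if d == 0]
    correct = 0.0
    for b in bad:
        for g in good:
            if b > g:
                correct += 1.0
            elif b == g:
                correct += 0.5
    return correct / (len(bad) * len(good))

def gini_from_auc(auc_value):
    """Gini coefficient (accuracy ratio) = 2 * AUC - 1."""
    return 2.0 * auc_value - 1.0

# Made-up predicted PDs and observed outcomes (1 = default).
pd_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60]
defaulted = [0, 0, 1, 0, 1, 1]

sample_auc = auc(pd_scores, defaulted)
print(f"Sample AUC: {sample_auc:.4f}, Gini: {gini_from_auc(sample_auc):.4f}")

# Conversion applied to the AUC values from the Pinnacle Bank example.
print(f"Logistic Gini: {gini_from_auc(0.72):.2f}")
print(f"Boosting Gini: {gini_from_auc(0.81):.2f}")
```

An AUC of 0.5 corresponds to a Gini of 0 (no discriminatory power), and an AUC of 1.0 corresponds to a Gini of 1 (perfect ranking).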