Model Comparison Evidence
Risk engine benchmark results comparing GBM, GARCH, and Historical simulation models. Supports Tests 2.1, 7.1, 7.2, 7.3a, and 7.3b. For term definitions, see the Glossary. For inputs and timestamps, see Data Provenance.
All results come from the production risk engine (March 2026 release), run on 2026-03-26 and 2026-03-27.
Three-Model Architecture
The risk engine supports three simulation approaches via the same /loan_risk endpoint. All three use the same loan state machine (nominal, partial liquidation, closeout) — they differ only in how price paths are generated.
| Model | Price Path Generation | Best For |
|---|---|---|
| GBM | Geometric Brownian Motion with Cholesky-correlated paths and optional volatility shock multiplier | Stress testing — tail events scaled by volatility multiplier |
| GARCH | Beta-GARCH with market-factor residual modeling, volatility clustering, BIC-selected lag structure | Volatility clustering — captures periods where high volatility persists and produces fatter tails than constant-volatility models |
| Historical | Non-parametric sliding window replay of actual price history | Captures historical price movements and crashes without distributional assumptions |
Scoring from Simulation Output
The Extreme Event Resilience score decreases proportionally with Expected Shortfall from the lender's perspective — higher ES means greater potential capital impairment for the lender, not just borrower liquidation. A score of 1.00 means negligible lender loss. A score of 0.50 means the lender's tail-risk Expected Shortfall is 50% of the loan principal. This is distinct from Pr(Liquidation) — a pool can have high liquidation probability but near-zero lender loss if collateral ratios provide sufficient buffer.
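The scoring rule described above can be sketched in a few lines. This is a minimal illustration, assuming the score is a linear `1 - ES` at a single quantile, clamped to [0, 1]. The function name `resilience_score` and the clamping are assumptions for illustration, not the engine's API. The linear form is consistent with the Score column in the Multi-Pool GBM Results table (ES @99.99% of 62.87% alongside a score of 0.371).

```python
def resilience_score(es_fraction: float) -> float:
    """Hypothetical helper: map lender Expected Shortfall to a score.

    es_fraction is ES expressed as a fraction of loan principal (0.50 = 50%).
    """
    # Assumed linear mapping, clamped so the score stays within [0, 1].
    return min(1.0, max(0.0, 1.0 - es_fraction))


if __name__ == "__main__":
    # ES values taken from this document: 50% example, ETH/USDC, wstETH/USDC.
    for es in (0.0, 0.50, 0.6287, 0.6962):
        print(f"ES={es:.2%} -> score={resilience_score(es):.3f}")
```

Because the input is lender ES rather than Pr(Liquidation), a pool with a high liquidation rate can still score 1.000 under this sketch.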
Model Comparison: wstETH/USDC (Tests 7.1, 7.2, 7.3a)
Configuration
See Data Provenance — Model Comparison Input for full JSON payloads, Parameter Mapping for L/N to LTV conversion, and execution timestamps.
| Parameter | GBM / GARCH | Historical |
|---|---|---|
| Collateral | wstETH | wstETH |
| Loan | USDC | USDC |
| LTV | 66.7% | 66.7% |
| Liquidation LTV | 87.0% | 87.0% |
| Loan duration | 365 days | 365 days |
| MC iterations | 10,000 | N/A (sliding window) |
| Volatility stress | volatility_shock = 9.0 → scales vol by (1 + 9.0) = 10x | N/A |
| Lookback | 365 days | 730 days (~366 daily windows) |
Simulation Outcomes
| Metric | GBM (stressed) | GBM (no shock) | GARCH | Historical |
|---|---|---|---|---|
| Pr(Liquidation) | >99% | 68.8% | 93.9% | 98.9% |
| Avg Lender Loss | <1% | <1% | <1% | <1% |
Expected Shortfall by Quantile
| Quantile | GBM (stressed) | GBM (no shock) | GARCH | Historical |
|---|---|---|---|---|
| LossES @99% | 28.65% | <1% | <1% | <1% |
| LossES @99.9% | 53.21% | <1% | <1% | <1% |
| LossES @99.99% | 64.81% | <1% | <1% | <1% |
Interpretation
GBM with volatility stress is the most conservative model by design. The stress multiplier scales the historical volatility estimates, producing simulated return distributions with significantly wider dispersion than historically observed. Under this stress, even well-collateralized positions face closeout, producing significant lender losses (64.8% ES at 99.99th percentile).
GBM without volatility stress produces near-zero lender loss, the same as GARCH and Historical. This indicates that the volatility stress, not the GBM model itself, drives the extreme tail loss for this collateral configuration. Without it, GBM behaves comparably to the other models.
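To make the effect of the stress multiplier concrete, here is a minimal single-asset sketch, assuming annualized drift and volatility, daily steps, and the `(1 + volatility_shock)` scaling from the configuration table. The helper names (`stressed_vol`, `gbm_path`) and the base volatility of 0.10 are illustrative assumptions, not the engine's implementation or calibrated inputs.

```python
import math
import random


def stressed_vol(base_vol: float, volatility_shock: float) -> float:
    """Scale volatility by (1 + shock); shock = 9.0 gives a 10x multiplier."""
    return base_vol * (1.0 + volatility_shock)


def gbm_path(s0: float, mu: float, vol: float, days: int, rng: random.Random) -> list:
    """Simulate one daily GBM price path (annualized mu and vol, dt = 1/365)."""
    dt = 1.0 / 365.0
    path = [s0]
    for _ in range(days):
        z = rng.gauss(0.0, 1.0)
        # Log-normal step: the drift carries the -0.5 * vol^2 correction.
        step = (mu - 0.5 * vol * vol) * dt + vol * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(step))
    return path


if __name__ == "__main__":
    rng = random.Random(42)
    base = 0.10  # illustrative annualized volatility, not a calibrated value
    for shock in (0.0, 9.0):
        vol = stressed_vol(base, shock)
        lows = [min(gbm_path(1.0, 0.0, vol, 365, rng)) for _ in range(1000)]
        mean_low = sum(lows) / len(lows)
        print(f"shock={shock}: vol={vol:.2f}, mean path minimum={mean_low:.3f}")
```

Running the two cases side by side shows how the same model produces much deeper path minima once volatility is scaled, which is the mechanism behind the stressed-GBM tail losses.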
GARCH captures volatility clustering dynamics. The Beta-GARCH model produces fatter tails than constant-volatility GBM. Under these dynamics, 93.9% of simulations trigger liquidation events, but collateral ratios remain adequate and lender losses stay <1%.
Historical replays actual market conditions. The sliding window replay uses a 730-day lookback (2 years of daily price data), producing ~366 overlapping windows of 365-day loan simulations. 98.9% of windows triggered liquidation events. Worst-case CCR across all windows was 1.014 — lender losses remained <1%.
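The window arithmetic above can be sketched as follows. This is a simplified illustration, assuming constant debt (no interest accrual) and treating a window as liquidated when the price falls far enough for the current LTV to reach the Liquidation LTV. The real engine's state machine (partial liquidation, closeout) is not modeled, and all function names are hypothetical.

```python
def window_count(lookback_days: int, loan_days: int) -> int:
    """Number of overlapping loan windows that fit in the lookback."""
    return lookback_days - loan_days + 1


def liquidation_price_ratio(ltv: float, liq_ltv: float) -> float:
    """Price (as a fraction of the starting price) at which LTV hits liq_ltv.

    Assumes constant debt: current LTV = ltv / (price / start_price).
    """
    return ltv / liq_ltv


def historical_liquidation_rate(prices, loan_days, ltv, liq_ltv) -> float:
    """Fraction of sliding windows whose price touches the liquidation level."""
    threshold = liquidation_price_ratio(ltv, liq_ltv)
    n = window_count(len(prices), loan_days)
    if n <= 0:
        raise ValueError("price history is shorter than the loan duration")
    hits = 0
    for start in range(n):
        window = prices[start:start + loan_days]
        # Compare the worst price in the window against the starting price.
        if min(window) / window[0] <= threshold:
            hits += 1
    return hits / n


if __name__ == "__main__":
    print(window_count(730, 365))                 # 366 windows
    print(liquidation_price_ratio(0.667, 0.87))   # roughly 0.767 of start price
```

Under these assumptions, a 66.7% LTV loan with an 87.0% Liquidation LTV is liquidated once the collateral price falls by roughly 23% from its starting level at any point in the window.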
Stress Event Sensitivity: October 10, 2025 (wstETH)
To demonstrate the Historical model's sensitivity to specific stress events, we ran two simulations with identical parameters (wstETH/USDC, LTV=66.7%, Liquidation LTV=87.0%, 730-day lookback) but different analysis dates. For input payloads, see Data Provenance — Historical Stress Event.
| Simulation | Analysis Date | Lookback Window | Pr(Liquidation) | Worst CCR | Lender ES |
|---|---|---|---|---|---|
| Includes Oct 10 | 2026-03-27 | 2024-03-28 → 2026-03-27 | 98.9% | 1.014 | <1% |
| Excludes Oct 10 | 2025-10-09 | 2023-10-10 → 2025-10-09 | 63.7% | 1.014 | <1% |
Including the period around October 10th increases liquidation probability from 63.7% to 98.9%. The jump reflects not a single event but a broader period of poor market conditions — empirically verified through sustained price deterioration and elevated volatility across that window. The sliding windows that overlap with this stress sequence nearly all trigger liquidation events. The collateral ratios hold in both cases (worst CCR stays above 1.0), but the model clearly captures how prolonged market deterioration compounds liquidation risk. This confirms the Historical model responds to real market dynamics without distributional assumptions — its output changes when the data changes.
GBM without stress vs GARCH: GBM without volatility stress shows 68.8% liquidation probability, while GARCH shows 93.9%. GARCH captures volatility clustering that the constant-volatility GBM misses, producing more liquidation events under realistic dynamics. Both produce <1% lender loss at this LTV, so the difference surfaces in liquidation frequency rather than tail loss. The stressed GBM serves a different purpose — it models tail scenarios that lookback-calibrated models would miss because they haven't observed them yet (e.g., multi-sigma events like the 2022 LUNA/UST crash or March 2020 COVID drawdown).
Stress Period Sensitivity: GARCH vs GBM (Test 10.1)
8 simulation runs on wstETH/USDC comparing GBM (no shock) and GARCH with and without the October 10, 2025 crash in the lookback window. Two LTV configurations: production (66.7%) and stressed (80%). For input payloads, see Data Provenance — Stress Sensitivity.
LTV 66.7% (Production: N=1.5, L=1.15)
| Model | Period | Pr(Liquidation) | Lender ES @99.99% | Worst CCR |
|---|---|---|---|---|
| GBM | Includes Oct 10 | 54.4% | <1% | 0.999 |
| GARCH | Includes Oct 10 | 93.1% | <1% | 0.988 |
| GBM | Excludes Oct 10 | 45.0% | <1% | 1.014 |
| GARCH | Excludes Oct 10 | 93.7% | <1% | 0.964 |
LTV 80% (Stressed: N=1.25, L=1.1)
| Model | Period | Pr(Liquidation) | Lender ES @99.99% | Worst CCR |
|---|---|---|---|---|
| GBM | Includes Oct 10 | 74.0% | <1% | 0.961 |
| GARCH | Includes Oct 10 | 96.5% | <1% | 0.942 |
| GBM | Excludes Oct 10 | 66.2% | <1% | 0.945 |
| GARCH | Excludes Oct 10 | 96.6% | <1% | 0.930 |
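The N and L values in the two configuration headings above line up with the stated LTVs if each is read as a collateral ratio whose reciprocal is the LTV. The sketch below assumes that reading (LTV = 1/N, Liquidation LTV = 1/L). The authoritative conversion is the Parameter Mapping in Data Provenance, and the helper name is hypothetical.

```python
def ltv_from_ratio(ratio: float) -> float:
    """Assumed mapping: LTV is the reciprocal of a collateral ratio."""
    if ratio <= 0:
        raise ValueError("collateral ratio must be positive")
    return 1.0 / ratio


if __name__ == "__main__":
    configs = {
        "Production": {"N": 1.5, "L": 1.15},
        "Stressed": {"N": 1.25, "L": 1.1},
    }
    for name, cfg in configs.items():
        ltv = ltv_from_ratio(cfg["N"])
        liq_ltv = ltv_from_ratio(cfg["L"])
        print(f"{name}: LTV={ltv:.1%}, Liquidation LTV={liq_ltv:.1%}")
```

Under this assumption the production values reproduce the 66.7% LTV and 87.0% Liquidation LTV used throughout, and the stressed N reproduces the 80% LTV.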
GARCH produces 71% more liquidation events than GBM at production LTV (93.1% vs 54.4%) when the October crash is in the lookback. The October 10 stress event raises GBM liquidation probability by 7.8 to 9.4 percentage points (45.0% to 54.4% at 66.7% LTV, 66.2% to 74.0% at 80% LTV), while GARCH moves by less than 1 percentage point because it already captures volatility clustering from broader market dynamics. Lender ES remains <1% at both LTV configurations because the liquidation mechanism recovers capital: the collateral buffer absorbs the stress. The ES differential materializes only at LTV configurations where the collateral buffer is insufficient, which represents a protocol design failure rather than normal operations.
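The comparison mixes a relative difference (GARCH vs GBM) with percentage-point differences (with vs without the October event). A short sketch, computed directly from the two tables above, keeps the two measures apart. The helper names are hypothetical.

```python
def relative_increase(a: float, b: float) -> float:
    """Relative difference of a over b, e.g. 0.71 means 71% more."""
    return (a - b) / b


def pp_diff(a: float, b: float) -> float:
    """Difference in percentage points between two percentages."""
    return a - b


# Pr(Liquidation) in percent, keyed by (LTV label, model); values are
# (includes Oct 10, excludes Oct 10), copied from the tables.
PR_LIQ = {
    ("66.7%", "GBM"): (54.4, 45.0),
    ("66.7%", "GARCH"): (93.1, 93.7),
    ("80%", "GBM"): (74.0, 66.2),
    ("80%", "GARCH"): (96.5, 96.6),
}

if __name__ == "__main__":
    garch, gbm = PR_LIQ[("66.7%", "GARCH")][0], PR_LIQ[("66.7%", "GBM")][0]
    print(f"GARCH vs GBM at 66.7% LTV: {relative_increase(garch, gbm):.0%} more")
    for (ltv, model), (incl, excl) in PR_LIQ.items():
        print(f"{model} @ LTV {ltv}: {pp_diff(incl, excl):+.1f} pp from Oct 10")
```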
Multi-Pool GBM Results
Four pools scored with production GBM configuration (stressed, 10k MC, 365-day). For input payloads and per-pool parameters, see Data Provenance — Multi-Pool Input.
| Pool | LTV | Liq LTV | ES @99% | ES @99.9% | ES @99.99% | Score |
|---|---|---|---|---|---|---|
| ETH/USDC | 58.8% | 76.9% | <1% | 43.83% | 62.87% | 0.371 |
| BTC/USDC | 58.8% | 76.9% | <1% | <1% | <1% | 1.000 |
| cbBTC/USDC | 58.8% | 76.9% | <1% | <1% | <1% | 1.000 |
| wstETH/USDC | 66.7% | 87.0% | 34.22% | 60.33% | 69.62% | 0.304 |
Observations:
- BTC and cbBTC both show <1% lender loss at all quantiles. The 58.8% LTV and 76.9% Liquidation LTV provide sufficient buffer: borrowers are liquidated in most paths (96.4% and 96.0%), but lender losses stay below 1%.
- ETH with the same conservative LTV (58.8%) keeps ES below 1% at the 99th percentile but shows substantial tail loss at the extreme quantiles (43.83% at 99.9%, 62.87% at 99.99%).
- wstETH with higher LTV (66.7%) and tighter Liquidation LTV (87.0%) shows the highest tail loss (69.62% at 99.99%): less buffer means the stress can overwhelm the liquidation mechanism.
Native vs Wrapped Comparison
Same standardized parameters (LTV=58.8%, Liquidation LTV=76.9%, stressed) applied to all four assets for a clean comparison where only the asset differs. Uses the same input payloads as Multi-Pool with uniform L/N values.
| Asset | ES @99% | ES @99.9% | ES @99.99% | Pr(Liquidation) |
|---|---|---|---|---|
| ETH | <1% | 39.46% | 59.83% | >99% |
| wstETH | <1% | 43.28% | 59.33% | >99% |
| BTC | <1% | <1% | <1% | 96.4% |
| cbBTC | <1% | <1% | <1% | 96.1% |
Key finding: At identical LTV parameters, ETH and wstETH produce nearly identical tail loss (59.83% vs 59.33% at 99.99th). BTC and cbBTC both produce near-zero lender loss. The wrapped versions behave consistently with their native counterparts, confirming the risk engine correctly prices wrapped assets from their own historical price series.
Convergence Test (Test 2.1)
10 parallel GBM runs on ETH/USDC with identical parameters (stressed, LTV=58.8%, Liquidation LTV=76.9%) at each iteration count. For input payload and execution timestamps (10k, 100k, 1M), see Data Provenance — Convergence Input.
Convergence by Quantile — 10k MC iterations
| Quantile | Mean ES (% of principal) | Std Dev (% of principal) | CV |
|---|---|---|---|
| 99th | 0.15 | 0.02 | 11.9% |
| 99.9th | 45.74 | 2.77 | 6.1% |
| 99.99th | 58.19 | 5.50 | 9.5% |
Convergence by Quantile — 100k MC iterations
Same pool and parameters, 10x more simulation paths:
| Quantile | Mean ES (% of principal) | Std Dev (% of principal) | CV |
|---|---|---|---|
| 99th | 0.15 | 0.01 | 5.4% |
| 99.9th | 45.93 | 1.49 | 3.2% |
| 99.99th | 61.02 | 2.34 | 3.8% |
Convergence by Quantile — 1M MC iterations
Same pool and parameters, 100x more simulation paths than 10k.
| Quantile | Mean ES (% of principal) | Std Dev (% of principal) | CV |
|---|---|---|---|
| 99th | 0.15 | 0.001 | 0.78% |
| 99.9th | 45.73 | 0.28 | 0.62% |
| 99.99th | 61.05 | 0.64 | 1.05% |
Convergence Comparison Across Iteration Counts
| Quantile | CV @ 10k | CV @ 100k | CV @ 1M |
|---|---|---|---|
| 99th | 11.9% | 5.4% | 0.78% |
| 99.9th | 6.1% | 3.2% | 0.62% |
| 99.99th | 9.5% | 3.8% | 1.05% |
At 1M iterations, all three quantiles converge under 2% CV. The 99.99th percentile drops from 9.5% (10k) to 1.05% (1M).
Interpretation
The production system currently runs 10k MC iterations. At that count, CV is 6.1% at the 99.9th percentile and 9.5% at the 99.99th percentile (the scoring design point), both above the 2% threshold.
If CV < 2% is a mandatory requirement, the system supports it: at 1M iterations, the 99.99th percentile converges to 1.05% CV. This is a compute cost tradeoff, not a model limitation — the system supports configurable iteration counts and quantiles.
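The CV figures and the compute tradeoff can be sketched as below. Two assumptions are labeled here and in the code: CV is taken as the sample standard deviation across repeated runs divided by their mean, and Monte Carlo error is assumed to shrink as 1/sqrt(N), which is the standard rule of thumb rather than something measured in this report. Function names are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values) -> float:
    """CV across repeated runs: sample std dev divided by the mean (assumed definition)."""
    return statistics.stdev(values) / statistics.mean(values)


def iterations_for_target_cv(current_iters: int, current_cv: float, target_cv: float) -> int:
    """Estimate iterations needed for target_cv, assuming error ~ 1/sqrt(N)."""
    # Halving CV requires 4x the paths, so scale by the squared CV ratio.
    return math.ceil(current_iters * (current_cv / target_cv) ** 2)


if __name__ == "__main__":
    # 99.99th percentile: 9.5% CV at 10k iterations, 2% target.
    print(iterations_for_target_cv(10_000, 9.5, 2.0))
```

Under the 1/sqrt(N) assumption, the 10k baseline suggests on the order of a few hundred thousand iterations to reach 2% CV at the 99.99th percentile. The measured 3.8% at 100k and 1.05% at 1M are consistent with the threshold falling between those two counts.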
Correlation Test (Test 7.3b)
Two GBM simulations with identical parameters (stressed, LTV=66.7%, Liquidation LTV=87.0%) except collateral composition. For input payloads (correlated and uncorrelated pairs), see Data Provenance — Correlation Test.
| Configuration | Collateral | ES @99% | ES @99.9% | ES @99.99% |
|---|---|---|---|---|
| Correlated pair | wstETH (50%) + cbETH (50%) | 31.88% | 54.42% | 71.18% |
| Uncorrelated pair | BTC (50%) + DAI (50%) | <1% | <1% | <1% |
Correlated ES is far higher: 71.18% vs <1% at the 99.99th percentile. The GBM model generates price paths using Cholesky decomposition of the historical covariance matrix. wstETH and cbETH (both ETH liquid staking derivatives) are highly correlated, so when one drops, the other drops with it. The diversification benefit is minimal, and the correlated decline can overwhelm the liquidation mechanism.
BTC and DAI (a stablecoin) are effectively uncorrelated. DAI maintains its peg while BTC moves, providing genuine diversification. Even under stress, the mixed collateral produces <1% lender loss.
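The Cholesky step can be illustrated with a small pure-Python sketch. It factors a covariance matrix into a lower-triangular matrix and uses it to turn independent normal draws into correlated shocks. The correlation values 0.95 and 0.0 are illustrative stand-ins for a highly correlated and an uncorrelated pair, not the measured wstETH/cbETH or BTC/DAI figures, and the function names are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Return lower-triangular L with L * L^T == cov (symmetric positive definite)."""
    n = len(cov)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(cov[i][i] - s)
            else:
                lower[i][j] = (cov[i][j] - s) / lower[j][j]
    return lower


def correlated_shocks(lower, rng):
    """Map independent standard normals e to correlated shocks L * e."""
    e = [rng.gauss(0.0, 1.0) for _ in lower]
    return [sum(row[k] * e[k] for k in range(i + 1)) for i, row in enumerate(lower)]


def sample_correlation(xs, ys):
    """Pearson correlation of two equal-length samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(7)
    for rho in (0.95, 0.0):  # illustrative correlations only
        lower = cholesky([[1.0, rho], [rho, 1.0]])
        draws = [correlated_shocks(lower, rng) for _ in range(20_000)]
        xs, ys = zip(*draws)
        print(f"target rho={rho}: sample rho={sample_correlation(xs, ys):.3f}")
```

Feeding these shocks into each asset's GBM step makes the simulated prices move together to the degree the covariance matrix implies, which is why a pair of highly correlated collateral assets offers little protection in the stressed runs.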