Model Comparison Evidence

This page presents risk engine benchmark results comparing the GBM, GARCH, and Historical simulation models. It supports Tests 2.1, 7.1, 7.2, 7.3a, 7.3b, and 10.1. For term definitions, see the Glossary. For inputs and timestamps, see Data Provenance.

All results come from the production risk engine (March 2026 release), with runs executed on 2026-03-26 and 2026-03-27.

Three-Model Architecture

The risk engine supports three simulation approaches via the same /loan_risk endpoint. All three use the same loan state machine (nominal, partial liquidation, closeout) — they differ only in how price paths are generated.

Model	Price Path Generation	Best For
GBM	Geometric Brownian Motion with Cholesky-correlated paths and optional volatility shock multiplier	Stress testing — tail events scaled by volatility multiplier
GARCH	Beta-GARCH with market-factor residual modeling, volatility clustering, BIC-selected lag structure	Volatility clustering — captures periods where high volatility persists and produces fatter tails than constant-volatility models
Historical	Non-parametric sliding window replay of actual price history	Captures historical price movements and crashes without distributional assumptions
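
To make the GBM row concrete, here is a minimal sketch of Cholesky-correlated GBM path generation with a volatility shock multiplier. It is not the engine's implementation: the function name `simulate_gbm_paths`, its arguments, and the daily time step are assumptions made for illustration. The only detail taken from this report is that the shock scales volatility by (1 + volatility_shock).

```python
import numpy as np

def simulate_gbm_paths(s0, mu, cov, n_steps, n_paths, dt=1.0 / 365.0,
                       volatility_shock=0.0, seed=0):
    """Simulate correlated GBM price paths (illustrative sketch only).

    s0:  initial prices, shape (n_assets,)
    mu:  annualized drifts, shape (n_assets,)
    cov: annualized covariance matrix of log returns, shape (n_assets, n_assets)
    volatility_shock: every volatility is scaled by (1 + volatility_shock)
    Returns an array of shape (n_paths, n_steps + 1, n_assets).
    """
    s0 = np.asarray(s0, dtype=float)
    mu = np.asarray(mu, dtype=float)
    scale = 1.0 + volatility_shock
    # Scaling each volatility by `scale` scales the covariance by scale**2.
    stressed_cov = np.asarray(cov, dtype=float) * scale**2
    chol = np.linalg.cholesky(stressed_cov)  # lower-triangular factor
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_paths, n_steps, len(s0)))
    # Independent normals times chol.T give increments with the target covariance.
    shocks = (z @ chol.T) * np.sqrt(dt)
    variances = np.diag(stressed_cov)
    log_steps = (mu - 0.5 * variances) * dt + shocks
    log_paths = np.cumsum(log_steps, axis=1)
    start = np.zeros((n_paths, 1, len(s0)))
    return s0 * np.exp(np.concatenate([start, log_paths], axis=1))
```

With `volatility_shock=9.0` the daily log-return dispersion in this sketch is ten times the unshocked value, which is the mechanism the stressed GBM configuration relies on.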

Scoring from Simulation Output

The Extreme Event Resilience score decreases proportionally with Expected Shortfall from the lender's perspective — higher ES means greater potential capital impairment for the lender, not just borrower liquidation. A score of 1.00 means negligible lender loss. A score of 0.50 means the lender's tail-risk Expected Shortfall is 50% of the loan principal. This is distinct from Pr(Liquidation) — a pool can have high liquidation probability but near-zero lender loss if collateral ratios provide sufficient buffer.
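
The two anchor points above (1.00 for negligible loss, 0.50 for an ES of 50% of principal) are consistent with a linear mapping, score = 1 − ES. The sketch below assumes that mapping; the function name and the clamping to [0, 1] are illustrative assumptions, not confirmed engine behavior. The Multi-Pool table in this report is consistent with applying the mapping to ES @99.99%.

```python
def resilience_score(expected_shortfall):
    """Map lender Expected Shortfall (as a fraction of principal) to a score.

    Assumes the linear form score = 1 - ES. Clamping ES to [0, 1] is an
    assumption added here so the score stays within [0, 1].
    """
    es = min(max(expected_shortfall, 0.0), 1.0)
    return 1.0 - es

# ES @99.99% values reported in the Multi-Pool table, as fractions of principal.
for pool, es in {"ETH/USDC": 0.6287, "wstETH/USDC": 0.6962}.items():
    print(pool, round(resilience_score(es), 3))
```

The printed values match the scores listed for those two pools (0.371 and 0.304).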

Model Comparison: wstETH/USDC (Tests 7.1, 7.2, 7.3a)

Configuration

See Data Provenance — Model Comparison Input for full JSON payloads, Parameter Mapping for L/N to LTV conversion, and execution timestamps.

Parameter	GBM / GARCH	Historical
Collateral	wstETH	wstETH
Loan	USDC	USDC
LTV	66.7%	66.7%
Liquidation LTV	87.0%	87.0%
Loan duration	365 days	365 days
MC iterations	10,000	N/A (sliding window)
Volatility stress	`volatility_shock = 9.0` → scales vol by (1 + 9.0) = 10x	N/A
Lookback	365 days	730 days (~366 daily windows)
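
The L/N values quoted elsewhere in this report (N=1.5, L=1.15 for the 66.7% / 87.0% configuration) are consistent with each LTV being the reciprocal of a collateral ratio. The sketch below assumes that relationship; the authoritative mapping is the Parameter Mapping section of Data Provenance, and the function name is hypothetical.

```python
def ratio_to_ltv_pct(collateral_ratio):
    """Convert a collateral ratio (for example N or L) to an LTV percentage.

    Assumes LTV = 1 / ratio, which reproduces the figures quoted in this
    report; it is not taken from the engine's source.
    """
    return round(100.0 / collateral_ratio, 1)

print(ratio_to_ltv_pct(1.5))   # N=1.5  -> LTV
print(ratio_to_ltv_pct(1.15))  # L=1.15 -> Liquidation LTV
print(ratio_to_ltv_pct(1.25))  # N=1.25 -> stressed LTV
```

These calls return 66.7, 87.0, and 80.0, matching the production and stressed configurations used in this report.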

Simulation Outcomes

Metric	GBM (stressed)	GBM (no shock)	GARCH	Historical
Pr(Liquidation)	>99%	68.8%	93.9%	98.9%
Avg Lender Loss	<1%	<1%	<1%	<1%

Expected Shortfall by Quantile

Quantile	GBM (stressed)	GBM (no shock)	GARCH	Historical
LossES @99%	28.65%	<1%	<1%	<1%
LossES @99.9%	53.21%	<1%	<1%	<1%
LossES @99.99%	64.81%	<1%	<1%	<1%
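
For readers unfamiliar with the metric, Expected Shortfall at a quantile is commonly computed as the mean loss over the paths at or beyond that quantile. The sketch below uses that common definition; whether the engine uses exactly this estimator is an assumption, and `expected_shortfall` is a hypothetical helper name.

```python
import numpy as np

def expected_shortfall(losses, quantile):
    """Mean of the losses at or above the given quantile of the loss sample.

    losses:   per-path lender losses as fractions of principal
    quantile: for example 0.99, 0.999, or 0.9999
    """
    losses = np.asarray(losses, dtype=float)
    threshold = np.quantile(losses, quantile)  # Value-at-Risk at this quantile
    tail = losses[losses >= threshold]
    return float(tail.mean())
```

Under this definition, a run of 10,000 paths leaves roughly one path in the 99.99% tail, so estimates at that quantile are expected to vary considerably between runs.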

Interpretation

GBM with volatility stress is the most conservative model by design. The stress multiplier scales the historical volatility estimates, producing simulated return distributions with significantly wider dispersion than historically observed. Under this stress, even well-collateralized positions face closeout, producing significant lender losses (64.8% ES at 99.99th percentile).

GBM without volatility stress produces near-zero lender loss — same as GARCH and Historical. This confirms that the volatility stress is the sole driver of extreme tail loss. Without it, GBM behaves comparably to the other models for this collateral configuration.

GARCH captures volatility clustering dynamics. The Beta-GARCH model produces fatter tails than constant-volatility GBM. Under these dynamics, 93.9% of simulations trigger liquidation events, but collateral ratios remain adequate and lender losses stay <1%.

Historical replays actual market conditions. The sliding window replay uses a 730-day lookback (2 years of daily price data), producing ~366 overlapping windows of 365-day loan simulations. 98.9% of windows triggered liquidation events. Worst-case CCR across all windows was 1.014 — lender losses remained <1%.
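
The window count and the replay idea can be sketched as follows. This is a deliberately simplified illustration: it holds the debt constant, ignores interest and the partial-liquidation state, and flags a window as liquidated when the running LTV reaches the liquidation LTV. All function names are hypothetical.

```python
def count_windows(lookback_days, loan_days):
    """Number of overlapping daily windows of length loan_days in the lookback."""
    return lookback_days - loan_days + 1

def window_liquidated(prices, initial_ltv, liquidation_ltv):
    """True if LTV reaches the liquidation LTV at any point in the window.

    With fixed debt, LTV on day t is initial_ltv * prices[0] / prices[t].
    """
    p0 = prices[0]
    return any(initial_ltv * p0 / p >= liquidation_ltv for p in prices)

def liquidation_frequency(prices, loan_days, initial_ltv, liquidation_ltv):
    """Share of sliding windows in which the simplified liquidation test fires."""
    n_windows = len(prices) - loan_days + 1
    hits = sum(
        window_liquidated(prices[i:i + loan_days], initial_ltv, liquidation_ltv)
        for i in range(n_windows)
    )
    return hits / n_windows

print(count_windows(730, 365))
```

`count_windows(730, 365)` returns 366, in line with the approximate window count quoted above. In this simplified model, a position opened at 66.7% LTV reaches 87.0% after a price decline of about 23% from the window's starting price.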

Stress Event Sensitivity: October 10th wstETH

To demonstrate the Historical model's sensitivity to specific stress events, we ran two simulations with identical parameters (wstETH/USDC, LTV=66.7%, Liquidation LTV=87.0%, 730-day lookback) but different analysis dates. For input payloads, see Data Provenance — Historical Stress Event.

Simulation	Analysis Date	Lookback Window	Pr(Liquidation)	Worst CCR	Lender ES
Includes Oct 10	2026-03-27	2024-03-28 → 2026-03-27	98.9%	1.014	<1%
Excludes Oct 10	2025-10-09	2023-10-10 → 2025-10-09	63.7%	1.014	<1%

Including the period around October 10th increases liquidation probability from 63.7% to 98.9%. The jump reflects not a single event but a broader period of poor market conditions — empirically verified through sustained price deterioration and elevated volatility across that window. The sliding windows that overlap with this stress sequence nearly all trigger liquidation events. The collateral ratios hold in both cases (worst CCR stays above 1.0), but the model clearly captures how prolonged market deterioration compounds liquidation risk. This confirms the Historical model responds to real market dynamics without distributional assumptions — its output changes when the data changes.

GBM without stress vs GARCH: GBM without volatility stress shows 68.8% liquidation probability, while GARCH shows 93.9%. GARCH captures volatility clustering that the constant-volatility GBM misses, producing more liquidation events under realistic dynamics. Both produce <1% lender loss at this LTV, so the difference surfaces in liquidation frequency rather than tail loss. The stressed GBM serves a different purpose — it models tail scenarios that lookback-calibrated models would miss because they haven't observed them yet (e.g., multi-sigma events like the 2022 LUNA/UST crash or March 2020 COVID drawdown).
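
The volatility-clustering effect can be illustrated with a plain GARCH(1,1) simulation. This is a stand-in chosen for brevity, not the Beta-GARCH model with market-factor residuals that the engine uses, and the parameter values are hypothetical. The sketch shows that GARCH returns have positive excess kurtosis (fatter tails) while constant-volatility normal returns do not.

```python
import numpy as np

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate n returns from a GARCH(1,1) process with normal innovations."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    returns = np.empty(n)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    for t in range(n):
        returns[t] = np.sqrt(var) * z[t]
        # A large squared return today raises tomorrow's variance (clustering).
        var = omega + alpha * returns[t] ** 2 + beta * var
    return returns

def excess_kurtosis(x):
    """Sample excess kurtosis; approximately 0 for normally distributed data."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float((x ** 4).mean() / (x ** 2).mean() ** 2 - 3.0)

garch_returns = simulate_garch11(100_000, omega=1e-5, alpha=0.15, beta=0.80)
print(excess_kurtosis(garch_returns))
```

Fatter tails in daily returns mean more paths touch the liquidation threshold, which is one plausible reading of the higher GARCH liquidation probability reported here.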

Stress Period Sensitivity: GARCH vs GBM (Test 10.1)

8 simulation runs on wstETH/USDC comparing GBM (no shock) and GARCH with and without the October 10, 2025 crash in the lookback window. Two LTV configurations: production (66.7%) and stressed (80%). For input payloads, see Data Provenance — Stress Sensitivity.

LTV 66.7% (Production: N=1.5, L=1.15)

Model	Period	Pr(Liquidation)	Lender ES @99.99%	Worst CCR
GBM	Includes Oct 10	54.4%	<1%	0.999
GARCH	Includes Oct 10	93.1%	<1%	0.988
GBM	Excludes Oct 10	45.0%	<1%	1.014
GARCH	Excludes Oct 10	93.7%	<1%	0.964

LTV 80% (Stressed: N=1.25, L=1.1)

Model	Period	Pr(Liquidation)	Lender ES @99.99%	Worst CCR
GBM	Includes Oct 10	74.0%	<1%	0.961
GARCH	Includes Oct 10	96.5%	<1%	0.942
GBM	Excludes Oct 10	66.2%	<1%	0.945
GARCH	Excludes Oct 10	96.6%	<1%	0.930

GARCH produces 71% more liquidation events than GBM at production LTV (93.1% vs 54.4%) when the October crash is in the lookback. Including the October 10 stress event raises GBM liquidation probability by roughly 8 to 9 percentage points (45.0% to 54.4% at 66.7% LTV, 66.2% to 74.0% at 80% LTV), while GARCH barely moves (93.7% to 93.1% and 96.6% to 96.5%) because it already captures volatility clustering from broader market dynamics. Lender ES remains <1% at both LTV configurations because the liquidation mechanism recovers capital: the collateral buffer absorbs the stress. The ES differential materializes only at LTV configurations where the collateral buffer is insufficient, which represents a protocol design failure rather than normal operations.
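
The comparisons in this section mix a relative change (GARCH vs GBM) with percentage-point differences (with vs without October 10). The short sketch below recomputes both from the two tables above; the helper names are illustrative.

```python
def relative_increase_pct(value, baseline):
    """Relative increase of value over baseline, in percent."""
    return (value / baseline - 1.0) * 100.0

def point_difference(with_event, without_event):
    """Difference between two percentages, in percentage points."""
    return with_event - without_event

# Pr(Liquidation) values from the tables above, in percent.
print(round(relative_increase_pct(93.1, 54.4)))   # GARCH vs GBM, 66.7% LTV
print(round(point_difference(54.4, 45.0), 1))     # GBM, 66.7% LTV
print(round(point_difference(74.0, 66.2), 1))     # GBM, 80% LTV
print(round(point_difference(93.1, 93.7), 1))     # GARCH, 66.7% LTV
print(round(point_difference(96.5, 96.6), 1))     # GARCH, 80% LTV
```

The GBM differences come out at 9.4 and 7.8 percentage points, while the GARCH differences are within one point of zero.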

Multi-Pool GBM Results

Four pools scored with production GBM configuration (stressed, 10k MC, 365-day). For input payloads and per-pool parameters, see Data Provenance — Multi-Pool Input.

Pool	LTV	Liq LTV	ES @99%	ES @99.9%	ES @99.99%	Score
ETH/USDC	58.8%	76.9%	<1%	43.83%	62.87%	0.371
BTC/USDC	58.8%	76.9%	<1%	<1%	<1%	1.000
cbBTC/USDC	58.8%	76.9%	<1%	<1%	<1%	1.000
wstETH/USDC	66.7%	87.0%	34.22%	60.33%	69.62%	0.304

Observations:

BTC and cbBTC both show <1% lender loss at all quantiles. The 58.8% LTV and 76.9% Liquidation LTV provide sufficient buffer — borrowers get liquidated (96.4% and 96.0%), but lenders are fully protected.
ETH with the same conservative LTV (58.8%) shows <1% ES at the 99th percentile but substantial tail loss at the extreme quantiles (43.83% at 99.9%, 62.87% at 99.99%).
wstETH with higher LTV (66.7%) and tighter Liquidation LTV (87.0%) shows the highest tail loss — less buffer means the stress can overwhelm the liquidation mechanism.

Native vs Wrapped Comparison

Same standardized parameters (LTV=58.8%, Liquidation LTV=76.9%, stressed) applied to all four assets for a clean comparison where only the asset differs. Uses the same input payloads as Multi-Pool with uniform L/N values.

Asset	ES @99%	ES @99.9%	ES @99.99%	Pr(Liquidation)
ETH	<1%	39.46%	59.83%	>99%
wstETH	<1%	43.28%	59.33%	>99%
BTC	<1%	<1%	<1%	96.4%
cbBTC	<1%	<1%	<1%	96.1%

Key finding: At identical LTV parameters, ETH and wstETH produce nearly identical tail loss (59.83% vs 59.33% at 99.99th). BTC and cbBTC both produce near-zero lender loss. The wrapped versions behave consistently with their native counterparts, confirming the risk engine correctly prices wrapped assets from their own historical price series.

Convergence Test (Test 2.1)

10 parallel GBM runs on ETH/USDC with identical parameters (stressed, LTV=58.8%, Liquidation LTV=76.9%) at each iteration count. For input payload and execution timestamps (10k, 100k, 1M), see Data Provenance — Convergence Input.

Convergence by Quantile — 10k MC iterations

Quantile	Mean ES	Std Dev	CV
99th	0.15	0.02	11.9%
99.9th	45.74	2.77	6.1%
99.99th	58.19	5.50	9.5%

Convergence by Quantile — 100k MC iterations

Same pool and parameters, 10x more simulation paths:

Quantile	Mean ES	Std Dev	CV
99th	0.15	0.01	5.4%
99.9th	45.93	1.49	3.2%
99.99th	61.02	2.34	3.8%

Convergence by Quantile — 1M MC iterations

Same pool and parameters, 100x more simulation paths than 10k.

Quantile	Mean ES	Std Dev	CV
99th	0.15	0.001	0.78%
99.9th	45.73	0.28	0.62%
99.99th	61.05	0.64	1.05%

Convergence Comparison Across Iteration Counts

Quantile	CV @ 10k	CV @ 100k	CV @ 1M
99th	11.9%	5.4%	0.78%
99.9th	6.1%	3.2%	0.62%
99.99th	9.5%	3.8%	1.05%

At 1M iterations, all three quantiles converge under 2% CV. The 99.99th percentile drops from 9.5% (10k) to 1.05% (1M).
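
The CV figures in these tables are consistent with the usual definition, standard deviation divided by mean across the 10 parallel runs. The sketch below assumes that definition and uses the sample standard deviation for run-level data; the report does not state which estimator was used, so that choice is an assumption.

```python
import statistics

def cv_pct_from_runs(es_values):
    """CV in percent across repeated runs (sample standard deviation / mean)."""
    return 100.0 * statistics.stdev(es_values) / statistics.mean(es_values)

def cv_pct(mean_es, std_es):
    """CV in percent from already-summarized mean and standard deviation."""
    return 100.0 * std_es / mean_es

# Summary rows reported above for the 99.99th percentile: (mean ES, std dev).
reported = {"10k": (58.19, 5.50), "100k": (61.02, 2.34), "1M": (61.05, 0.64)}
for label, (mean_es, std_es) in reported.items():
    print(label, round(cv_pct(mean_es, std_es), 2))
```

Recomputing from the rounded table entries gives values that agree with the reported 9.5%, 3.8%, and 1.05% after rounding. Small mismatches in other rows are expected where the standard deviation is shown to only one significant digit.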

Interpretation

The production system currently runs at 10k MC iterations. At that count, CV is 6.1% at the 99.9th percentile and 9.5% at the 99.99th percentile (the scoring design point). Both are above the 2% threshold.

If CV < 2% is a mandatory requirement, the system supports it: at 1M iterations, the 99.99th percentile converges to 1.05% CV. This is a compute cost tradeoff, not a model limitation — the system supports configurable iteration counts and quantiles.
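
The compute tradeoff can be roughly estimated with the standard Monte Carlo rule of thumb that sampling error shrinks with the square root of the iteration count. The sketch below applies that rule to the reported CVs. It is an approximation, not a measured result: the reported CVs are themselves estimated from only 10 runs, and the helper name is hypothetical.

```python
import math

def iterations_for_target_cv(n_ref, cv_ref, cv_target):
    """Iterations needed to reach cv_target, assuming CV scales as 1/sqrt(N).

    n_ref:     iteration count at which cv_ref was observed
    cv_ref:    observed CV (any unit, as long as cv_target uses the same)
    cv_target: desired CV
    """
    return math.ceil(n_ref * (cv_ref / cv_target) ** 2)

# 99.99th percentile CVs reported above, in percent.
print(iterations_for_target_cv(10_000, 9.5, 2.0))
print(iterations_for_target_cv(100_000, 3.8, 2.0))
```

Extrapolating from the 10k result suggests roughly 226k iterations, and extrapolating from the 100k result suggests roughly 361k. Both sit well below the 1M run that was measured at 1.05% CV, so an intermediate iteration count may satisfy a 2% requirement, subject to confirmation by an actual convergence run.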

Correlation Test (Test 7.3b)

Two GBM simulations with identical parameters (stressed, LTV=66.7%, Liquidation LTV=87.0%) except collateral composition. For input payloads (correlated and uncorrelated pairs), see Data Provenance — Correlation Test.

Configuration	Collateral	ES @99%	ES @99.9%	ES @99.99%
Correlated pair	wstETH (50%) + cbETH (50%)	31.88%	54.42%	71.18%
Uncorrelated pair	BTC (50%) + DAI (50%)	<1%	<1%	<1%

Correlated ES is significantly higher. The GBM model generates price paths using Cholesky decomposition of the historical covariance matrix. wstETH and cbETH (both ETH liquid staking derivatives) are highly correlated — when one drops, the other drops with it. The diversification benefit is minimal, and the correlated decline can overwhelm the liquidation mechanism.

BTC and DAI (a stablecoin) are effectively uncorrelated. DAI maintains its peg while BTC moves, providing genuine diversification. Even under stress, the mixed collateral produces <1% lender loss.
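
The diversification argument follows from the standard two-asset portfolio volatility formula. The sketch below evaluates it for a 50/50 basket; the volatility and correlation inputs are hypothetical round numbers chosen for illustration, not values estimated by the risk engine.

```python
import math

def portfolio_vol(w1, sigma1, w2, sigma2, rho):
    """Volatility of a two-asset basket with weights w1, w2 and correlation rho."""
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2.0 * w1 * w2 * rho * sigma1 * sigma2
    )
    return math.sqrt(variance)

# Hypothetical inputs: two highly correlated volatile assets vs. one volatile
# asset paired with a near-constant stablecoin.
correlated = portfolio_vol(0.5, 0.80, 0.5, 0.80, rho=0.95)
mixed = portfolio_vol(0.5, 0.60, 0.5, 0.01, rho=0.0)
print(round(correlated, 3), round(mixed, 3))
```

With these inputs the correlated basket keeps almost all of the single-asset volatility, while the mixed basket's volatility is about half that of its volatile leg. In the mixed case most of the reduction comes from half the collateral being stable, with the low correlation contributing comparatively little.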
