Data Integrity Evidence
Multi-layer defense against stale, missing, or imprecise data in production. Supports Tests 9.1 and 9.2. For term definitions, see the Glossary. For data sources, see Data Provenance.
Three-Layer Defense
The platform prevents stale data from producing unqualified risk scores through three layers that operate at different points in the data pipeline (Data Provenance).
| Layer | Mechanism | Detection Point | Outcome |
|---|---|---|---|
| 1 — Cross-source validation | Invariant checks cross-verify data across multiple sources | Per-snapshot, during ETL | Divergence persisted to data warehouse; pipeline halts if threshold exceeded |
| 2 — Pipeline freshness monitoring | Tracks table age against current timestamp | Scheduled monitoring | Table flagged STALE with exact age in minutes |
| 3 — Oracle architectural scoring | Deductions for contracts lacking staleness validation | Oracle assessment | Score capped below Reference-grade |
Layer 1 — Cross-Source Validation
Every pool snapshot runs invariant checks that cross-verify data from multiple sources. Results are persisted as structured JSON in each snapshot record — not transient logs, but first-class auditable fields.
The system detects three categories of data integrity failures:
- Missing or unavailable prices — when a primary price source returns zero, null, or stale values, the system detects the failure, switches to an alternative source (e.g., CoinGecko), and records the fallback in the snapshot. The fallback is auditable.
- Position-level divergence — position totals are cross-checked against aggregates reported by the protocol. Divergence beyond configured thresholds halts the pipeline.
- Market-level divergence — supply, borrow, and total USD values are cross-verified across data sources. Discrepancies are recorded and, if material, prevent score generation.
When any divergence exceeds its configured threshold, the pipeline fails and does not produce a score. The system will not generate a risk assessment from data that violates its invariants.
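The cross-check-and-halt behavior above can be sketched in Python. This is an illustrative sketch, not the platform's actual code: the function name, the snapshot result shape, and the 0.5% threshold are assumptions; only the overall pattern (sum positions, compare to the protocol aggregate, persist a structured result, halt on violation) comes from the text.

```python
from decimal import Decimal

# Hypothetical threshold; the real configured value is not stated in the text.
DIVERGENCE_THRESHOLD = Decimal("0.005")  # 0.5% relative divergence

class DataIntegrityError(Exception):
    """Raised when a cross-source invariant is violated; halts the pipeline."""

def check_position_divergence(position_amounts, protocol_reported_total):
    """Sum individual on-chain positions and compare to the protocol aggregate.

    Returns a structured dict of the kind that would be persisted as JSON on
    the snapshot record; raises DataIntegrityError beyond the threshold.
    """
    computed_total = sum(position_amounts, Decimal(0))
    divergence = abs(computed_total - protocol_reported_total) / protocol_reported_total
    result = {
        "computed_total": str(computed_total),
        "protocol_total": str(protocol_reported_total),
        "divergence_ratio": str(divergence),
        "passed": divergence <= DIVERGENCE_THRESHOLD,
    }
    if not result["passed"]:
        raise DataIntegrityError(f"position divergence {divergence} exceeds threshold")
    return result

positions = [Decimal("100.5"), Decimal("250.25"), Decimal("649.25")]
report = check_position_divergence(positions, Decimal("1000.00"))
print(report["passed"])  # True: totals match exactly
```

Persisting the full result dict on every snapshot, not only on failure, is what makes the passing checks auditable as well.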
Layer 2 — Pipeline Freshness Monitoring
A monitoring service compares the latest data timestamp against the current time for every materialized table.
| Status | Condition | Effect |
|---|---|---|
| SYNCHRONIZED | Data age within threshold | Normal operation |
| STALE | Data age exceeds threshold | Flagged with exact age in minutes |
| MISSING_DATA | Missing rows exceed percentage threshold | Flagged for investigation |
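The status logic in the table can be sketched as follows. The 60-minute threshold and function shape are assumptions; the status names and the "exact age in minutes" behavior come from the table above. The MISSING_DATA path (a row-percentage check) is omitted for brevity.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical staleness threshold; the real per-table value is not stated.
STALENESS_THRESHOLD = timedelta(minutes=60)

def freshness_status(latest_row_ts, now=None):
    """Compare the latest data timestamp against the current time.

    Returns (status, age_in_minutes) so a STALE table is flagged with
    its exact age, as described above.
    """
    now = now or datetime.now(timezone.utc)
    age_minutes = int((now - latest_row_ts).total_seconds() // 60)
    if age_minutes * 60 <= STALENESS_THRESHOLD.total_seconds():
        return ("SYNCHRONIZED", age_minutes)
    return ("STALE", age_minutes)

now = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
print(freshness_status(datetime(2026, 3, 1, 11, 30, tzinfo=timezone.utc), now))  # ('SYNCHRONIZED', 30)
print(freshness_status(datetime(2026, 3, 1, 9, 0, tzinfo=timezone.utc), now))    # ('STALE', 180)
```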
Layer 3 — Oracle Architectural Scoring
Oracle scores include deductions for contracts that lack staleness validation in their on-chain implementation.
| Deduction | Penalty | What It Means |
|---|---|---|
| No staleness check | -0.05 | Oracle adapter does not verify that the price data is recent |
| Staleness passthrough | -0.10 | Stale prices from underlying feed pass through unchecked |
An oracle that cannot detect its own staleness is structurally penalized, contributing to a score below Reference-grade (0.9-1.0).
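A minimal sketch of how these deductions could combine. The base score and flag names are illustrative assumptions; only the penalty values (-0.05, -0.10) and the Reference-grade band (0.9-1.0) come from the tables above.

```python
from decimal import Decimal

# Penalty values from the deduction table; flag names are hypothetical.
DEDUCTIONS = {
    "no_staleness_check": Decimal("0.05"),
    "staleness_passthrough": Decimal("0.10"),
}
REFERENCE_GRADE_FLOOR = Decimal("0.9")

def oracle_score(base_score, flags):
    """Apply architectural deductions for missing staleness validation."""
    return base_score - sum(DEDUCTIONS[f] for f in flags)

# An otherwise strong oracle with both staleness weaknesses:
score = oracle_score(Decimal("0.95"), ["no_staleness_check", "staleness_passthrough"])
print(score)                          # 0.80
print(score < REFERENCE_GRADE_FLOOR)  # True: structurally below Reference-grade
```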
Production Track Record
The pipeline has been running continuously in production since January 2026 without a single unrecoverable failure. Throughout this period, the cross-source validation layer has been active on every snapshot — detecting, recording, and where necessary halting execution when data quality invariants were violated. Stale or missing price events were caught and handled without manual intervention.
How the Layers Interact
Layer 3 identifies oracles structurally vulnerable to staleness. Layer 1 detects when stale or missing data actually enters the pipeline. Layer 2 detects when data stops arriving entirely. Together, they ensure that stale data cannot produce an unqualified risk score.
Precision & Schema Normalization (Test 9.2)
Raw on-chain values are transformed into a normalized schema using precision-preserving types at every stage of the pipeline (Data Provenance).
Type Mapping
| On-Chain Type | Domain Layer | BigQuery Type | Rationale |
|---|---|---|---|
| uint256 (shares, raw amounts) | Decimal | STRING | Lossless storage — no truncation at any token decimal count |
| uint128 (rates, ratios) | Decimal | BIGNUMERIC | 76-digit precision — comfortably exceeds uint128 range (39 digits) |
| Computed USD values | Decimal | BIGNUMERIC | Preserves sub-cent precision for institutional metrics |
| Percentages (APY, utilization) | Decimal | BIGNUMERIC | No floating-point rounding in rate calculations |
The domain layer uses Python Decimal throughout — no floating-point arithmetic touches financial values between RPC ingestion and BigQuery persistence.
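A small demonstration of why the mapping matters: a uint256 value near the top of the range survives a string/Decimal round trip but not a float64 round trip. This is a generic illustration, not platform code.

```python
from decimal import Decimal

raw_shares = 2**256 - 1              # largest uint256 value: 78 decimal digits
as_string = str(raw_shares)          # how a STRING column stores it
round_trip = int(Decimal(as_string)) # Decimal construction from str is exact
lossy = int(float(raw_shares))       # float64 keeps only ~15-17 significant digits

print(round_trip == raw_shares)  # True: lossless
print(lossy == raw_shares)       # False: truncated by floating point
```

The same failure mode applies to rates and USD values, which is why the computed columns use BIGNUMERIC rather than FLOAT64.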
Cross-Verification Against Protocol Sources
The same cross-source validation layer described in Test 9.1 serves as the precision audit mechanism. Every snapshot cross-verifies computed values against protocol-reported aggregates:
- Position totals are summed from individual on-chain positions and compared against the protocol's own aggregate endpoints. Divergence is persisted with the exact ratio and absolute USD amount.
- USD valuations are independently computed from raw token amounts and external price feeds, then cross-checked against protocol-reported USD totals.
When the computed value diverges from the protocol's reported value beyond the configured threshold, the pipeline halts. Any transformation step that silently lost material precision would therefore trip the divergence check rather than propagate into a score.
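The USD cross-check above can be sketched as a single Decimal computation. Token decimals, the 1% threshold, and all names here are illustrative assumptions; the pattern (normalize raw amount, multiply by an external price, compare to the protocol's reported total, halt on divergence) comes from the text.

```python
from decimal import Decimal

# Hypothetical threshold; the real configured value is not stated in the text.
USD_DIVERGENCE_THRESHOLD = Decimal("0.01")  # 1%

def usd_cross_check(raw_amount, token_decimals, price_usd, protocol_usd_total):
    """Independently compute a USD value and compare to the protocol's total."""
    computed = (Decimal(raw_amount) / Decimal(10) ** token_decimals) * price_usd
    divergence = abs(computed - protocol_usd_total) / protocol_usd_total
    if divergence > USD_DIVERGENCE_THRESHOLD:
        raise RuntimeError(f"USD divergence {divergence} exceeds threshold; halting")
    return computed

# 1500 tokens (18 decimals) at $2.50 against a matching protocol total:
value = usd_cross_check("1500000000000000000000", 18, Decimal("2.50"), Decimal("3750"))
print(value)  # 3750.00
```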
Schema Evidence
The production schema for lending_pool_state_mt confirms that the type mapping is enforced at the storage layer:
| Column | Type | Purpose |
|---|---|---|
| total_shares | STRING | uint256 share totals — lossless |
| shares (position tables) | STRING | uint256 individual positions — lossless |
| utilization_rate | BIGNUMERIC | Computed rate — 38-digit precision |
| supply_apy | BIGNUMERIC | Protocol rate — 38-digit precision |
| collateral_ratio | BIGNUMERIC | Computed ratio — 38-digit precision |
| position_value_usd | BIGNUMERIC | USD valuation — sub-cent precision |