Chain RPC unavailable
What happens. Our upstream RPC (Alchemy / public node) drops requests or returns stale data. Affected endpoints: `/position`, `/health`, and lifecycle event delivery (events queue up but emit late).
What you see.
- `/position` and `/health` return `503` with `code: "chain_rpc_unavailable"`
- Lifecycle events arrive minutes to hours late (no event loss; we replay from the last confirmed block once RPC recovers)
Handling.
- For `/position`: cache the last successful read with a timestamp; show “live data temporarily unavailable, last updated 3 min ago” rather than spinning forever
- For `/health`: fall back to “treating as healthy” if your last successful response was within ~5 min, otherwise show a maintenance banner
- For lifecycle events: don’t show a “deposit failed” UI just because the event is late. Use the heuristic in Tx lost vs late below
- Retry strategy: exponential backoff starting at 30s, max 5 minutes between attempts. Don’t retry tighter; you’ll just amplify the upstream incident
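
The retry and fallback rules above can be sketched in a few lines. This is a minimal illustration, not a reference client: the doubling factor, the function names, and the return labels are assumptions; only the 30s start, the 5 minute cap, and the ~5 min health window come from the guidance above.

```python
from typing import Iterator, Optional

BASE_DELAY_S = 30     # first retry after 30s
MAX_DELAY_S = 300     # never wait more than 5 minutes between attempts
HEALTH_GRACE_S = 300  # "~5 min" window for treating the product as healthy


def backoff_delays() -> Iterator[int]:
    """Yield retry delays in seconds, doubling each time up to the cap."""
    delay = BASE_DELAY_S
    while True:
        yield delay
        delay = min(delay * 2, MAX_DELAY_S)  # doubling is an assumed factor


def health_fallback(seconds_since_last_ok: Optional[float]) -> str:
    """Decide what to show while /health returns chain_rpc_unavailable."""
    if seconds_since_last_ok is not None and seconds_since_last_ok <= HEALTH_GRACE_S:
        return "treat_as_healthy"
    return "maintenance_banner"


def stale_label(seconds_since_last_ok: float) -> str:
    """Build the banner shown next to a cached /position read."""
    minutes = int(seconds_since_last_ok // 60)
    return f"live data temporarily unavailable, last updated {minutes} min ago"


delays = backoff_delays()
print([next(delays) for _ in range(6)])  # [30, 60, 120, 240, 300, 300]
```

Sleeping for each yielded delay before the next attempt keeps your retry rate bounded however long the incident lasts.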
Tx revert
What happens. The user’s deposit / redeem transaction reverts on-chain. Common causes: insufficient allowance, vault paused between simulation and submission, slippage on async vaults.
What you see.
- The wallet returns the tx hash, but the receipt has `status: 0`
- No lifecycle webhook fires; we only emit on successful logs
Handling.
- Read the receipt from your own RPC after submission; don’t wait for a webhook to learn the tx failed
- Show the revert reason if available, using a map from the receipt’s revert reason to a user-facing message. Common ones: `"ERC4626: insufficient allowance"` → “Please approve the token first”, `"Pausable: paused"` → “This vault is temporarily unavailable”
- Don’t mark a tx as failed based solely on webhook silence (see next section)
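
One way to wire up the receipt check and the message map is sketched below. It assumes you have already fetched the receipt (as a mapping with a `status` field) and, where your RPC exposes it, the revert reason string. The function name, the returned state labels, and the fallback copy are hypothetical.

```python
from typing import Mapping, Optional

# Revert reason -> user-facing message. The two entries come from the list
# above; extend the map with the reasons you actually observe.
REVERT_MESSAGES = {
    "ERC4626: insufficient allowance": "Please approve the token first",
    "Pausable: paused": "This vault is temporarily unavailable",
}
GENERIC_FAILURE = "Transaction failed. Please try again."  # hypothetical fallback copy


def tx_outcome(receipt: Optional[Mapping], revert_reason: Optional[str] = None) -> dict:
    """Classify a submitted tx from its receipt alone, without waiting for a webhook."""
    if receipt is None:
        # No receipt yet: the tx is still pending, NOT failed.
        return {"state": "pending", "message": None}
    if receipt["status"] == 1:
        return {"state": "confirmed", "message": None}
    # status == 0: reverted on-chain, so no lifecycle webhook will fire.
    message = REVERT_MESSAGES.get(revert_reason or "", GENERIC_FAILURE)
    return {"state": "reverted", "message": message}
```

Returning `pending` when there is no receipt keeps webhook silence from ever being read as a failure.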
Tx lost vs late
The problem. You submitted a tx, got a hash, but minutes later neither a webhook nor a receipt has arrived. Did it fail, get dropped, or is the indexer just slow?
Heuristic.

| Chain | Threshold |
|---|---|
| Ethereum mainnet | 6 min |
| Base | 2 min |
| Arbitrum | 30 sec |
| Optimism | 2 min |
| Polygon | 4 min |
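
The sketch below shows one plausible way to apply these thresholds. It assumes the threshold is the time since submission during which a tx with no receipt should still be treated as late rather than lost; that reading, the chain keys, and the returned labels are assumptions, not part of the table.

```python
# Seconds per chain, taken from the table above.
# The dictionary keys are hypothetical labels; use your own chain identifiers.
LATE_THRESHOLD_S = {
    "ethereum": 6 * 60,
    "base": 2 * 60,
    "arbitrum": 30,
    "optimism": 2 * 60,
    "polygon": 4 * 60,
}


def classify_silent_tx(chain: str, seconds_since_submit: float, has_receipt: bool) -> str:
    """Classify a tx for which no lifecycle webhook has arrived yet."""
    if has_receipt:
        # A receipt exists: its status field settles success vs revert.
        return "read_receipt"
    if seconds_since_submit < LATE_THRESHOLD_S[chain]:
        # Inside the threshold (assumed meaning): keep showing "pending".
        return "late"
    # Past the threshold with no receipt: investigate a drop or replacement.
    return "possibly_lost"
```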
Tx replacement (speed-up / cancel)
What happens. The user’s wallet submits a replacement tx with the same nonce but higher gas (speed-up) or zero value (cancel). Only one tx wins.
What you see.
- If the replacement is a successful deposit / redeem at higher gas: the lifecycle webhook fires for the replacement tx hash, not the original
- If the replacement is a cancel (zero-value): no event fires; your “pending tx” entry has no resolution
Handling.
- Don’t pin your pending state to one specific tx hash. Pin it to `(user, nonce, intent)` and update when any tx with that nonce confirms
- For cancel detection: if you see a tx with the same nonce confirm to a different `to` address (typically the user’s own address), treat the original as cancelled
- Most partners can ignore replacement entirely; the eventual webhook (if any) and the `/position` endpoint will reconcile
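
If you do track replacements, pending state keyed by nonce rather than by hash might look like the sketch below. The class and field names are hypothetical. Entries are indexed by `(user, nonce)` because a nonce is unique per account, and the intent is stored on the entry. Checking the receipt status of the confirmed tx is a separate step and is left out here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class PendingTx:
    user: str
    nonce: int
    intent: str        # e.g. "deposit" or "redeem"
    expected_to: str   # address the original tx targeted (the vault)
    tx_hashes: Set[str] = field(default_factory=set)
    state: str = "pending"


class PendingTracker:
    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, int], PendingTx] = {}

    def submitted(self, user: str, nonce: int, intent: str, to: str, tx_hash: str) -> None:
        """Record a submission; a speed-up adds its hash to the same entry."""
        key = (user.lower(), nonce)
        entry = self._by_key.setdefault(
            key, PendingTx(user.lower(), nonce, intent, to.lower())
        )
        entry.tx_hashes.add(tx_hash)

    def confirmed(self, user: str, nonce: int, to: str, tx_hash: str) -> Optional[PendingTx]:
        """Resolve the entry when ANY tx with this nonce confirms."""
        entry = self._by_key.get((user.lower(), nonce))
        if entry is None:
            return None
        entry.tx_hashes.add(tx_hash)
        # Same nonce confirmed to a different address: treat as a cancel.
        entry.state = "confirmed" if to.lower() == entry.expected_to else "cancelled"
        return entry
```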
Chain reorg
What happens. The chain rolls back blocks below the latest tip. A `Deposit` log that briefly existed now doesn’t.
What you see. Nothing. We hold lifecycle events until they’re at confirmation depth (see Lifecycle → Confirmation depth) — past depth, reorgs are vanishingly rare. We do not deliver and then retract.
Handling. Trust the depth. If your business needs deeper certainty (e.g. > $1M deposits), use `/position` with a custom `min_confirmations` query param after the webhook arrives.
Vault pause mid-flight
What happens. Ops or upstream emergency triggers a pause while user deposits are in flight.
What you see.
- A `vault_pause` event fires (operational webhook)
- In-flight deposits either revert (most common) or complete normally; pause is forward-looking
- The `/api/partner/products/{slug}/health` endpoint flips `is_paused: true`
Handling.
- Hide the “Deposit” CTA the moment you receive `vault_pause`
- For pending deposits at pause time: poll the receipt; treat reverts as user-facing failures
- Withdrawals are typically allowed during pause; check `product.status`: `paused` allows redeem, `deprecated` is redeem-only forever
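
The CTA rules above reduce to a small lookup. In this sketch the function name and the returned shape are hypothetical, and any status other than `paused` or `deprecated` is assumed to allow both actions.

```python
def allowed_actions(status: str, is_paused: bool = False) -> dict:
    """Map product.status (plus the health flag) to the CTAs you may show."""
    if status == "deprecated":
        # Redeem-only forever.
        return {"deposit": False, "redeem": True}
    if status == "paused" or is_paused:
        # Hide "Deposit"; redeem is typically still allowed during a pause.
        return {"deposit": False, "redeem": True}
    # Assumption: every other status permits both actions.
    return {"deposit": True, "redeem": True}
```

Passing `is_paused=True` as soon as `vault_pause` arrives hides the deposit CTA without waiting for the next `product.status` read.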
Webhook delivery failure (your endpoint down)
What happens. Your endpoint returns `5xx`, times out, or drops the connection.
What we do. Retry up to 3 attempts at 30s / 120s / 600s after the initial try. Past that, the delivery is marked failed but the row is durably stored — you can manually replay from the Portal.
Handling.
- Always return `2xx` quickly (under 10 sec) and process async. Long-running handlers cause us to time out and retry, which doubles your workload
- Persist the dedupe key (`(event_type, chain_id, tx_hash, log_index)` for lifecycle, `(event_type, slug, date)` for operational) before acking. If you ack and crash, the next retry will look like a duplicate and you’ll lose the event
- Set up alerting on `failed` deliveries in Portal; the row is preserved but no one will tell you it’s there
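
A framework-agnostic sketch of the "persist, then ack, then process" ordering follows. It assumes the delivery body is flat JSON whose fields match the names in the dedupe keys; the payload shape, the function names, and the `persist` callback are all hypothetical. Storing the raw body together with the key means a crash after the ack still leaves you something to process.

```python
import json
import queue
from typing import Callable, Tuple

OPERATIONAL_DATED = {"apy_change", "tvl_alert"}


def dedupe_key(event: dict) -> Tuple:
    """Build the dedupe key for an event from the documented key shapes."""
    kind = event["event_type"]
    if kind == "vault_pause":
        return (kind, event["slug"])
    if kind in OPERATIONAL_DATED:
        return (kind, event["slug"], event["date"])
    # Assumption: everything else is a lifecycle event tied to one chain log.
    return (kind, event["chain_id"], event["tx_hash"], event["log_index"])


def handle_delivery(
    raw_body: bytes,
    persist: Callable[[Tuple, bytes], bool],
    jobs: "queue.Queue",
) -> int:
    """Persist first, enqueue the slow work, then ack. Returns the HTTP status."""
    event = json.loads(raw_body)
    key = dedupe_key(event)
    is_new = persist(key, raw_body)  # should be a durable, atomic write in production
    if is_new:
        jobs.put(key)                # a background worker does the real processing
    return 200                       # duplicates are acked too, as a no-op


# In-memory stand-in for durable storage, for illustration only.
_seen: dict = {}


def persist_in_memory(key: Tuple, body: bytes) -> bool:
    before = len(_seen)
    _seen.setdefault(key, body)      # stores the body only if the key is absent
    return len(_seen) > before
```

The handler does no slow work itself, which keeps the response well inside the 10 second window.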
Idempotency reminder
Every event family has a deterministic dedupe key. We may send the same delivery more than once due to network failures, our retries, or partner-initiated replay. Your handler must be idempotent.
- Lifecycle events: `(event_type, chain_id, tx_hash, log_index)`, which uniquely identifies a chain log
- `apy_change` / `tvl_alert`: `(event_type, slug, date)`
- `vault_pause`: `(event_type, slug)`, re-emitted while paused
Treat `INSERT ... ON CONFLICT DO NOTHING` as a successful no-op. Never branch on “have I seen this before” via a SELECT-then-INSERT pattern; race conditions will let duplicates through.
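
As an illustration of the atomic insert, here is a minimal sketch using SQLite from the Python standard library (the `ON CONFLICT` clause needs SQLite 3.24 or newer). The table name, column names, and the `"deposit"` event type are hypothetical; the same statement shape applies to other databases that support `ON CONFLICT`.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for your real database
conn.execute(
    "CREATE TABLE webhook_events ("
    "dedupe_key TEXT PRIMARY KEY, payload TEXT NOT NULL)"
)


def record_event(conn: sqlite3.Connection, key: tuple, payload: str) -> bool:
    """Insert the event once. Returns True if stored, False if already seen."""
    cur = conn.execute(
        "INSERT INTO webhook_events (dedupe_key, payload) VALUES (?, ?) "
        "ON CONFLICT (dedupe_key) DO NOTHING",
        (json.dumps(key), payload),  # the key tuple is serialised to one text column
    )
    conn.commit()
    # rowcount is 1 for a fresh insert and 0 when the conflict clause skipped it.
    return cur.rowcount == 1


key = ("deposit", 1, "0xabc", 0)       # (event_type, chain_id, tx_hash, log_index)
print(record_event(conn, key, "{}"))   # True
print(record_event(conn, key, "{}"))   # False
```

Both calls succeed from the caller's point of view; the second is simply a no-op, and the uniqueness check happens inside the database rather than in your handler.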
What’s next
- Webhooks — signature verification, retry policy details
- Deposit/Redeem Lifecycle — event sequence and confirmation depth per chain
- Error codes — full code catalog with HTTP status mapping