# Failure Modes

> How to handle RPC outages, tx replacement, chain reorgs, vault pauses, and webhook delivery failures.

Production integrations fail in predictable ways. This page lists the failure modes we've observed across partners and the recommended handling for each.

## Chain RPC unavailable

**What happens.** Our upstream RPC (Alchemy / public node) drops requests or returns stale data. Affected: the `/position` and `/health` endpoints, and lifecycle event delivery (events queue up and are emitted late).

**What you see.**

* `/position` and `/health` return `503` with `code: "chain_rpc_unavailable"`
* Lifecycle events arrive minutes-to-hours late (no event loss; we replay from the last confirmed block once RPC recovers)

**Handling.**

* For `/position`: cache the last successful read with a timestamp; show "live data temporarily unavailable, last updated 3 min ago" rather than spinning forever
* For `/health`: keep treating the product as healthy if your last successful response is less than \~5 min old; otherwise show a maintenance banner
* For lifecycle events: don't show a "deposit failed" UI just because the event is late. Use the heuristic in [Tx lost vs late](#tx-lost-vs-late) below
* Retry strategy: exponential backoff starting at 30s, capped at 5 minutes between attempts. Don't retry more aggressively; you'll only amplify the upstream incident
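
The handling rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference client: the names `backoff_delays`, `CachedRead`, `position_banner`, and `treat_as_healthy` are hypothetical, and the banner wording is only the example copy from the list.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


def backoff_delays(base: float = 30.0, cap: float = 300.0) -> Iterator[float]:
    """Yield retry delays in seconds: 30, 60, 120, 240, then 300 forever."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)


@dataclass
class CachedRead:
    payload: dict      # last successful /position response body
    fetched_at: float  # UNIX timestamp of that read


def position_banner(cache: Optional[CachedRead], now: float) -> str:
    """Text to show while /position is returning chain_rpc_unavailable."""
    if cache is None:
        return "Live data temporarily unavailable"
    minutes = int((now - cache.fetched_at) // 60)
    return f"Live data temporarily unavailable, last updated {minutes} min ago"


def treat_as_healthy(last_ok_at: Optional[float], now: float,
                     grace_seconds: float = 300.0) -> bool:
    """Assume healthy only if the last good /health response is recent enough."""
    return last_ok_at is not None and (now - last_ok_at) <= grace_seconds
```

Sleeping for `next(delays)` between attempts, and resetting the generator after a success, keeps retries at or above the 30s floor.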

## Tx revert

**What happens.** The user's deposit / redeem transaction reverts on-chain. Common causes: insufficient allowance, vault paused between simulation and submission, slippage on async vaults.

**What you see.**

* The wallet returns the tx hash, but the receipt has `status: 0`
* **No lifecycle webhook fires** — we only emit on successful logs

**Handling.**

* Read the receipt from your own RPC after submission; don't wait for a webhook to learn the tx failed
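* Show the revert reason if available, translated through a map from revert reason to user-facing message. Common ones: `"ERC4626: insufficient allowance"` → "Please approve the token first", `"Pausable: paused"` → "This vault is temporarily unavailable". The receipt only carries the status, so recovering the reason string usually means replaying the call with `eth_call` against the block it was mined in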
* Don't mark a tx as failed based solely on webhook silence — see next section
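
As a rough sketch of the receipt check and the message map: the two map entries are the examples from the list above, while `GENERIC_MESSAGE`, `user_message`, and `tx_failed` are hypothetical names introduced here. How you obtain the reason string depends on your RPC client and is not shown.

```python
from typing import Optional

# Substring of the revert reason -> user-facing message.
# Extend this with the revert strings your vaults actually emit.
REVERT_MESSAGES = {
    "ERC4626: insufficient allowance": "Please approve the token first",
    "Pausable: paused": "This vault is temporarily unavailable",
}
GENERIC_MESSAGE = "Transaction failed on-chain. Please try again."  # assumed wording


def tx_failed(receipt: dict) -> bool:
    """A mined receipt with status 0 means the tx reverted."""
    status = receipt["status"]
    # Raw JSON-RPC returns hex strings ("0x0"); some client libraries return ints.
    if isinstance(status, str):
        status = int(status, 16)
    return status == 0


def user_message(revert_reason: Optional[str]) -> str:
    """Map a raw revert reason (None if unavailable) to UI copy."""
    if revert_reason:
        for needle, message in REVERT_MESSAGES.items():
            if needle in revert_reason:
                return message
    return GENERIC_MESSAGE
```

Substring matching is used because many nodes prefix the reason (for example with `execution reverted:`).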

## Tx lost vs late

**The problem.** You submitted a tx, got a hash, but minutes later neither a webhook nor a receipt has arrived. Did it fail, get dropped, or is the indexer just slow?

**Heuristic.**

```
elapsed_since_submit > confirmation_depth_blocks * 2 * block_time
```

Past that threshold with no receipt, treat the tx as lost. Concretely:

| Chain            | Threshold |
| ---------------- | --------- |
| Ethereum mainnet | 6 min     |
| Base             | 2 min     |
| Arbitrum         | 30 sec    |
| Optimism         | 2 min     |
| Polygon          | 4 min     |

Before the threshold, show "Confirming…" and keep polling the receipt. After the threshold without a receipt, show "Transaction may have been dropped — please check your wallet" and offer a re-submit. Do **not** auto-retry — the user might have re-broadcast at higher gas already.
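
One way to encode the heuristic and the table is shown below. The chain keys (`"ethereum"`, `"base"`, and so on) and the status strings are hypothetical labels chosen for this sketch; the thresholds are copied from the table.

```python
# Thresholds in seconds, copied from the table above.
LOST_THRESHOLD_SECONDS = {
    "ethereum": 6 * 60,
    "base": 2 * 60,
    "arbitrum": 30,
    "optimism": 2 * 60,
    "polygon": 4 * 60,
}


def lost_threshold(confirmation_depth_blocks: int, block_time_seconds: float) -> float:
    """The general formula: confirmation depth * 2 * block time."""
    return confirmation_depth_blocks * 2 * block_time_seconds


def pending_status(chain: str, submitted_at: float, now: float, has_receipt: bool) -> str:
    """Classify a submitted tx for the UI; never triggers a re-submit itself."""
    if has_receipt:
        return "receipt_available"  # inspect receipt status next
    if now - submitted_at > LOST_THRESHOLD_SECONDS[chain]:
        return "maybe_dropped"      # show the "check your wallet" copy
    return "confirming"             # keep polling the receipt
```

The function only classifies; offering a re-submit stays a user decision, in line with the no-auto-retry rule.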

## Tx replacement (speed-up / cancel)

**What happens.** The user's wallet submits a replacement tx with the same nonce but higher gas (speed-up) or zero value (cancel). Only one tx wins.

**What you see.**

* If the replacement is a successful deposit / redeem at higher gas: lifecycle webhook fires for the **replacement** tx hash, not the original
* If the replacement is a cancel (zero-value): no event fires; your "pending tx" entry has no resolution

**Handling.**

* Don't pin your pending state to one specific tx hash. Pin it to `(user, nonce, intent)` and update when **any** tx with that nonce confirms
* For cancel detection: if you see a tx with the same nonce confirm to a different `to` address (typically the user's own address), treat the original as cancelled
* Most partners can ignore replacement entirely — the eventual webhook (if any) and the `/position` endpoint will reconcile
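
For partners who do track replacements, the bookkeeping might look like the sketch below. All names (`PendingIntent`, `resolve`, the state strings) are hypothetical. The lookup key is `(user, nonce)`, since a confirmed tx tells you the sender and nonce but not your internal intent, which is stored on the entry instead.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

PendingKey = Tuple[str, int]  # (lowercased user address, nonce)


@dataclass
class PendingIntent:
    intent: str             # e.g. "deposit" or "redeem"
    vault_address: str      # the `to` address of the original tx
    original_tx_hash: str
    state: str = "pending"  # pending | confirmed | reverted | cancelled
    final_tx_hash: Optional[str] = None


def resolve(pending: Dict[PendingKey, PendingIntent], user: str, nonce: int,
            tx_hash: str, to_address: str, succeeded: bool) -> Optional[PendingIntent]:
    """Settle the pending entry when ANY tx with this (user, nonce) confirms."""
    entry = pending.get((user.lower(), nonce))
    if entry is None or entry.state != "pending":
        return entry
    entry.final_tx_hash = tx_hash  # may differ from original_tx_hash
    if to_address.lower() != entry.vault_address.lower():
        entry.state = "cancelled"  # nonce consumed by a tx to another address
    elif succeeded:
        entry.state = "confirmed"
    else:
        entry.state = "reverted"
    return entry
```

A speed-up resolves to `confirmed` with a different `final_tx_hash`; a cancel resolves to `cancelled` because the confirming tx does not target the vault.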

## Chain reorg

**What happens.** The chain replaces one or more recent blocks near the tip. A `Deposit` log that briefly existed no longer does.

**What you see.** Nothing. We hold lifecycle events until they're at confirmation depth (see [Lifecycle → Confirmation depth](/deposit-redeem-lifecycle#confirmation-depth)) — past depth, reorgs are vanishingly rare. We do not deliver and then retract.

**Handling.** Trust the depth. If your business needs deeper certainty (e.g. > \$1M deposits), use `/position` with a custom `min_confirmations` query param after the webhook arrives.

## Vault pause mid-flight

**What happens.** Ops or upstream emergency triggers a pause while user deposits are in flight.

**What you see.**

* A `vault_pause` event fires (operational webhook)
* In-flight deposits either revert (most common) or complete normally — pause is forward-looking
* The `/api/partner/products/{slug}/health` endpoint flips `is_paused: true`

**Handling.**

* Hide the "Deposit" CTA the moment you receive `vault_pause`
* For pending deposits at pause time: poll the receipt; treat reverts as user-facing failures
* Withdrawals are typically allowed during pause; check `product.status` — `paused` allows redeem, `deprecated` is redeem-only forever

## Webhook delivery failure (your endpoint down)

**What happens.** Your endpoint returns `5xx`, times out, or drops the connection.

**What we do.** We retry up to 3 times, at `30s / 120s / 600s` after the initial attempt. After that, the delivery is marked `failed`, but the row is durably stored and you can replay it manually from the [Portal](https://portal.barker.money/webhooks).

**Handling.**

* **Always return `2xx` quickly** (under 10 sec) and process async. Long-running handlers cause us to time out and retry, which doubles your workload
* Persist the event and its dedupe key (`(event_type, chain_id, tx_hash, log_index)` for lifecycle events; see [Idempotency reminder](#idempotency-reminder) for operational events) **before** acking. If you ack first and crash before the write lands, we treat the delivery as done and won't retry, so the event is lost until you replay it from the Portal
* Set up alerting on `failed` deliveries in Portal — the row is preserved but no one will tell you it's there
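
The ordering in the first two bullets (store, enqueue, then ack) can be sketched independently of any web framework. `handle_delivery` and the `persist` callback are hypothetical names for this illustration; signature verification, covered on the Webhooks page, is omitted here.

```python
import queue
from typing import Callable


def handle_delivery(raw_body: bytes,
                    persist: Callable[[bytes], None],
                    work_queue: "queue.Queue[bytes]") -> int:
    """Return the HTTP status to send back. No slow work happens here."""
    try:
        persist(raw_body)     # durable write first (DB row, log append, ...)
    except Exception:
        return 500            # nothing stored: a 5xx lets the sender retry
    work_queue.put(raw_body)  # a background worker does the real processing
    return 200                # ack only after the durable write succeeded
```

If the process dies after `persist` but before the worker runs, the stored row is still there to be picked up on restart, which is why the write comes before the ack.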

## Idempotency reminder

Every event family has a deterministic dedupe key. We may send the same delivery more than once due to network failures, our retries, or partner-initiated replay. Your handler **must** be idempotent.

* Lifecycle events: `(event_type, chain_id, tx_hash, log_index)` — uniquely identifies a chain log
* `apy_change` / `tvl_alert`: `(event_type, slug, date)`
* `vault_pause`: `(event_type, slug)` — re-emitted while paused

If your DB is your dedupe store, put a unique index on these tuples, write with `INSERT ... ON CONFLICT DO NOTHING`, and treat a conflict as a successful no-op. Never branch on "have I seen this before" via a SELECT-then-INSERT pattern; under concurrent deliveries, the race between the two statements lets duplicates through.
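
A minimal sketch of that pattern for lifecycle events, using SQLite from the Python standard library so it runs anywhere (the `ON CONFLICT` clause needs SQLite 3.24 or newer). The table name, column layout, and the `record_once` helper are assumptions made for the example; the same statement shape works in PostgreSQL.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the dedupe table with a unique constraint on the dedupe key."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS lifecycle_events (
            event_type TEXT    NOT NULL,
            chain_id   INTEGER NOT NULL,
            tx_hash    TEXT    NOT NULL,
            log_index  INTEGER NOT NULL,
            payload    TEXT    NOT NULL,
            UNIQUE (event_type, chain_id, tx_hash, log_index)
        )
        """
    )
    return conn


def record_once(conn: sqlite3.Connection, event_type: str, chain_id: int,
                tx_hash: str, log_index: int, payload: str) -> bool:
    """Insert the event; return True if it was new, False if a duplicate."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            """
            INSERT INTO lifecycle_events
                (event_type, chain_id, tx_hash, log_index, payload)
            VALUES (?, ?, ?, ?, ?)
            ON CONFLICT (event_type, chain_id, tx_hash, log_index) DO NOTHING
            """,
            (event_type, chain_id, tx_hash, log_index, payload),
        )
    return cur.rowcount == 1
```

Only run side effects when `record_once` returns `True`; a `False` result is a duplicate delivery and should still be acked with `2xx`.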

## What's next

* [Webhooks](/webhooks) — signature verification, retry policy details
* [Deposit/Redeem Lifecycle](/deposit-redeem-lifecycle) — event sequence and confirmation depth per chain
* [Error codes](/error-codes) — full code catalog with HTTP status mapping
