    Building High-Performance Cardano Indexers

    Aggelos Kappos (Founder @ QBT Labs)
    Tags: Cardano, Indexers, DeFi, Performance, TypeScript

    What every team needs to consider when building a reliable Cardano indexer — from caching strategies to rollback handling.


    Why Indexers Matter

    If you're building anything on Cardano — a DeFi protocol, an analytics dashboard, a wallet, an explorer — you need an indexer. The blockchain is your source of truth, but it's not queryable. You can't ask the chain "show me all CDPs with a health factor below 1.2." You need infrastructure that reads every block, decodes the data, and stores it in a format your application can query efficiently.

    Building a toy indexer is easy. Building one that handles millions of blocks, survives chain rollbacks, processes multiple protocol versions, and stays in sync with the tip — that's where teams struggle.

    We've built and optimized indexers for major Cardano DeFi protocols. Here's what we've learned about what matters.


    The eUTxO Challenge

    Cardano's extended UTxO model is fundamentally different from account-based chains like Ethereum. There's no global state you can query — instead, the current state of a protocol is spread across unspent transaction outputs, each carrying a datum (structured data) and potentially locked by a validator script.

    This means your indexer has to do real work:

    • Find relevant UTxOs among potentially hundreds of outputs per block
    • Decode datums that are CBOR-encoded and follow protocol-specific schemas
    • Track spending — when a UTxO is consumed, your indexed state needs to reflect that
    • Handle multiple protocol versions — a DeFi protocol that upgraded its validators will have different datum formats across different eras

    Teams that treat Cardano indexing like Ethereum event indexing will hit a wall fast. There are no events to subscribe to. You must scan every transaction output and determine what's relevant to you.
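
    To make the scanning and spend tracking above concrete, here is a minimal sketch. The `Tx`, `TxInput`, and `TxOutput` shapes are simplified assumptions for illustration, not the schema of Ogmios or any other client, and `applyBlock` is a hypothetical helper name.

```typescript
// Hypothetical, simplified shapes. Real node clients expose richer structures.
interface TxInput {
  txId: string;
  index: number;
}

interface TxOutput {
  address: string;
  // policyId -> assetName -> quantity (decimal string to avoid precision loss)
  assets: Record<string, Record<string, string>>;
  datumHex?: string; // CBOR-encoded datum, if the output carries one
}

interface Tx {
  id: string;
  inputs: TxInput[];
  outputs: TxOutput[];
}

// Tracked UTxOs keyed by "txId#index".
type TrackedUtxos = Map<string, TxOutput>;

function applyBlock(tracked: TrackedUtxos, txs: Tx[], policyId: string): void {
  for (const tx of txs) {
    // Track spending: an input that references a tracked UTxO consumes it.
    for (const input of tx.inputs) {
      tracked.delete(`${input.txId}#${input.index}`);
    }
    // Find relevant outputs: keep those holding any asset under our policy.
    tx.outputs.forEach((output, index) => {
      if (Object.prototype.hasOwnProperty.call(output.assets, policyId)) {
        tracked.set(`${tx.id}#${index}`, output);
      }
    });
  }
}
```

    Transactions are applied in block order, so an output created by one transaction and spent by a later one in the same block ends up correctly removed.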


    Caching: The Single Biggest Performance Lever

    The most impactful pattern we've found is per-block caching with lazy-built indexes. Here's the problem it solves.

    A typical DeFi protocol has many entity types — CDPs, liquidations, staking positions, governance votes, pool states. Each needs its own indexer. But they all read from the same block data. Without caching, every indexer independently scans all transaction outputs looking for its relevant tokens. With 15+ indexers, the same outputs get scanned dozens of times per block.

    The solution is a shared block context — a cache object created fresh for each block that multiple indexers read from. The key insight is making it lazy:

    • The first indexer that asks "which outputs contain token X?" triggers an O(n) scan that builds a lookup map
    • Every subsequent indexer that asks for any token gets an O(1) lookup from the already-built cache
    • If no indexer asks for mint data, the mint index is never constructed
    • Database records (like the block record or transaction records) are cached the same way — first indexer creates it, subsequent indexers get a cache hit

    This pattern eliminated the majority of redundant work in our indexers. The cache lives for exactly one block — typically under 50ms — then gets garbage collected. No stale data, no memory leaks, no invalidation complexity.
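
    One way the lazy block context could look is sketched below. The class name `BlockContext`, the `BlockOutput` shape, and the `scanCount` counter are assumptions made for this example; the counter exists only to show that the scan runs once.

```typescript
// Hypothetical output shape: only the fields this sketch needs.
interface BlockOutput {
  txId: string;
  index: number;
  policyIds: string[];
}

class BlockContext {
  private readonly outputs: BlockOutput[];
  private policyIndex: Map<string, BlockOutput[]> | null = null;
  private readonly records = new Map<string, unknown>();
  scanCount = 0; // how many full scans ran; exposed only to illustrate laziness

  constructor(outputs: BlockOutput[]) {
    this.outputs = outputs;
  }

  // First call pays for one O(n) scan; later calls are map lookups.
  outputsWithPolicy(policyId: string): BlockOutput[] {
    if (this.policyIndex === null) {
      this.policyIndex = this.buildPolicyIndex();
    }
    return this.policyIndex.get(policyId) ?? [];
  }

  // Shared records (e.g. the block row): first caller creates, others reuse.
  getOrCreate<T>(key: string, create: () => T): T {
    if (!this.records.has(key)) {
      this.records.set(key, create());
    }
    return this.records.get(key) as T;
  }

  private buildPolicyIndex(): Map<string, BlockOutput[]> {
    this.scanCount += 1;
    const index = new Map<string, BlockOutput[]>();
    for (const output of this.outputs) {
      for (const policyId of output.policyIds) {
        const bucket = index.get(policyId);
        if (bucket) {
          bucket.push(output);
        } else {
          index.set(policyId, [output]);
        }
      }
    }
    return index;
  }
}
```

    For database records, `create` can return a promise, in which case the promise itself is cached and concurrent indexers await the same in-flight write. A new `BlockContext` is constructed per block, so nothing needs invalidating.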

    What to consider for your team: Design your caching at the right granularity. A per-block cache is fundamentally different from a traditional web cache. It's short-lived but prevents enormous redundant work within its window. Think about what data multiple consumers need from the same block, and cache that.


    Parallel Processing: Do Less, Not More

    The intuitive approach to making indexers faster is parallelizing more work. Run all indexers concurrently. Use worker threads. Pipeline block processing.

    In practice, the bigger win is usually doing less work per block.

    We discovered that at any given point in the chain, only a subset of indexers is relevant. A protocol that launched at slot 50,000,000 doesn't need its indexers running on blocks from slot 1. A V1 indexer has no business processing V2 blocks. By letting each indexer declare its slot range, we eliminated 65% of indexer invocations. The skipped indexers had been returning immediately, but at thousands of blocks per minute the overhead of function calls, promise creation, and stats tracking for them was still significant.
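
    A slot-range declaration can be as small as the sketch below. The `SlotRanged` interface, the `indexersForSlot` helper, and the indexer names and slot numbers in `exampleIndexers` are all made up for illustration.

```typescript
interface SlotRanged {
  name: string;
  fromSlot: number; // first slot of interest (inclusive)
  toSlot?: number; // last slot of interest (inclusive); omitted while still active
}

// Select only the indexers whose declared range covers this slot,
// so out-of-range indexers are never invoked at all.
function indexersForSlot<T extends SlotRanged>(all: T[], slot: number): T[] {
  return all.filter(
    (indexer) =>
      slot >= indexer.fromSlot &&
      (indexer.toSlot === undefined || slot <= indexer.toSlot)
  );
}

// Example registry; names and slot numbers are illustrative only.
const exampleIndexers: SlotRanged[] = [
  { name: "cdp-v1", fromSlot: 50000000, toSlot: 79999999 },
  { name: "cdp-v2", fromSlot: 80000000 },
];
```

    The dispatcher calls `indexersForSlot` once per block and runs only what it returns, which removes the per-indexer call, promise, and stats overhead for everything else.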

    Within the relevant indexers, parallel execution works well — but only if they share a block context cache (see above) and operate within the same database transaction. Running indexers truly independently means you lose transactional safety and risk partial block processing.

    What to consider for your team: Before parallelizing, profile what work is actually being done per block. You may find that most of your processing time goes to indexers that have nothing to do with the current block. Filtering comes before parallelizing.


    Datum and Redeemer Decoding

    This is where Cardano indexing gets genuinely hard. Datums are CBOR-encoded binary data attached to transaction outputs. Redeemers are the inputs provided to validator scripts during spending. Both carry the protocol-specific logic that your indexer needs to understand.

    The challenges:

    Schema evolution. When a protocol upgrades its smart contracts, datum formats change. Your indexer needs to know which decoder to use for which era. A CDP from V1 has different fields than a CDP from V2.1. Getting this wrong means silently storing garbage data.

    Inline vs. referenced datums. Post-Vasil, datums can be inline (attached directly to the output) or referenced by hash. Your indexer needs to handle both cases — and for referenced datums, you need access to the datum map in the transaction witness set.
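
    Handling both cases might look like the following sketch. The field names `inlineDatumHex` and `datumHash` and the `witnessDatums` map are assumptions for this example; your node client will have its own names for them.

```typescript
interface OutputWithDatum {
  inlineDatumHex?: string; // datum stored directly on the output
  datumHash?: string; // or only a hash of the datum
}

// `witnessDatums` maps datum hash -> hex CBOR, built from the witness set
// of the transaction being processed.
function resolveDatumHex(
  output: OutputWithDatum,
  witnessDatums: Map<string, string>
): string | undefined {
  if (output.inlineDatumHex !== undefined) {
    return output.inlineDatumHex;
  }
  if (output.datumHash !== undefined) {
    // May be undefined when the witness set does not carry this datum.
    return witnessDatums.get(output.datumHash);
  }
  return undefined;
}
```

    A result of `undefined` should be treated as "no datum available here", which the caller can log and skip.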

    Partial decoding failures. Not every datum on-chain is yours. Your indexer will encounter datums from other protocols that don't match your schema. Robust decoding means failing gracefully — log the anomaly, skip the output, don't crash the entire block.

    Redeemer context. Redeemers tell you what action was performed — a deposit, a withdrawal, a liquidation. Without decoding redeemers, you know state changed but not why. For analytics and protocol health monitoring, the "why" is critical.

    What to consider for your team: Build a versioned decoder registry. Map protocol versions to slot ranges and datum schemas. Test your decoders against real on-chain data, not just test vectors. And always handle unknown datums gracefully — the chain has data from protocols you've never heard of.
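
    A versioned decoder registry along these lines is sketched below. `DecoderRegistry`, `VersionedDecoder`, and `DecodeResult` are hypothetical names, and the actual CBOR decoding is left to whatever `decode` function you register; the sketch only shows version selection by slot and failure handling that never throws.

```typescript
type DecodeResult<T> =
  | { ok: true; version: string; value: T }
  | { ok: false; reason: string };

interface VersionedDecoder<T> {
  version: string;
  fromSlot: number; // inclusive
  toSlot?: number; // inclusive; omitted for the current version
  decode: (datumHex: string) => T; // expected to throw on a schema mismatch
}

class DecoderRegistry<T> {
  private readonly decoders: VersionedDecoder<T>[] = [];

  register(decoder: VersionedDecoder<T>): void {
    this.decoders.push(decoder);
  }

  // Pick the decoder for this slot and convert any failure into a value,
  // so one foreign datum cannot crash processing of the whole block.
  decodeAt(slot: number, datumHex: string): DecodeResult<T> {
    const decoder = this.decoders.find(
      (d) => slot >= d.fromSlot && (d.toSlot === undefined || slot <= d.toSlot)
    );
    if (decoder === undefined) {
      return { ok: false, reason: `no decoder registered for slot ${slot}` };
    }
    try {
      return { ok: true, version: decoder.version, value: decoder.decode(datumHex) };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, reason: `${decoder.version} decode failed: ${message}` };
    }
  }
}
```

    The caller logs the `reason` and skips the output when `ok` is false. Feeding the registry datums pulled from real on-chain outputs is a cheap way to catch a wrong slot boundary.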


    Database Optimization

    Your indexer's speed is often gated by database performance. A few patterns that matter:

    Wrap entire blocks in a single transaction. This sounds expensive, but it's your safety net. If any indexer fails mid-block, everything rolls back cleanly. Without this, you risk a state where CDPs were indexed but liquidations weren't — and your API serves inconsistent data.
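
    As a sketch of the per-block transaction, assume a database client that exposes a `transaction` helper which commits when the callback resolves and rolls back when it rejects. The `Database`, `DbTx`, `Block`, and `BlockIndexer` interfaces below are hypothetical stand-ins, not a specific ORM's API.

```typescript
// Hypothetical interfaces standing in for your database client or ORM.
interface DbTx {
  insert(table: string, row: Record<string, unknown>): Promise<void>;
}

interface Database {
  // Assumed contract: commit if `work` resolves, roll back if it rejects.
  transaction<T>(work: (tx: DbTx) => Promise<T>): Promise<T>;
}

interface Block {
  slot: number;
}

interface BlockIndexer {
  name: string;
  handleBlock(block: Block, tx: DbTx): Promise<void>;
}

async function processBlock(
  db: Database,
  block: Block,
  indexers: BlockIndexer[]
): Promise<void> {
  await db.transaction(async (tx) => {
    // Sequential here for simplicity; any rejection aborts the whole block.
    for (const indexer of indexers) {
      await indexer.handleBlock(block, tx);
    }
  });
}
```

    Every indexer writes through the same `tx`, so a failure in the last indexer undoes the writes of all the earlier ones for that block.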

    Use atomic upserts. The pattern of findOne followed by create is two round-trips when one will do. Use INSERT ... ON DUPLICATE KEY UPDATE in MySQL, INSERT ... ON CONFLICT in PostgreSQL, or your ORM's equivalent to handle the "create if not exists" pattern atomically. When multiple indexers reference the same transaction, this can halve your database calls.
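
    A single-statement version of "create if not exists" could look like this sketch, which uses the MySQL form of the statement. The `Queryable` interface, the `transactions` table, and its `tx_hash` and `slot` columns are assumptions for illustration, and the statement relies on a unique key on `tx_hash`.

```typescript
// Hypothetical minimal query interface; substitute your driver or ORM.
interface Queryable {
  query(sql: string, params: unknown[]): Promise<unknown>;
}

// One round-trip: inserts the row, or leaves the existing row untouched
// when the unique key on tx_hash (assumed) already has this value.
async function ensureTransactionRow(
  db: Queryable,
  txHash: string,
  slot: number
): Promise<void> {
  await db.query(
    "INSERT INTO transactions (tx_hash, slot) VALUES (?, ?) " +
      "ON DUPLICATE KEY UPDATE tx_hash = tx_hash",
    [txHash, slot]
  );
}
```

    The no-op `tx_hash = tx_hash` update makes the statement safe to run any number of times for the same transaction, which also helps when blocks are replayed after a restart.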

    Batch checkpoint writes. If your indexer saves a "last processed block" after every single block, that's a write you can batch. During historical catch-up, writing every 100th block is fine — if you crash and replay 100 blocks, it takes seconds. Switch to per-block writes only when near the chain tip.
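
    The batching rule can be expressed as a small pure function, sketched here. The `CheckpointPolicy` shape and the `nearTipBlocks` threshold of 100 are assumptions for this example; only the "every 100th block" interval comes from the text above.

```typescript
interface CheckpointPolicy {
  batchSize: number; // write every Nth block during catch-up
  nearTipBlocks: number; // within this distance of the tip, write every block
}

const defaultCheckpointPolicy: CheckpointPolicy = {
  batchSize: 100,
  nearTipBlocks: 100, // assumed threshold; tune for your setup
};

function shouldWriteCheckpoint(
  blockHeight: number,
  tipHeight: number,
  policy: CheckpointPolicy = defaultCheckpointPolicy
): boolean {
  const nearTip = tipHeight - blockHeight <= policy.nearTipBlocks;
  return nearTip || blockHeight % policy.batchSize === 0;
}
```

    With these defaults, a crash during catch-up replays at most 99 blocks, and the indexer switches to per-block checkpoints once it is within 100 blocks of the tip.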

    Right-size your connection pool. More connections ≠ faster. If your indexers run within a single database transaction per block, you only need enough connections for parallel indexer queries within that transaction. Oversized pools waste memory and can cause contention.

    What to consider for your team: Profile your actual query patterns, not your assumptions. We found that the biggest database wins came from not making queries at all (via caching), not from making individual queries faster.


    Rollback Handling: The Non-Negotiable

    Cardano's chain can reorganize. A block you processed might be reverted when a fork resolves. If your indexer can't undo its work, your database will contain data from blocks that no longer exist on the canonical chain.

    Every indexer must implement a rollback handler — and it must be tested explicitly. This means:

    • Soft deletes or versioned records so you can identify and remove data from rolled-back blocks
    • Block-level granularity — you need to be able to undo everything a specific block contributed
    • Cascading cleanup — if a block created a CDP and a subsequent block modified it, rolling back the first block must handle the dependency
    • Re-processing after rollback — once you've undone the rolled-back blocks, your indexer needs to process the new fork's blocks correctly
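
    The requirements above can be illustrated with an in-memory sketch in which every record remembers the slot that created it and the slot that spent it. `RollbackAwareStore` and its method names are hypothetical; a real indexer would apply the same rule as DELETE and UPDATE statements keyed on slot. Because an eUTxO state change is a spend plus a new output, tracking creation and spending is enough to undo modifications as well.

```typescript
interface IndexedRecord {
  id: string;
  createdSlot: number;
  spentSlot: number | null;
}

class RollbackAwareStore {
  private readonly records = new Map<string, IndexedRecord>();

  create(id: string, slot: number): void {
    this.records.set(id, { id, createdSlot: slot, spentSlot: null });
  }

  markSpent(id: string, slot: number): void {
    const record = this.records.get(id);
    if (record !== undefined) {
      record.spentSlot = slot;
    }
  }

  // Undo everything contributed by blocks after `slot`.
  rollbackTo(slot: number): void {
    this.records.forEach((record, id) => {
      if (record.createdSlot > slot) {
        this.records.delete(id); // created on the abandoned fork
      } else if (record.spentSlot !== null && record.spentSlot > slot) {
        record.spentSlot = null; // spend happened on the abandoned fork
      }
    });
  }

  unspentIds(): string[] {
    const ids: string[] = [];
    this.records.forEach((record, id) => {
      if (record.spentSlot === null) ids.push(id);
    });
    return ids.sort();
  }
}
```

    After `rollbackTo`, the indexer resumes from the rollback point and processes the new fork's blocks through the normal path.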

    The most dangerous bug in an indexer isn't a crash. It's a silently mishandled rollback that leaves phantom data in your database. An API serving data from blocks that no longer exist can cause real financial harm in DeFi.

    What to consider for your team: Make rollbacks a first-class citizen in your architecture, not an afterthought. Test them with real fork scenarios. An indexer that can't roll back is worse than no indexer at all.


    Failover and Resilience

    Production indexers need to handle failure gracefully:

    Node connectivity. Your connection to the Cardano node (typically via Ogmios or cardano-db-sync) will drop. WebSocket reconnection with exponential backoff is essential. After reconnecting, your indexer must determine where it left off and resume — not restart from genesis.
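
    Exponential backoff itself is only a few lines, as in this sketch. The 500 ms base delay and 30 s cap are assumed values, and `connect` and `sleep` are injected so the loop does not depend on any particular WebSocket client.

```typescript
// Delay doubles with each failed attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

async function connectWithRetry<T>(
  connect: () => Promise<T>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void>
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

    In production, `sleep` would wrap a timer, and adding random jitter to the delay is common so that several indexers do not reconnect in lockstep. Once `connect` succeeds, the indexer reads its checkpoint and asks the node for blocks from that point.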

    Checkpoint recovery. When the indexer restarts, it reads its last checkpoint and requests blocks from that point forward. If you batched your checkpoints (see database optimization above), you'll replay a small window of already-processed blocks. Your indexers must be idempotent — processing the same block twice should produce the same result, not duplicate records.

    Monitoring. At minimum, track: blocks processed per minute, current block height vs chain tip, time per block, database query latency, and error rates. A drop in throughput often signals a problem before anything crashes.
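
    Two of those numbers, throughput and distance from the tip, can be derived from periodic samples, as in this sketch. The `SyncSample` and `SyncHealth` shapes are assumptions for illustration; how you export the values depends on your metrics stack.

```typescript
interface SyncSample {
  timestampMs: number;
  blockHeight: number;
}

interface SyncHealth {
  blocksPerMinute: number;
  lagBlocks: number;
}

// Compare two samples taken some time apart against the current chain tip.
function syncHealth(
  previous: SyncSample,
  current: SyncSample,
  tipHeight: number
): SyncHealth {
  const elapsedMinutes = (current.timestampMs - previous.timestampMs) / 60000;
  const processed = current.blockHeight - previous.blockHeight;
  return {
    blocksPerMinute: elapsedMinutes > 0 ? processed / elapsedMinutes : 0,
    lagBlocks: Math.max(0, tipHeight - current.blockHeight),
  };
}
```

    Alerting on a falling `blocksPerMinute` during catch-up, or a growing `lagBlocks` at the tip, surfaces trouble before a crash does.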

    Multiple Ogmios instances. For production reliability, consider running multiple Ogmios connections to different Cardano nodes. If one node falls behind or drops, your indexer can fail over to another without downtime.
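
    A simple selection rule for this is sketched below: stay on the current node while it is connected and close to the best known tip, otherwise switch to the node with the highest tip. The `NodeStatus` shape, the `chooseNode` helper, and the `maxLagSlots` threshold are assumptions for this example.

```typescript
interface NodeStatus {
  url: string;
  connected: boolean;
  tipSlot: number;
}

function chooseNode(
  nodes: NodeStatus[],
  currentUrl: string | null,
  maxLagSlots: number
): NodeStatus | null {
  const healthy = nodes.filter((n) => n.connected);
  if (healthy.length === 0) {
    return null; // nothing to fail over to; keep retrying connections
  }
  const best = healthy.reduce((a, b) => (b.tipSlot > a.tipSlot ? b : a));
  const current = healthy.find((n) => n.url === currentUrl);
  // Avoid flapping: only switch when the current node is down or too far behind.
  if (current !== undefined && best.tipSlot - current.tipSlot <= maxLagSlots) {
    return current;
  }
  return best;
}
```

    After switching nodes, the indexer resumes from its own checkpoint, so the rollback and idempotency guarantees described in this article still apply.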

    What to consider for your team: Design for restartability from day one. Your indexer will crash, your node will restart, your database will need maintenance. Every component should be able to resume cleanly from the last good state.


    Performance Benchmarks: What "Good" Looks Like

    From our experience across multiple production indexers:

    Metric              Historical Sync    Live (at tip)
    Blocks/minute       1,000 - 2,000+     Real-time (1 block / 20s)
    Time per block      30-60ms            < 100ms
    Full sync time      1-3 days           N/A
    Rollback recovery   < 1 second         < 1 second

    If your indexer is processing fewer than 100 blocks per minute during historical catch-up, something is fundamentally wrong — start profiling before adding hardware.
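
    The relationship between these numbers is plain arithmetic, sketched below with hypothetical helper names. The 3,000,000-block chain length in the comments is an illustrative figure chosen for the example, not a measured mainnet value.

```typescript
// 60,000 ms in a minute, so 60 ms per block is 1,000 blocks per minute.
function blocksPerMinuteFromMs(msPerBlock: number): number {
  return 60000 / msPerBlock;
}

function estimateSyncHours(totalBlocks: number, blocksPerMin: number): number {
  return totalBlocks / blocksPerMin / 60;
}

// Illustrative only: syncing 3,000,000 blocks
//   at 1,500 blocks/minute: 2,000 minutes, about 33 hours
//   at   100 blocks/minute: 30,000 minutes, which is 500 hours
const fastSyncHours = estimateSyncHours(3000000, 1500);
const slowSyncHours = estimateSyncHours(3000000, 100);
```

    The gap between roughly a day and a half and roughly three weeks for the same chain is why a rate below 100 blocks per minute calls for profiling first.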


    Lessons Learned

    Profile before you optimize. Our intuition said the database was the bottleneck. It wasn't — the biggest win came from not calling functions at all, which no amount of query optimization would have revealed.

    The cost of "doing nothing" adds up. An indexer that checks its range and returns immediately costs nearly zero in isolation. Multiply by 28 indexers × thousands of blocks per minute, and you're spending significant time on functions that produce no output.

    Caching is a per-block concern. The right cache granularity depends on your access pattern, not conventional wisdom. A 50ms cache that prevents dozens of redundant scans is more valuable than a 5-minute cache that saves one database query.

    Rollbacks are not edge cases. On Cardano mainnet, short forks happen regularly. Treat rollback handling as core functionality, not error handling.

    Correctness first, speed second. A fast indexer that produces wrong data is worse than a slow one. Get the transactional guarantees right, make rollbacks work, verify datum decoding against on-chain data — then optimize.



    At QBT Labs, we build and optimize blockchain indexers and infrastructure for DeFi protocols. If you're building on Cardano and need reliable, high-performance data pipelines — get in touch. We've been through the hard parts so you don't have to.
