    Inside Our Market Making Stack: From Signal to Execution

    Aggelos Kappos (Founder @ QBT Labs)

    Most market making systems are black boxes. You see the bid and ask on the order book — you don't see what happens in the 50 milliseconds between a price signal arriving and an order landing on the exchange. This post opens that black box for the OpenMM stack.

    This is not a theoretical overview. This is exactly how we process a signal, build an order, manage risk, and execute — from the first tick to the filled trade.


    The Five Layers

    Every market making operation, from the most sophisticated quant fund to an individual running a grid bot, needs to solve the same five problems:

    1. Signal — What does the market look like right now, and where is it going?
    2. Strategy — Where do I place my bids and asks given that signal?
    3. Risk — How do I protect myself from adverse inventory and fat-finger errors?
    4. Execution — How do I get my orders to the exchange as fast as possible?
    5. Settlement — How do I account for P&L and manage positions across venues?

    OpenMM addresses all five. Here's how.


    Layer 1 — Signal

    A market making strategy is only as good as its view of the market. Our signal layer aggregates three types of data:

    Order book data: a real-time L2 order book from each exchange. We track best bid, best ask, mid-price, and book depth at multiple price levels. Spread compression at the top of the book is often an early indicator of incoming volume.

    Trade flow: recent trade history tells you who is aggressing. A sequence of large market buys signals short-term upward pressure. This feeds directly into the inventory skew calculation: if we think price is going up, we want to hold more base asset, so we shade our bids higher and our asks higher too.
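
    One simple way to turn recent trades into a number is a signed taker-volume imbalance. The sketch below is illustrative only: the flow_imbalance name and the (side, quantity) trade format are assumptions, not OpenMM's actual data model.

```python
def flow_imbalance(trades: list[tuple[str, float]]) -> float:
    """Signed taker-volume imbalance in [-1, 1] over a window of recent trades.

    Each trade is (taker_side, quantity) with taker_side "buy" or "sell".
    +1 means every trade was an aggressive buy, -1 every trade an aggressive sell.
    """
    buy_volume = sum(qty for side, qty in trades if side == "buy")
    sell_volume = sum(qty for side, qty in trades if side == "sell")
    total = buy_volume + sell_volume
    if total == 0:
        return 0.0  # no trades in the window: no directional information
    return (buy_volume - sell_volume) / total
```

    A positive value would push quotes up and a negative value would push them down. How strongly it should do so is a tuning decision this sketch does not make.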

    Cross-venue price discovery: for pairs that trade on multiple exchanges simultaneously, we maintain a reference price derived from a weighted average of mid-prices across venues. This is what we use as the "true" mid when placing orders, rather than the single-exchange mid, which can diverge temporarily due to venue-specific flow.
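
    A minimal sketch of the reference price, assuming each venue's mid is weighted by its top-of-book depth. The post does not say which weights are used, so the depth weighting, the VenueQuote type, and the reference_mid function are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VenueQuote:
    """Top-of-book snapshot for one venue (hypothetical type)."""
    venue: str
    best_bid: float
    best_ask: float
    depth: float  # assumed weight: quantity resting near the top of the book

    @property
    def mid(self) -> float:
        return (self.best_bid + self.best_ask) / 2


def reference_mid(quotes: list[VenueQuote]) -> float:
    """Depth-weighted average of per-venue mid-prices."""
    total_weight = sum(q.depth for q in quotes)
    if total_weight <= 0:
        raise ValueError("no venue has positive depth")
    return sum(q.mid * q.depth for q in quotes) / total_weight
```

    With this weighting, a thin venue whose mid drifts away has little pull on the reference price, which matches the intent described above.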

    What we don't do: We don't use prediction models or ML signals in the core strategy loop. Market making is not about predicting price direction — it's about capturing spread while keeping inventory neutral. Prediction models introduce their own risk and complexity. The signal layer stays simple and fast.


    Layer 2 — Strategy

    With a signal in hand, the strategy layer answers: where do I quote?

    Grid strategy (default): the simplest viable approach. Place N orders above and below mid-price, evenly spaced by a configurable spread increment. When an order fills, immediately replace it. Profits come from the spread captured on each round-trip.

    Parameters:

    • spread — distance between best bid/ask and mid (e.g. 0.1%)
    • order_amount — size per grid level
    • num_orders — levels on each side
    • rebalance_threshold — when inventory skew triggers rebalancing
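
    A sketch of how these parameters could map to grid levels. It assumes the spacing between levels equals spread and leaves rebalance_threshold out. GridConfig and build_grid are hypothetical names, not OpenMM's API.

```python
from dataclasses import dataclass


@dataclass
class GridConfig:
    spread: float        # fractional distance from mid to the first level (0.001 = 0.1%)
    order_amount: float  # size per grid level
    num_orders: int      # levels on each side


def build_grid(mid: float, cfg: GridConfig) -> list[tuple[str, float, float]]:
    """Return (side, price, amount) for every grid level around mid.

    Assumption: level i sits i * spread away from mid on each side.
    """
    orders = []
    for level in range(1, cfg.num_orders + 1):
        offset = level * cfg.spread
        orders.append(("buy", mid * (1 - offset), cfg.order_amount))
        orders.append(("sell", mid * (1 + offset), cfg.order_amount))
    return orders
```

    For example, build_grid(100.0, GridConfig(spread=0.01, order_amount=1.0, num_orders=2)) yields two bids below 100 and two asks above it.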

    Inventory management: this is where most grid bots fail. Pure grid strategies ignore inventory. If price trends downward while your bot is running, you accumulate base asset (your bids keep filling but your asks don't), and you end up holding a large long position at a loss. In an uptrend the mirror image happens: asks fill, bids don't, and you are left holding quote while the price moves away from you.

    OpenMM's inventory management works like this:

    1. Track current base/quote ratio vs target (e.g. 50/50)
    2. If base > target: shade asks slightly lower (more aggressive selling), shade bids slightly lower (less aggressive buying)
    3. If quote > target: reverse
    4. The shading magnitude scales with how far from target you are

    This keeps the portfolio close to neutral without requiring active position hedging.
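
    One way to express this shading is to shift the mid-price that quotes are built around: down when the portfolio holds too much base, up when it holds too much quote. The sketch below assumes a linear scaling. The skewed_mid name and the max_shift parameter are hypothetical.

```python
def skewed_mid(mid: float, base_value: float, quote_value: float,
               target_ratio: float = 0.5, max_shift: float = 0.002) -> float:
    """Shift the quoting mid against excess inventory.

    base_value and quote_value are both expressed in quote currency.
    max_shift is the fractional shift applied when the portfolio sits
    entirely on one side (an illustrative default, not a recommendation).
    """
    total = base_value + quote_value
    if total <= 0:
        return mid
    deviation = base_value / total - target_ratio  # > 0 means too much base
    scale = max(target_ratio, 1 - target_ratio)    # normalise deviation to [-1, 1]
    shift = max_shift * deviation / scale
    return mid * (1 - shift)
```

    Because bids and asks are both placed around the shifted mid, excess base lowers both sides together: asks become easier to hit and bids harder to fill.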

    Spread adaptation: when volatility increases, widen the spread. When book depth drops, widen the spread. When you're far from target inventory, widen the spread. The underlying principle: in uncertain conditions, accept fewer fills in exchange for more edge on each one. A tight spread in a volatile market is a reliable way to get picked off.
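
    The three widening rules can be combined as multipliers on a base spread. This is a sketch under stated assumptions: the multiplicative form, the reference levels, and the cap are illustrative choices, and adaptive_spread is a hypothetical function.

```python
def adaptive_spread(base_spread: float, volatility: float, ref_volatility: float,
                    depth: float, ref_depth: float, inventory_deviation: float,
                    max_multiplier: float = 5.0) -> float:
    """Widen base_spread when volatility is high, depth is thin, or inventory is off target.

    ref_volatility and ref_depth are the "normal" levels (must be positive).
    inventory_deviation is the distance from target ratio, e.g. 0.2 for 70/30 vs 50/50.
    The spread never narrows below base_spread and never exceeds the cap.
    """
    vol_factor = max(1.0, volatility / ref_volatility)
    depth_factor = max(1.0, ref_depth / depth) if depth > 0 else max_multiplier
    inventory_factor = 1.0 + abs(inventory_deviation)
    multiplier = min(max_multiplier, vol_factor * depth_factor * inventory_factor)
    return base_spread * multiplier
```

    The cap matters in practice: without it, an empty book or a volatility spike would push quotes so wide that they never fill.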


    Layer 3 — Risk

    Risk is the layer that keeps you alive. Every market maker eventually faces an adverse event — a flash crash, an exchange outage, a sudden manipulation spike. Risk management is what determines whether that event is a bad day or a fatal one.

    Per-order limits

    • Max order size: hard cap on any single order, regardless of strategy parameters
    • Max notional exposure: total open order quantity × price cannot exceed the configured limit
    • Slippage protection: if fill price deviates more than X% from expected, cancel and re-evaluate
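
    The first two limits amount to a pre-trade check that runs before any order leaves the process. A minimal sketch, with OrderLimits and check_order as hypothetical names (slippage protection is a post-fill check and is not shown):

```python
from dataclasses import dataclass


@dataclass
class OrderLimits:
    max_order_size: float  # hard cap on a single order's quantity
    max_notional: float    # cap on total open quantity x price


def check_order(qty: float, price: float, open_notional: float,
                limits: OrderLimits) -> tuple[bool, str]:
    """Return (allowed, reason). Rejects the order rather than resizing it."""
    if qty <= 0 or price <= 0:
        return False, "non-positive quantity or price"
    if qty > limits.max_order_size:
        return False, "order size above cap"
    if open_notional + qty * price > limits.max_notional:
        return False, "notional exposure above cap"
    return True, "ok"
```

    Rejecting instead of silently shrinking an oversized order is a deliberate choice here: an order that breaches a cap usually points to a bug upstream.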

    Position limits

    • Max inventory in base asset (long or short)
    • If position exceeds limit: cancel all orders, wait for rebalance before resuming
    • This is the circuit breaker — it fires automatically, no human needed

    Drawdown protection

    • Track cumulative P&L since session start
    • If drawdown exceeds threshold (e.g. -2%): halt all trading, alert operator
    • Resume only after manual confirmation or configured cooldown
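
    The position limit and the drawdown halt can share one circuit breaker. The sketch below models only the cooldown path for resuming, measures drawdown against starting equity, and uses hypothetical names throughout.

```python
import time
from typing import Optional


class CircuitBreaker:
    """Halts quoting on a position or drawdown breach (illustrative only)."""

    def __init__(self, max_position: float, max_drawdown: float, cooldown_s: float):
        self.max_position = max_position  # absolute base-asset inventory cap
        self.max_drawdown = max_drawdown  # 0.02 means halt at -2% of starting equity
        self.cooldown_s = cooldown_s
        self.halted_at: Optional[float] = None

    def update(self, position: float, start_equity: float, equity: float,
               now: Optional[float] = None) -> bool:
        """Return True if trading may continue, False while halted."""
        now = time.monotonic() if now is None else now
        drawdown = (start_equity - equity) / start_equity
        breached = abs(position) > self.max_position or drawdown > self.max_drawdown
        if breached:
            self.halted_at = now  # (re)start the cooldown on every breach
            return False
        if self.halted_at is not None and now - self.halted_at < self.cooldown_s:
            return False          # limits are fine again, but still cooling down
        self.halted_at = None
        return True
```

    When update returns False, the caller is expected to cancel all open orders and alert the operator. That wiring is left out of the sketch.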

    Exchange-specific safeguards

    • Order rate limiting: never exceed exchange's API rate limit (prevents bans)
    • Duplicate order detection: never place the same order twice (common source of loss)
    • Stale order cleanup: cancel orders that haven't filled within configurable timeout
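
    Order rate limiting is commonly implemented as a token bucket. The sketch below is a generic version of that technique, not OpenMM's implementation. The rate and capacity values would come from each exchange's published limits.

```python
class TokenBucket:
    """Allows roughly `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last = 0.0

    def try_acquire(self, now: float) -> bool:
        """Consume one token if available; `now` is a monotonic timestamp in seconds."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should delay or drop the request
```

    In line with the rule that follows, a request that cannot get a token is best skipped or deferred rather than forced through.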

    The most important risk rule: When in doubt, do nothing. It is always better to miss a trade than to make a bad one. The risk layer is biased toward inaction, not action.


    Layer 4 — Execution

    Execution is where milliseconds matter. The difference between a 10ms and 100ms order placement round-trip can be the difference between a fill at the intended price and a fill at a worse price, or no fill at all.

    Connection management: OpenMM maintains persistent WebSocket connections to each exchange rather than opening a new connection per request. This eliminates the TCP handshake overhead (typically 20-50 ms) on every order.

    Order batching: when multiple orders need to be placed simultaneously (e.g. after a rebalance), they are submitted as a batch where the exchange supports it. MEXC, Gate.io, and others support batch order placement: a single API call for multiple orders rather than N calls for N orders.

    Optimistic placement: orders are placed optimistically, meaning we don't wait for confirmation before updating internal state. If an order is rejected, we handle the exception and reconcile. This improves throughput significantly at the cost of more complex state management.

    Order lifecycle tracking: every order moves through a lifecycle of pending, open, partially filled, filled, or cancelled. We track every state transition. This is how we know when to replace, when to cancel, and when something unexpected happened.
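
    The lifecycle can be modelled as a small state machine that rejects transitions it does not expect. The five states come from the text above. The allowed-transition table is an assumption, and real exchanges may report additional states.

```python
from enum import Enum


class OrderState(Enum):
    PENDING = "pending"
    OPEN = "open"
    PARTIALLY_FILLED = "partially_filled"
    FILLED = "filled"
    CANCELLED = "cancelled"


# Allowed transitions (assumed). FILLED and CANCELLED are terminal.
TRANSITIONS = {
    OrderState.PENDING: {OrderState.OPEN, OrderState.PARTIALLY_FILLED,
                         OrderState.FILLED, OrderState.CANCELLED},
    OrderState.OPEN: {OrderState.PARTIALLY_FILLED, OrderState.FILLED,
                      OrderState.CANCELLED},
    OrderState.PARTIALLY_FILLED: {OrderState.PARTIALLY_FILLED, OrderState.FILLED,
                                  OrderState.CANCELLED},
    OrderState.FILLED: set(),
    OrderState.CANCELLED: set(),
}


def transition(current: OrderState, new: OrderState) -> OrderState:
    """Return the new state, or raise if the transition is unexpected."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"unexpected transition {current.value} -> {new.value}")
    return new
```

    A raised ValueError is the "something unexpected happened" signal: for instance, a fill reported for an order the system already considers cancelled.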

    Exchange connectors: OpenMM uses a unified connector interface. Each exchange (MEXC, Gate.io, Bitget, Kraken) implements the same interface: placeOrder, cancelOrder, getOrderBook, getBalance. Strategies don't care which exchange they're running on because the connector abstracts that away.
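
    To illustrate the idea, here is a Python rendering of such an interface using snake_case versions of the four method names. The signatures, the PaperConnector stand-in, and the cancel_all helper are assumptions for illustration and do not reproduce OpenMM's actual types.

```python
from typing import Protocol


class Connector(Protocol):
    """Structural interface every exchange connector is assumed to satisfy."""
    def place_order(self, symbol: str, side: str, price: float, qty: float) -> str: ...
    def cancel_order(self, symbol: str, order_id: str) -> bool: ...
    def get_order_book(self, symbol: str) -> dict: ...
    def get_balance(self, asset: str) -> float: ...


class PaperConnector:
    """In-memory stand-in that satisfies Connector; nothing reaches a real venue."""

    def __init__(self, balances: dict[str, float]):
        self.balances = dict(balances)
        self.orders: dict[str, tuple[str, str, float, float]] = {}
        self._next_id = 0

    def place_order(self, symbol, side, price, qty):
        self._next_id += 1
        order_id = f"paper-{self._next_id}"
        self.orders[order_id] = (symbol, side, price, qty)
        return order_id

    def cancel_order(self, symbol, order_id):
        return self.orders.pop(order_id, None) is not None

    def get_order_book(self, symbol):
        return {"bids": [], "asks": []}

    def get_balance(self, asset):
        return self.balances.get(asset, 0.0)


def cancel_all(conn: Connector, symbol: str, order_ids: list[str]) -> int:
    """Strategy-side code depends only on the Connector interface."""
    return sum(1 for order_id in order_ids if conn.cancel_order(symbol, order_id))
```

    Because cancel_all only touches the interface, the same function works against a paper connector in tests and a live connector in production.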


    Layer 5 — Settlement

    After the trade comes the accounting.

    P&L tracking: every fill updates the running P&L calculation. We track:

    • Realized P&L: from completed round-trips (bid fill + ask fill)
    • Unrealized P&L: from open positions at current mid-price
    • Fees: exchange fees, gas fees (for on-chain venues), any other costs
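
    A compact way to maintain these numbers is average-cost accounting on a signed position. The sketch below assumes a single pair and fees quoted in the quote currency. PnLTracker is a hypothetical class, and average cost is one of several valid accounting conventions.

```python
class PnLTracker:
    """Average-cost P&L for one pair; handles long and short positions."""

    def __init__(self):
        self.position = 0.0   # signed base quantity (negative = short)
        self.avg_price = 0.0  # average entry price of the open position
        self.realized = 0.0   # realized P&L net of fees, in quote currency

    def on_fill(self, side: str, price: float, qty: float, fee: float = 0.0) -> None:
        if qty <= 0:
            raise ValueError("fill quantity must be positive")
        signed = qty if side == "buy" else -qty
        self.realized -= fee
        if self.position == 0 or (self.position > 0) == (signed > 0):
            # Opening or adding: blend the fill into the average entry price.
            total = self.position + signed
            self.avg_price = (self.avg_price * abs(self.position) + price * qty) / abs(total)
            self.position = total
            return
        # Reducing or flipping: realize P&L on the quantity that closes.
        closed = min(qty, abs(self.position))
        direction = 1.0 if self.position > 0 else -1.0
        self.realized += direction * (price - self.avg_price) * closed
        self.position += signed
        if abs(self.position) < 1e-12:
            self.position, self.avg_price = 0.0, 0.0
        elif (self.position > 0) != (direction > 0):
            self.avg_price = price  # flipped: the remainder opens at the fill price

    def unrealized(self, mid: float) -> float:
        """Mark the open position to the current mid-price."""
        return (mid - self.avg_price) * self.position
```

    A bid fill followed by an ask fill of the same size leaves the position flat and moves the captured spread, minus fees, into realized.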

    Position reconciliation: every N seconds, we reconcile our internal position tracking against the exchange's reported balances. Discrepancies trigger an alert. If the discrepancy exceeds a threshold, we halt and wait for manual review.
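
    The reconciliation step reduces to a per-asset comparison with two tolerances. A minimal sketch, assuming absolute tolerances in asset units (a simplification) and a hypothetical reconcile function:

```python
def reconcile(internal: dict[str, float], reported: dict[str, float],
              alert_tol: float, halt_tol: float) -> tuple[str, dict[str, float]]:
    """Compare internal balances with exchange-reported ones.

    Returns a status ("ok", "alert", or "halt") and the per-asset differences
    (reported minus internal) that exceed alert_tol.
    """
    diffs = {}
    for asset in set(internal) | set(reported):
        diff = reported.get(asset, 0.0) - internal.get(asset, 0.0)
        if abs(diff) > alert_tol:
            diffs[asset] = diff
    if any(abs(d) > halt_tol for d in diffs.values()):
        return "halt", diffs
    return ("alert" if diffs else "ok"), diffs
```

    The small alert_tol absorbs rounding and fee dust, so only differences that suggest a missed fill or a bookkeeping bug escalate.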

    Session reporting: at the end of each session (or on-demand), OpenMM generates a session report: total volume, realized P&L, average spread captured, fill rate, inventory delta. This is what you use to tune strategy parameters.


    AI Integration — What It Does and Doesn't Change

    OpenMM integrates with Claude and other LLMs via the Model Context Protocol (MCP). This lets an AI agent issue commands like "start a grid on BTC/USDC with 1% spread" or "what's my current P&L?" and have those commands execute in real time on the exchange.

    What AI adds:

    • Natural language strategy configuration (no need to know parameter names)
    • Real-time market commentary ("the spread on this pair has compressed 40% in the last hour")
    • Portfolio-level questions across multiple bots and venues

    What AI doesn't replace:

    • The risk layer (always runs independently of AI commands)
    • Execution speed (AI commands go through the same pathway as manual commands)
    • Signal quality (AI doesn't generate better signals; it helps interpret existing ones)

    The AI is a control plane, not a strategy engine. The strategy engine runs deterministically.


    Open Source

    The full OpenMM stack is open source under the MIT license.

    We believe infrastructure this fundamental should be open. The edge in market making comes from signal quality, execution speed, and operational excellence — not from hiding the strategy framework.




    Questions? Reach out at [email protected] or find us on X @QBTLabs.
