Lock-Free MPMC Queues And Hazard Pointers

June 1, 2026 • Interview Review

This note is my interview-ready explanation of a Michael-Scott-style multi-producer, multi-consumer queue in C++: where it linearizes, what compare-and-swap actually protects, and why memory reclamation is a separate problem.

One-minute project answer. The core is a Michael-Scott MPMC queue. Enqueue linearizes when a producer links a new node through `tail->next`. Dequeue linearizes when a consumer advances `head` to the next node. The hard part in C++ is not only moving pointers atomically; removed nodes cannot be deleted immediately because another thread may still hold a raw pointer. Hazard pointers solve that by letting threads publish the nodes they may dereference. Removed nodes are retired first and deleted only after a scan proves no hazard pointer protects them.

Queue Shape

The queue uses a dummy head node. With one real value, the shape is:

head_
  |
  v
[dummy/no value] -> [value A] <- tail_

On dequeue, the value comes from `head->next`, not from the old dummy head. A successful dequeue advances `head_` and retires the old dummy node.

out = std::move(*next->value);
next->value.reset();
head_.compare_exchange_strong(h, next, ...);
retire_or_delete(h);

Enqueue Linearization

Enqueue repeatedly snapshots `tail_` and `tail->next`. If `tail->next` is null, the observed tail appears to be the real last node. The producer tries to link its new node there:

if (next == nullptr) {
    if (t->next.compare_exchange_weak(next, n, ...)) {
        tail_.compare_exchange_strong(t, n, ...);
        return;
    }
}

The linearization point is the successful CAS on `t->next`. Advancing `tail_` afterward is helpful for performance but not what makes the new node reachable.

If `next != nullptr`, the observed tail is stale. The thread helps move `tail_` forward. This helping behavior is the lock-free bargain: losing one CAS often means another thread made progress.

Dequeue Linearization

Dequeue snapshots `head_`, protects it, reads `head->next`, and tries to advance `head_`:

if (next == nullptr) {
    return false;
}

if (head_.compare_exchange_strong(h, next, ...)) {
    out = std::move(*next->value);
    next->value.reset();
    retire_or_delete(h);
    return true;
}

The linearization point is the successful CAS on `head_`. After that point, this consumer owns the logical dequeue of that value.

Atomics And Memory Ordering

A mutex protects a region of code. An atomic protects operations on one object. In the queue, the shared state is `head_`, `tail_`, and each node's `next` pointer.

Ordering	Mental model
`relaxed`	Atomicity only, no ordering for surrounding memory.
`release`	Publish earlier writes.
`acquire`	Observe published state so later reads can rely on it.
`acq_rel`	Acquire plus release, common for read-modify-write operations.
`seq_cst`	Strongest global ordering, simpler but often more expensive.

For a queue node, the producer initializes the payload, then publishes the node pointer. A consumer that acquire-loads the pointer can safely read the initialized payload. Acquire/release is not a lock; it only synchronizes when the acquire observes the released value.

Why Hazard Pointers Exist

Lock-free pointer algorithms still have to decide when it is safe to delete removed nodes. A thread may load a pointer, get paused, and later dereference it. Meanwhile another thread might remove the node. Without a reclamation scheme, that is a use-after-free.

A hazard pointer is a public announcement:

I may still dereference this node. Do not delete it yet.

A typical hazard-domain table is an array of atomic pointers. Each participating thread owns one or more slot IDs, but the slot contents are globally visible so other threads can scan them before deleting retired nodes.

Validation Is Crucial

The pattern is not merely "load pointer, publish pointer." The thread must validate that the pointer was still current after publication became visible:

node* t = tail_.load(std::memory_order_acquire);
hazard_slot.store(t, std::memory_order_release);

if (t != tail_.load(std::memory_order_acquire)) {
    retry;
}

After validation, `t` may still become logically stale, but it cannot be deleted while the hazard slot contains it. That separates "safe to dereference" from "still the right algorithmic snapshot."

ABA And Reclamation

CAS checks whether a pointer value still equals the expected value. It can be fooled if a memory address goes through an `A -> B -> A` cycle while representing a different object. Hazard pointers help with the memory-reuse form of ABA by preventing a protected node from being deleted and reused while a thread might still dereference it.

Removed nodes go into a retired list. Periodically, the thread snapshots all hazard slots and deletes only retired nodes that do not appear in that snapshot:

retired_list = [A, C, X, D]
hazards      = [A, B, X]

delete C and D
keep A and X

When Lock-Free Is Worth It

Lock-free does not automatically mean faster. A simple mutex queue can win for tiny operations because the lock-free version pays for CAS retries, cache-line bouncing, per-node allocation, hazard publication, and scanning. The reason to build and study the lock-free queue is progress behavior under contention and the correctness problems that appear without a global lock.

Core intuition. Many threads repeatedly race to win one small CAS. The enqueue winner links `tail->next`; the dequeue winner advances `head_`. Losers retry or help advance stale pointers. If this thread loses, some other thread usually made progress.