Distributed Systems · 13 min read

HotStuff BFT Internals: How Chained Voting and Linear Communication Reshaped Consensus

HotStuff and its derivatives became the consensus foundation for Diem and Aptos, and shaped the generation of BFT protocols that followed, including the DAG-based designs used by Sui. Its linear communication complexity and chained pipelining changed how BFT protocols are built. Here's how the protocol actually works.

Consensus in Byzantine Fault Tolerant (BFT) systems is one of the oldest problems in distributed computing: hard in theory, yet solved well enough in practice to run production systems. PBFT (Practical Byzantine Fault Tolerance, Castro and Liskov, 1999) was the first practical BFT protocol, but its O(n^2) message complexity made it impractical beyond roughly 100 nodes. For nearly two decades, no widely adopted protocol significantly improved on PBFT's fundamental message complexity.

HotStuff (Yin et al., 2018, later adapted at Facebook for Diem) changed that. It achieves O(n) message complexity per voting phase, including during leader changes, while maintaining the same safety and liveness guarantees as PBFT. In validator networks with hundreds of nodes, this asymptotic improvement can translate into substantially higher throughput, although the actual gain depends on the deployment. HotStuff derivatives powered Diem (as LibraBFT, later DiemBFT) and power Aptos today; Sui's Bullshark is a later, DAG-based design rather than a direct derivative. Understanding HotStuff is essential for serious distributed systems engineers.

Byzantine Fault Tolerance: The Problem

A BFT consensus protocol must tolerate f Byzantine (arbitrarily faulty, potentially malicious) nodes among n total nodes. Byzantine nodes can send conflicting messages, delay messages, or behave arbitrarily. The fundamental result: BFT consensus requires n >= 3f + 1. With f = 1 Byzantine node, you need at least 4 total nodes. With f = 33, you need at least 100 total nodes.
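The bound is easy to turn into arithmetic. The following is a minimal sketch; the function names are illustrative and not taken from any HotStuff implementation.

```python
def max_faults(n: int) -> int:
    """Largest f that satisfies n >= 3f + 1."""
    if n < 1:
        raise ValueError("n must be positive")
    return (n - 1) // 3


def min_nodes(f: int) -> int:
    """Smallest cluster that tolerates f Byzantine nodes."""
    return 3 * f + 1


def quorum_size(n: int) -> int:
    """Votes needed for a quorum: n - f, with f the maximum tolerated faults."""
    return n - max_faults(n)


for n in (4, 7, 100):
    print(f"n={n}: tolerates f={max_faults(n)}, quorum={quorum_size(n)}")
```

Note that adding nodes does not always add fault tolerance: 5 and 6 nodes still tolerate only one Byzantine node, the same as 4.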

The protocol must satisfy two properties simultaneously: Safety (no two honest nodes ever commit different values for the same slot) and Liveness (values submitted by clients are eventually committed). In the partially synchronous model (after some unknown Global Stabilization Time, messages are delivered within a known bound delta), safety holds at all times, while liveness is guaranteed only once the network becomes synchronous. HotStuff operates in this model.

PBFT and Why Its Communication Was O(n^2)

To understand HotStuff's contribution, it helps to understand what made PBFT expensive. PBFT uses three phases per consensus slot: PRE-PREPARE (leader proposes to all), PREPARE (all nodes broadcast PREPARE votes to all other nodes), and COMMIT (all nodes broadcast COMMIT votes to all other nodes). The PREPARE and COMMIT all-to-all broadcasts are O(n^2) messages per slot.
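To see how quickly the all-to-all phases grow, here is a rough message count per slot. It is a back-of-the-envelope sketch: it ignores client traffic, checkpoints, and view changes, and the second function models a generic leader-relayed pattern (votes go to the leader, which broadcasts the result) as an assumed point of comparison, not an exact count for any specific protocol.

```python
def pbft_messages(n: int) -> int:
    """Approximate normal-case PBFT messages for one slot."""
    pre_prepare = n - 1            # leader sends the proposal to every backup
    prepare = (n - 1) * (n - 1)    # each backup broadcasts PREPARE to all others
    commit = n * (n - 1)           # every node broadcasts COMMIT to all others
    return pre_prepare + prepare + commit


def leader_relayed_messages(n: int, phases: int = 3) -> int:
    """Messages if each phase is one leader broadcast plus one vote per replica."""
    return phases * 2 * (n - 1)


for n in (4, 100, 1000):
    print(n, pbft_messages(n), leader_relayed_messages(n))
```

At n = 100 this gives 19,800 messages for the all-to-all pattern against 594 for the leader-relayed one.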

The reason for all-to-all is verifiability: in PBFT, a node advances to the commit phase only after seeing matching PREPARE messages from a quorum of n-f nodes. To be convinced that a quorum really voted, each node must receive those votes directly. Forwarding them through the leader would require trusting the leader to accurately report what others said, but the leader may be Byzantine.

HotStuff's Core Innovation: Threshold Signatures

HotStuff's linear communication is enabled by threshold signatures (implementations commonly use BLS-based schemes). A (t, n)-threshold signature scheme allows any t of n participants to collaboratively produce a valid signature for a message. The combined signature is no larger than a single signature and is verifiable by anyone with the group's public key.

In HotStuff, instead of each node broadcasting its vote to all other nodes (O(n^2) messages in total), each node sends its vote only to the leader (n messages). The leader collects n-f votes and combines them into a Quorum Certificate (QC): a threshold signature proving that n-f nodes voted. The leader then broadcasts the QC (another n messages), and any node that receives it can verify it without needing to see the individual votes.
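The vote collection step can be sketched as follows. This is a toy model under loud assumptions: the `Vote`, `QuorumCertificate`, and `Leader` names are hypothetical, and an HMAC stands in for a signature share purely so the example runs. It is not a threshold signature; a real QC carries one combined signature that anyone can verify against the group public key, rather than a list of signers.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Vote:
    voter: int
    view: int
    block_hash: str
    tag: bytes  # stand-in for a signature share (NOT real threshold crypto)


@dataclass(frozen=True)
class QuorumCertificate:
    view: int
    block_hash: str
    signers: frozenset  # a real QC would hold a single combined signature


def sign_share(key: bytes, view: int, block_hash: str) -> bytes:
    """Toy 'signature share': an HMAC over (view, block_hash)."""
    return hmac.new(key, f"{view}:{block_hash}".encode(), hashlib.sha256).digest()


class Leader:
    def __init__(self, n: int, f: int, keys: dict):
        self.quorum = n - f
        self.keys = keys   # toy only: the leader checks HMACs with shared keys
        self.votes = {}    # (view, block_hash) -> set of voter ids

    def on_vote(self, vote: Vote) -> Optional[QuorumCertificate]:
        """Record a valid vote; return a QC the moment n - f votes are in."""
        key = self.keys.get(vote.voter)
        if key is None:
            return None  # unknown voter
        expected = sign_share(key, vote.view, vote.block_hash)
        if not hmac.compare_digest(expected, vote.tag):
            return None  # invalid share
        voters = self.votes.setdefault((vote.view, vote.block_hash), set())
        if vote.voter in voters:
            return None  # duplicate vote
        voters.add(vote.voter)
        if len(voters) == self.quorum:
            return QuorumCertificate(vote.view, vote.block_hash, frozenset(voters))
        return None
```

With n = 4 and f = 1, the third distinct valid vote for the same view and block produces the QC; the fourth is simply recorded.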

The safety argument: a valid QC proves that n-f nodes (a quorum) voted for a value. Since n >= 3f+1, any two quorums of size n-f overlap in at least n-2f >= f+1 nodes, so they must have at least one honest node in common (the quorum intersection property). An honest node votes at most once per phase in a given view, so two conflicting values cannot both get QCs in the same view, and safety is preserved.
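The intersection bound follows from counting: two quorums of size n-f drawn from n nodes overlap in at least 2(n-f) - n = n - 2f nodes, and n >= 3f+1 makes that at least f+1. For small clusters the claim can also be checked by brute force. The sketch below (with an illustrative function name) enumerates every pair of quorums and reports the smallest overlap.

```python
from itertools import combinations


def min_quorum_overlap(n: int, f: int) -> int:
    """Smallest intersection between any two quorums of size n - f."""
    quorums = [set(c) for c in combinations(range(n), n - f)]
    return min(len(a & b) for a in quorums for b in quorums)


for n, f in ((4, 1), (7, 2), (10, 3)):
    overlap = min_quorum_overlap(n, f)
    # At most f nodes in the overlap can be Byzantine, so overlap > f
    # means at least one honest node sits in both quorums.
    print(f"n={n}, f={f}: min overlap={overlap}, honest in common={overlap > f}")
```

The enumeration grows combinatorially, so it is only practical as a sanity check for small n.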

The Three-Phase Structure and Why Three Rounds

Why does HotStuff need three voting phases rather than two? This is the deepest question in the protocol's design, and the answer reveals a fundamental constraint in BFT consensus.

The core challenge is the 'locked value' problem: when a node receives the PRE-COMMIT QC and sends its COMMIT vote, it becomes locked on that value. If a view change happens (the leader fails), the new leader must know whether any node might have committed a value in the previous view. The three-phase structure creates a strict ordering: PREPARE proves n-f nodes saw the proposal; PRE-COMMIT proves n-f nodes know that n-f nodes saw the proposal; COMMIT proves n-f nodes know that n-f nodes know.

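This chain of knowledge is what makes view changes both safe and cheap. A new leader collects only n-f status messages, so it may not hear from the one honest node that holds the most recent lock. With only two phases, a leader that misses such a hidden lock can propose a value that locked nodes refuse to vote for, and the usual remedy is to wait out the maximum network delay before proposing. The extra phase closes this window: before any node locks on a value, n-f nodes already hold its PREPARE QC, so any n-f status messages reveal that QC to the new leader.

HotStuff's three-phase structure is not arbitrary overhead. It is the price the original design pays for combining linear view changes with optimistic responsiveness (progress at the speed of the actual network rather than a worst-case timeout). Earlier two-phase protocols with linear view changes, such as Tendermint and Casper FFG, gave up responsiveness by making the new leader wait for the maximum network delay. Later variants such as Jolteon show that two phases can suffice when other costs, such as a quadratic view change, are accepted.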

Chained HotStuff: Pipelining Phases Across Slots

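Basic HotStuff runs each consensus slot through its voting phases one after another, so every block costs several leader-to-replica round trips and each QC serves only one slot. Chained HotStuff (also in the original paper) pipelines the phases across consecutive proposals, so that a new block is proposed every round trip. A block still needs three rounds to commit; the gain is in throughput and in protocol simplicity, because every round uses the same message type.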

The insight: the PREPARE phase of slot k+1 simultaneously serves as the PRE-COMMIT phase of slot k and the COMMIT phase of slot k-1. Each new proposal by the leader carries a QC from the previous slot, which advances the previous slot's phase. Three consecutive proposals and their QCs complete one slot's commit.
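The resulting commit rule is often called a three-chain. A minimal sketch, assuming a simplified `Block` type invented for this example: `parent` is the block a proposal extends, and `justify` is the block certified by the QC that the proposal carries. The identity checks on `parent` model the requirement that the three certified blocks are direct descendants with no gaps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Block:
    name: str
    parent: Optional["Block"] = None   # block this proposal extends
    justify: Optional["Block"] = None  # block certified by the QC carried here


def newly_committed(proposal: Block) -> Optional[Block]:
    """Return the block that `proposal` commits under the three-chain rule."""
    b2 = proposal.justify               # one QC: "prepared"
    b1 = b2.justify if b2 else None     # two QCs: "pre-committed" (locked)
    b0 = b1.justify if b1 else None     # three QCs: candidate for commit
    if b0 is None:
        return None
    # Commit only if the three certified blocks form a direct parent chain.
    if b2.parent is b1 and b1.parent is b0:
        return b0
    return None


genesis = Block("genesis")
b1 = Block("b1", parent=genesis, justify=genesis)
b2 = Block("b2", parent=b1, justify=b1)
b3 = Block("b3", parent=b2, justify=b2)
b4 = Block("b4", parent=b3, justify=b3)

committed = newly_committed(b4)
print(committed.name if committed else None)  # b1
```

If a view fails and a proposal's parent is not the block its QC certifies, the chain has a gap and nothing is committed until three direct links form again.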

View Change: Handling Leader Failure

A key advantage of HotStuff is its simple view change protocol. When a leader fails to make progress (timeout), each node sends the next leader a NEW-VIEW message containing the highest prepareQC it has seen. The new leader collects n-f NEW-VIEW messages and selects the QC from the highest view (the highQC) as the base for its new proposal.

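This works because any value that could have been committed in an earlier view has a prepareQC held by at least f+1 honest nodes. Since the new leader collects n-f NEW-VIEW messages, at least one of them comes from an honest node holding that QC, so the highest QC the leader sees is at least as recent as any lock. Safety itself does not depend on the leader being honest: each node votes for a proposal only if it extends the block the node is locked on, or if it carries a QC from a higher view than that lock. Extending the highest QC is what guarantees that honest nodes will accept the new leader's proposal, ensuring consistency with any potential previous commit.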
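Both halves of the view change can be sketched in a few lines: the leader's choice of the highest QC, and the voting rule each replica applies to the resulting proposal (the original paper calls this predicate safeNode). The `QC` type, the `parent` dictionary used to model ancestry, and the function names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QC:
    view: int
    block: str


def select_high_qc(new_view_qcs: list) -> QC:
    """Leader side: pick the QC with the highest view among NEW-VIEW messages."""
    return max(new_view_qcs, key=lambda qc: qc.view)


def extends(block: str, ancestor: str, parent: dict) -> bool:
    """True if `ancestor` is `block` itself or one of its ancestors."""
    cur = block
    while cur is not None:
        if cur == ancestor:
            return True
        cur = parent.get(cur)
    return False


def safe_to_vote(block: str, justify: QC, locked: QC, parent: dict) -> bool:
    """Replica side: vote only if one of the two rules holds."""
    safety_rule = extends(block, locked.block, parent)  # builds on my lock
    liveness_rule = justify.view > locked.view          # newer QC overrides lock
    return safety_rule or liveness_rule


# Ancestry: genesis <- a <- b, and a conflicting branch genesis <- x.
parent = {"genesis": None, "a": "genesis", "b": "a", "x": "genesis"}
locked = QC(view=2, block="a")

print(safe_to_vote("b", QC(2, "a"), locked, parent))        # True: extends the lock
print(safe_to_vote("x", QC(1, "genesis"), locked, parent))  # False: conflicts, older QC
print(safe_to_vote("x", QC(3, "genesis"), locked, parent))  # True: justified by a newer QC
```

The second rule is what lets a replica abandon a stale lock when a quorum has demonstrably moved on, which is why the protocol stays live after a view change.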

View Change Latency

The HotStuff view change adds one additional round trip: nodes send NEW-VIEW to the new leader, the new leader aggregates and broadcasts, then normal operation resumes. Total latency after a leader failure: one timeout period + two round trips. This is significantly better than PBFT's complex view change protocol, which can require multiple rounds of all-to-all communication.

HotStuff Variants in Production

LibraBFT / DiemBFT

Facebook's Diem blockchain (formerly Libra) used LibraBFT — a direct implementation of Chained HotStuff with rotating leader election and pacemaker-based view change. LibraBFT added explicit timeout certificates (TCs): a quorum of timeout votes that prove a view change is warranted, enabling the new leader to know it can safely advance without waiting for individual timeouts.

Jolteon / DiemBFT v4

DiemBFT v4 adopted Jolteon (Gelashvili et al., 2021), a two-phase variant that reduces the commit rule to two consecutive QCs rather than three. It keeps the same fault and network assumptions as HotStuff but pays for the shorter commit path with a quadratic, PBFT-style view change, while the steady state stays linear. Jolteon achieves commit in two round trips vs. three for basic HotStuff.
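The difference between the two commit rules can be shown with a single parameter. The sketch below is a simplification, with illustrative names: it assumes `chain` lists certified blocks as (name, round) pairs ordered from parent to child, each the direct parent of the next, and it treats a k-chain as k certified blocks in consecutive rounds.

```python
from typing import List, Optional, Tuple


def committed_by_k_chain(chain: List[Tuple[str, int]], k: int) -> Optional[str]:
    """Return the block committed by the last k certified blocks, if any."""
    if len(chain) < k:
        return None
    tail = chain[-k:]
    rounds = [r for _, r in tail]
    # All k blocks must come from consecutive rounds.
    if all(later - earlier == 1 for earlier, later in zip(rounds, rounds[1:])):
        return tail[0][0]
    return None


chain = [("B1", 1), ("B2", 2)]
print(committed_by_k_chain(chain, k=2))  # B1: the two-chain rule commits already
print(committed_by_k_chain(chain, k=3))  # None: the three-chain rule needs one more QC

chain.append(("B3", 3))
print(committed_by_k_chain(chain, k=3))  # B1
```

A skipped round breaks the chain under either rule, so commit latency after a failed leader depends on how quickly consecutive certified rounds resume.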

Bullshark and DAG-Based Consensus

Sui's consensus is based on Bullshark (Spiegelman et al., 2022), which is not a HotStuff variant in the strict sense: it operates over a DAG (Directed Acyclic Graph) of reliable broadcasts rather than a traditional chain. In Bullshark, all validators broadcast vertices concurrently, and no single leader carries the data path; consensus is achieved by interpreting specific structure in the DAG, built around designated anchor vertices, as implicit voting. This removes the leader as a dissemination bottleneck, so every validator contributes to throughput.

Practical Engineering Considerations

For engineers building on or integrating with HotStuff-based consensus systems, several engineering realities matter beyond the theoretical protocol.

HotStuff's real-world impact is that validator sets can grow into the hundreds or low thousands without consensus message complexity becoming the bottleneck. The bottleneck shifts to BLS signature aggregation and transaction dissemination, both of which have dedicated engineering solutions in modern systems.

What Comes After HotStuff

The frontier of BFT consensus research is moving in two directions: asynchronous liveness (protocols that keep making progress even without any timing assumptions) and throughput-scaling through DAG parallelism. Shoal (Spiegelman et al., 2023) improves Bullshark's latency by pipelining the protocol so that every round can have an anchor, and adds leader reputation to avoid slow anchors. Shoal++ (2024) reduces latency further.

For production systems in 2026, the engineering choice is between: HotStuff-family leader-based protocols (predictable latency, well-understood failure modes, mature implementations in Aptos/Diem codebases) and DAG-based protocols like Bullshark/Shoal (higher throughput ceiling, better leader utilization, but newer and less battle-tested). The trend in new blockchain designs is clearly toward DAG-based consensus as the theoretical understanding matures.
