14 Pages

RAFT : In Search of an Understandable Consensus Algorithm


Gain access to the library to view online
Learn more


InSearchofanUnderstandableConsensusAlgorithm DiegoOngaroandJohnOusterhout StanfordUniversity (DraftofApril7,2013,undersubmissiontoSOSP) Abstract was our most important criterion in evaluating de- Raftisaconsensusalgorithmformanagingareplicated sign alternatives. We applied specific techniques to improveunderstandability,includingdecompositionlog. It produces a result equivalent to Paxos, and it is (Raft separates leader election, log replication, andas efficient as Paxos, but its structure is different from safety so that they can be understood relatively in-Paxos; this makes Raft more understandable than Paxos dependently)andstatespacereduction(Raftreducesand also providesa better foundationfor building practi- the degree of nondeterminism and the ways serverscal systems. In order to enhance understandability, Raft separates the key elements of consensus, such as leader canbeinconsistentwitheachother,inordertomake electionandlogreplication,anditenforcesastrongerde- iteasiertoreasonaboutthesystem). greeofcoherencytoreducethenumberofstatesthatmust • Strongleader: Raftdiffersfromotherconsensusal- be considered. Raft also includes a new mechanism for gorithms in that it employs a strong form of leader- changingtheclustermembership,whichusesoverlapping ship where only leaders (or would-be leaders) issue majorities to guarantee safety. Results from a user study requests; other servers are completely passive.



Published by
Published 07 May 2013
Reads 714
Language English
In Search of an Understandable Consensus Algorithm Diego Ongaro and John Ousterhout Stanford University (Draft of April 7, 2013, under submission to SOSP) Abstract was our most important criterion in evaluating de-Raft is a consensus algorithm for managing a replicated sign alternatives. We applied specific techniques to log. It produces a result equivalent to Paxos, and it is improve understandability, including decomposition as efficient as Paxos, but its structure is different from (Raft separates leader election, log replication, and Paxos; this makes Raft more understandable than Paxos safety so that they can be understood relatively in-and also provides a better foundation for building practi- dependently) and state space reduction (Raft reduces cal systems. In order to enhance understandability, Raft the degree of nondeterminism and the ways servers separates the key elements of consensus, such as leader can be inconsistent with each other, in order to make election and log replication, and it enforces a stronger de- it easier to reason about the system). gree of coherency to reduce the number of states that must Strong leader: Raft differs from other consensus al-be considered. Raft also includes a new mechanism for gorithms in that it employs a strong form of leader-changing the cluster membership, which uses overlapping ship where only leaders (or would-be leaders) issue majorities to guarantee safety. Results from a user study requests; other servers are completely passive. This demonstrate that Raft is easier for students to learn than makes Raft easier to understand and also simplifies Paxos. the implementation. 1 Introduction Membership changes: Raft’s mechanism for changing the set of servers in the cluster uses a sim-Consensus algorithms allow a collection of machines ities toworkasacoherentgroupthatcansurvivethefailuresopfletw jo o in d t if c f o er n e s n e t ns c u o s nagpuprraotaicohnswohveerrelatphedurmianjgortran-of some of its members. Because of this, they play a sitions. key role in building reliable large-scale software systems. Paxos[9,10]hasdominatedthediscussionofconsensusattWweopuenrifvoerrsmiteidatusteerststouudryhywpitohth4es3isgtrhaadtuRataeftsitsudmeonrtse algorithms over the last decade: most implementations es o of consensus are based on Paxos or influenced by it, and understandable than Paxos. After learning both algo-Paxoshasbecometheprimaryvehicleusedtoteachstu-r2i3th%mbs,etstteurdtehnatnsqwueersetiaobnlsetaoboanutswPaerxoqsu.estionsaboutRaft dents about consensus. Unfortunately, Paxos is quite difficult to understand, in We have implemented Raft in about 1500 lines of spite of numerous attempts to make it more approach- C++ code, and the implementation is used as part of able. Furthermore, its architecture is unsuitable for build- RAMCloud [18]. We have also proven the correctness ing practical systems, requiring complex changes to cre- of the Raft algorithm. ate an efficient and complete solution. As a result, both The remainder of the paper introduces the replicated system builders and students struggle with Paxos. state machine problem (Section 2), discusses the strengths After struggling with Paxos ourselves, we set out to and weaknesses of Paxos (Section 3), describes our gen-find a new consensus algorithm that could provide a bet- eral approach to understandability (Section 4), presents ter foundation for system building and education. Our ap- the Raft consensus algorithm (Sections 5-7), evaluates proach was unusual in that our primary goal was under-Raft (Section 8), and discusses related work (Section 9). standability : could we define a consensus algorithm and 2 Achieving fault-tolerance with replicated describe it in a way that is significantly easier to learn than machines Paxos, and that facilitates the development of intuitions state that are essential for system builders? It was important Consensus algorithms typically arise in the context of not just for the algorithm to work, but for it to be obvi-replicated state machines [20]. In this approach, state ma-ous why it works. In addition, the algorithm needed to be chines on a collection of servers compute identical copies complete enough to cover all the major issues required for of the same state and can continue operating even if some an implementation. of the servers are down. Replicated state machines are The result of our effort is a consensus algorithm called used to solve a variety of fault-tolerance problems in dis-Raft. Raft is similar in many ways to existing consensus tributed systems. For example, large-scale systems that algorithms (most notably, Oki and Liskov’s Viewstamped have a single cluster leader, such as GFS [4], HDFS [21], Replication [17]), but it has several novel aspects: and RAMCloud [18], typically use a separate replicated Design for understandability: understandability state machine to manage leader election and store config-