|
Invited students: |
Students interested: |
|
Description
Paxos is a family of protocols for reaching consensus across a network of processors (e.g. multiple images on multiple computers, or CPUs on the same hardware). Byzantine Paxos is Paxos extended to remain reliable in the face of arbitrary failures among processors: processors that lie, collusion among more than one lying processor, etc. Fast Byzantine Paxos is a version of Byzantine Paxos that omits a communication step, seemingly reducing network latency at the cost of additional CPU work, but I'm not sure of the exact tradeoffs.
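For a rough sense of what each variant costs, here is a small Python sketch of the replica counts commonly cited in the literature for tolerating f faulty processors: 2f+1 for classic (crash-fault) Paxos, 3f+1 for Byzantine agreement, and 5f+1 for the two-step Fast Byzantine protocol. The function name and the variant labels are made up for illustration, and the bounds should be checked against the papers before relying on them.

```python
# Commonly cited minimum number of acceptors needed to tolerate f faults.
# The keys below are hypothetical labels, not names from any library.
REPLICA_FACTOR = {
    "paxos": 2,           # crash faults only: 2f + 1
    "byzantine": 3,       # arbitrary (lying) faults: 3f + 1
    "fast_byzantine": 5,  # two-step fast path under arbitrary faults: 5f + 1
}


def min_replicas(f: int, variant: str) -> int:
    """Return the smallest replica count that tolerates f faults."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return REPLICA_FACTOR[variant] * f + 1


if __name__ == "__main__":
    for variant in REPLICA_FACTOR:
        print(variant, [min_replicas(f, variant) for f in (1, 2, 3)])
```

Read this way, the fast variant seems to buy its shorter message path with more participating images rather than only with extra CPU work.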
My basic understanding is that among a set of images (technically "processors", but I'm going to say "image" because it seems more relevant here) you construct a quorum. That quorum calculates and agrees on a response to a client request. The quorum survives and is the unit of computation, but the individual images inside it change in makeup and number over time as images and machines die and come back. The quorum maintains and replicates the state among its member images. An image in the quorum can at any time assume one or more of five roles (client, acceptor, proposer, learner, leader) for that quorum, and most play all but leader all the time. An image can participate in more than one quorum. There is only one 'leader' per quorum; the leader can be any image in the quorum, is 'elected' by the quorum members, and sends messages to the others in the quorum to process. I'm not sure of the gory details, but the client sends a request to the quorum, the quorum processes it, and sends back the one agreed-upon response even if some of the machines in the quorum fail or fail to respond during the computation (as long as enough of them survive). Paxos describes all the inter-process communication that needs to take place to ensure that the quorum provides reliable responses to requests in the face of hardware and network failures. Implementing parts of it in Slang would be interesting, but an in-image version would be a good start.
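To make the roles a little more concrete, here is a minimal single-process sketch of one round of classic, crash-fault Paxos. It is written in Python rather than Smalltalk or Slang, purely as an illustration. It is not Byzantine-tolerant, there is no networking or leader election, and the names Acceptor and propose are hypothetical, not taken from any of the papers or from an existing library.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """Hypothetical acceptor: tracks the highest ballot it has promised
    and the last (ballot, value) pair it accepted."""
    promised: int = -1
    accepted: Optional[Tuple[int, str]] = None
    alive: bool = True  # False simulates a crashed image

    def prepare(self, ballot: int):
        # Phase 1: promise to ignore lower ballots and report any
        # previously accepted value. A dead acceptor never answers.
        if not self.alive or ballot <= self.promised:
            return None
        self.promised = ballot
        return ("promise", self.accepted)

    def accept(self, ballot: int, value: str) -> bool:
        # Phase 2: accept unless a higher ballot was promised meanwhile.
        if not self.alive or ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted = (ballot, value)
        return True


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """Act as proposer for one round. Returns the chosen value, or None
    if a majority could not be reached."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: collect promises from whoever answers.
    promises = [p for p in (a.prepare(ballot) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None

    # If any acceptor already accepted a value, the proposer must adopt
    # the one with the highest ballot instead of its own.
    prior = [acc for _, acc in promises if acc is not None]
    if prior:
        value = max(prior, key=lambda bv: bv[0])[1]

    # Phase 2: ask the acceptors to accept the (possibly adopted) value.
    acks = sum(1 for a in acceptors if a.accept(ballot, value))
    return value if acks >= majority else None


if __name__ == "__main__":
    images = [Acceptor() for _ in range(5)]
    images[0].alive = False  # two of five images are down
    images[1].alive = False
    print(propose(images, ballot=1, value="A"))  # A
    print(propose(images, ballot=2, value="B"))  # A (already chosen, so B loses)
```

The second call shows the property that matters: once a majority has accepted a value, a later proposer with a different value ends up re-proposing the earlier one, so the quorum keeps giving the same answer. With three of the five images down, propose returns None, because no majority can be formed.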
Technical details
It is my understanding that implementing Paxos from scratch and covering all corner cases and failure modes is very hard, and probably confusing too. Sorry I don't know more, but here's a zip of the papers I've collected that describe ways of implementing it and the issues involved: http://dl.dropbox.com/u/4460862/paxos-papers.zip
Benefits to the Student
Learn Smalltalk, maybe some Slang. Implement something hard. Also build a robust distributed Smalltalk system.
Benefits to the Community
With enough images you can just persist everything in the images, and each image can stay a reasonable size. Other benefits include making a Smalltalk server process that is resistant to hardware and network failures.
|