The document discusses various technologies for building real-time collaborative applications, including techniques for distributed systems, shared editing, and transformation of concurrent operations. Some key topics covered are consistent distributed databases using protocols like Paxos, vector clocks for ordering events in distributed systems, and applying transformations to resolve conflicts from concurrent edits in shared documents. The document envisions that building collaborative applications of the future will present challenges but also opportunities to create new experiences.
* this talk is more abstract\n\n* let's try and think a year or two ahead, and figure out what those apps will be built with\n\n* not the specific apis or languages, but general design and architecture\n\n* some things will surely need to be invented, but some can be stolen from other fields\n\n* this is a small tour of things i believe will become very important\n
* we came from static\n\n* we got xhr and made things better\n\n* we figured out long polling to get to the first real-time apps\n\n* now we are just getting past the low hanging fruit\n\n* already seeing higher level frameworks like derby\n\n** most of these problems are not unique to the web.\n
- MMORPG or first person shooters\n\n- collaborative spaces\n\n- mashups of real-time apps\n
\n
how many single page app frameworks have we seen at this conference?\n\nsome of this stuff needs to go into browsers\n\nShared WebWorkers may be an initial answer\n
multimedia, filesharing, and later better state management\n\n\n
* halo network protocol\n
* silos\n\n* crawlers deal with only static content\n
\n
many of the common algorithms we use will fail to help us. \n\njust like big data, real-time apps will need new solutions.\n
* We'll start with something easy\n\n* How do you calculate the average of a set of numbers?\n
* pretty much everyone knows this\n\n* notice that you have to count the list, which means you have to have a list. this takes up a lot of space if the list is large. what if the list is infinite?\n\n
If we tally the sum and the count as the values go by, then we can just divide at any time to get the average.\n\nThis still takes a bit of space, but not nearly as much: O(log n) instead of O(n).\n
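A minimal sketch of the running tally described above. The helper name `createRunningAverage` is made up for illustration; the only state kept is a sum and a count.

```javascript
// Hypothetical helper: keeps a running sum and count, never the full list.
function createRunningAverage() {
  let sum = 0;
  let count = 0;
  return {
    // Fold one more value into the totals.
    add(value) {
      sum += value;
      count += 1;
    },
    // Average of everything seen so far (NaN before the first value).
    average() {
      return count === 0 ? NaN : sum / count;
    },
  };
}

const runningAverage = createRunningAverage();
for (const value of [2, 4, 6, 8]) runningAverage.add(value);
console.log(runningAverage.average()); // 5
```

The average can be read at any point in the stream, which is the property a real-time app needs.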
Store the last N values and use that for the average\n\ncan smooth this out by using exponentially weighted averages (think stock charts)\n
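Both ideas can be sketched in a few lines. The helper names and the smoothing factor `alpha` are assumptions for illustration, not something the talk specifies.

```javascript
// Hypothetical helper: average over only the last `size` values.
function createWindowedAverage(size) {
  const recent = [];
  let sum = 0;
  return {
    // Add a value, drop the oldest one if the window is full,
    // and return the current windowed average.
    add(value) {
      recent.push(value);
      sum += value;
      if (recent.length > size) sum -= recent.shift();
      return sum / recent.length;
    },
  };
}

// Hypothetical helper: exponentially weighted moving average.
// `alpha` (0 < alpha <= 1) is an assumed smoothing factor; higher
// values weight recent data more heavily.
function createEwma(alpha) {
  let current = null;
  return {
    add(value) {
      current = current === null ? value : alpha * value + (1 - alpha) * current;
      return current;
    },
  };
}

const windowed = createWindowedAverage(3);
[1, 2, 3].forEach((v) => windowed.add(v));
console.log(windowed.add(4)); // 3, the average of 2, 3, 4

const ewma = createEwma(0.5);
ewma.add(10);
ewma.add(20);
console.log(ewma.add(30)); // 22.5
```

The windowed version stores N values, while the exponentially weighted version stores a single number.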
* let's get a little bit harder\n\n* how do you calculate the variance?\n
* note that you need the number of elements and each element minus the average\n
* after some algebra, you can get this\n\n* now we just need running totals of the values and of their squares, plus the count; those give us the mean and the mean of the squares\n\n* easy\n\n* note that you don't need to know any math for this, you just need to assume such things are possible, and go look them up.\n
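A sketch of the streaming version, assuming the identity referred to is the usual one: the (population) variance equals the mean of the squares minus the square of the mean. The helper name is hypothetical. This direct form can lose precision when values are large and close together; Welford's algorithm is the more numerically stable alternative.

```javascript
// Hypothetical helper: population variance from three running totals.
function createRunningVariance() {
  let count = 0;
  let sum = 0;
  let sumOfSquares = 0;
  return {
    add(value) {
      count += 1;
      sum += value;
      sumOfSquares += value * value;
    },
    mean() {
      return sum / count;
    },
    // variance = mean of the squares - square of the mean
    variance() {
      const mean = sum / count;
      return sumOfSquares / count - mean * mean;
    },
  };
}

const runningVariance = createRunningVariance();
for (const value of [2, 4, 4, 4, 5, 5, 7, 9]) runningVariance.add(value);
console.log(runningVariance.mean()); // 5
console.log(runningVariance.variance()); // 4
```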
* how do you test if something is in a set?\n
* very simple. a hash or dictionary can be used to make this a constant time operation\n\n* however, it consumes space proportional to the members. what if the list of members is quite large?\n\n
* bloom filters can solve this problem\n\n* probabilistic data structure. if it says not a member, that is 100% certain. it has a small chance of giving you a false positive, which can be tuned\n\n* it takes constant space, and can never fill up (though the false positive rate climbs as more members are added)\n\n* set union and (to some extent) set intersection are trivial (bitwise OR and AND)\n\n* used in databases and caches to prevent disk lookups for nonexistent stuff\n\n* several variants for different use cases (counting, scalable, etc)\n
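A minimal Bloom filter sketch to make the idea concrete. The class name, the default sizes, and the seeded FNV-1a style hash are all assumptions chosen for brevity; a production filter would size the bit array and hash count from the expected item count and target false positive rate.

```javascript
// Hypothetical minimal Bloom filter; sizes and hash choice are assumptions.
class BloomFilter {
  constructor(bitCount = 1024, hashCount = 3) {
    this.bitCount = bitCount;
    this.hashCount = hashCount;
    this.bits = new Uint8Array(Math.ceil(bitCount / 8));
  }

  // FNV-1a style 32-bit string hash, varied by a small integer seed.
  static hash(text, seed) {
    let h = (2166136261 ^ seed) >>> 0;
    for (let i = 0; i < text.length; i++) {
      h ^= text.charCodeAt(i);
      h = Math.imul(h, 16777619) >>> 0;
    }
    return h;
  }

  // One bit position per hash function.
  positions(item) {
    const result = [];
    for (let seed = 0; seed < this.hashCount; seed++) {
      result.push(BloomFilter.hash(String(item), seed) % this.bitCount);
    }
    return result;
  }

  add(item) {
    for (const p of this.positions(item)) this.bits[p >> 3] |= 1 << (p & 7);
  }

  // false means "definitely not a member"; true means "probably a member".
  mightContain(item) {
    return this.positions(item).every(
      (p) => (this.bits[p >> 3] & (1 << (p & 7))) !== 0
    );
  }

  // Union of two filters with identical parameters is a bitwise OR.
  union(other) {
    const merged = new BloomFilter(this.bitCount, this.hashCount);
    for (let i = 0; i < this.bits.length; i++) {
      merged.bits[i] = this.bits[i] | other.bits[i];
    }
    return merged;
  }
}

const seen = new BloomFilter();
seen.add("pacman");
console.log(seen.mightContain("pacman")); // true: added items are always reported
console.log(seen.mightContain("pinball")); // usually false; a false positive is possible
```

The space used is fixed at construction time, no matter how many items are added.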
* interesting applications tend to have a lot of state\n\n* many applications are collaborative, which means state is shared\n\n* state is often shared in non-collaborative apps since the user might have multiple tabs open\n\n* we are now in a really hard place\n\n* traditionally, we rely on databases to get us out of this mess, but this has severe limits for real-time apps\n\n* future apps will need to bring these kind of database features up to a higher layer, closer to the app\n\n* this is happening in general even on single machines with many cores. programming languages themselves are getting database semantics.\n
* there are tons of new databases around, many of which attempt to solve horizontal scalability issues\n\n* brewer's CAP theorem says we must trade availability or consistency for the partition tolerance we require\n\n* let's briefly discuss those two tradeoffs\n
* if we choose consistency, what does it look like?\n\n* all nodes must agree on state changes, how can we do this?\n\n* how do people do this? we vote.\n\n* in computer science, this is called consensus, and we determine consensus using a consensus protocol\n\n* we determine the master by consensus and send the update requests there\n\n* note that commit protocols don't work (ie, asking everyone to agree to some change, as opposed to a majority for example). they don't work in async environments and they fail harder.\n
* paxos is probably the most important consensus protocol\n\n* it works very well in the face of failure, and variants work even when nodes lie and cheat\n\n* unfortunately, if consensus cannot be reached, no progress can be made (this is the availability sacrifice)\n\n* overcoming failures costs latency, and latency is paramount. imagine what happens if some nodes are really slow\n\n\n
* let's trade off consistency instead\n\n* we can relax consistency to eventual consistency. this means current values will eventually make it to everyone, but at any given time the data may be inconsistent temporarily.\n\n* this model turns out to map very well to real-time web applications, and is what people are doing now in a simplified manner. we don't wait for absolute consistency, but we trust it will eventually converge.\n\n* how do we implement this?\n
* one way to deal with eventual consistency is to order the events by time, and then apply them in order.\n\n* unfortunately, synchronized clocks don't really exist, so we need an abstract notion of time instead\n\n* use actor id and a counter, and each time we see a message, increment our count and stamp the message.\n\n* we can detect conflicts easily. two vclocks conflict if neither is descended from the other. these conflicts must be resolved somehow.\n\n* whether this conflict resolution is manual, automatic, easy or difficult depends on the application.\n\n* using these, all participants deal with distributed state changes quickly and easily, until a conflict is detected, and many times, that conflict can be easily resolved (for example, picking randomly or set union).\n
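The rules above can be sketched with plain objects that map an actor id to a counter. The function names are hypothetical, and this is a simplified model rather than the scheme of any particular database.

```javascript
// A vector clock is modeled as a plain object: { actorId: counter }.

// Return a copy of `clock` with `actor`'s counter incremented.
function increment(clock, actor) {
  return { ...clock, [actor]: (clock[actor] || 0) + 1 };
}

// True if `a` has seen everything `b` has, i.e. `a` descends from `b`.
function descends(a, b) {
  return Object.keys(b).every((actor) => (a[actor] || 0) >= b[actor]);
}

// Two clocks conflict when neither descends from the other.
function conflicts(a, b) {
  return !descends(a, b) && !descends(b, a);
}

// Merging keeps the highest counter seen for each actor.
function merge(a, b) {
  const merged = { ...a };
  for (const [actor, counter] of Object.entries(b)) {
    merged[actor] = Math.max(merged[actor] || 0, counter);
  }
  return merged;
}

const base = increment({}, "a"); // { a: 1 }
const left = increment(base, "b"); // { a: 1, b: 1 }
const right = increment(base, "c"); // { a: 1, c: 1 }

console.log(descends(left, base)); // true: left builds on base
console.log(conflicts(left, right)); // true: concurrent updates
console.log(merge(left, right)); // { a: 1, b: 1, c: 1 }
```

Detecting the conflict is mechanical; deciding what the merged value should be is left to the application.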
* imagine you're at a party playing old video games\n\n* what games did everyone play?\n\n* we can use vector clocks to manage the shared state of the list, shared by all the participants\n
* on the left is the actor and version, on the right is the value\n\n* jack is hanging out with some friends\n\n* jack says he played pacman\n\n
* julien says he played pinball\n\n* julien goes over to another group and asks the same question\n\n* jack goes off for one last game\n\n
* in this other group, bear says he played some racing games\n
meanwhile, in the original group, fritzy says he played star wars\n
* later julien reports what he learned from the other group\n\n* neither of the two values is a descendant of the other\n\n* note that this is easy to detect\n\n* in this case it's also easy to automatically resolve. just use the union of both sets\n
* jack resolves the conflict, updating his version and including seen clocks\n
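The party example can be replayed in code. The exact counters shown on the slides are not in these notes, so the clock values below are a reconstruction, and the helper names are hypothetical.

```javascript
// A versioned value: { clock: { actor: counter }, games: [...] }.

// `actor` adds a game and bumps their own counter.
function update(state, actor, game) {
  const clock = { ...state.clock, [actor]: (state.clock[actor] || 0) + 1 };
  return { clock, games: [...state.games, game] };
}

// True if clock `a` has seen everything clock `b` has.
function descends(a, b) {
  return Object.keys(b).every((actor) => (a[actor] || 0) >= b[actor]);
}

// Resolve two conflicting versions: union the sets, merge the clocks,
// then bump the resolver's counter to record the resolution.
function resolve(left, right, resolver) {
  const clock = { ...left.clock };
  for (const [actor, counter] of Object.entries(right.clock)) {
    clock[actor] = Math.max(clock[actor] || 0, counter);
  }
  clock[resolver] = (clock[resolver] || 0) + 1;
  const games = [...new Set([...left.games, ...right.games])];
  return { clock, games };
}

let shared = update({ clock: {}, games: [] }, "jack", "pacman");
shared = update(shared, "julien", "pinball");

// The two groups now diverge from the same starting point.
const otherGroup = update(shared, "bear", "racing");
const originalGroup = update(shared, "fritzy", "star wars");

const inConflict =
  !descends(otherGroup.clock, originalGroup.clock) &&
  !descends(originalGroup.clock, otherGroup.clock);
console.log(inConflict); // true

const resolved = resolve(originalGroup, otherGroup, "jack");
console.log(resolved.games); // pacman, pinball, star wars, racing
console.log(resolved.clock); // jack: 2, julien: 1, fritzy: 1, bear: 1
```

The resolved version descends from both conflicting versions, so anyone who receives it can safely replace either one.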
* let's talk about a specific example, shared editing\n\n* two computer scientists, Ellis and Gibbs, figured out one way to do this in 1989 (22 years ago!), and it's called operational transformation. if you saw or used google wave, this is how it worked as well.\n\n* it builds on vector clocks to do automatic conflict resolution in a very nice way.\n
* both users start with a misspelled word\n\n* let's see what happens\n
\n
\n
\n
\n
\n
* henrik sees the right thing, but adam does not\n\n* adam needs to transform henrik's edit\n
both parties can transform the incoming events based on their own actions.\n\nadam can transform henrik's edit by taking into account his own insert\n\nhenrik doesn't need to transform adam's edit since his delete didn't affect anything earlier in the string\n\nnow both can apply the transformed edits, and we have a consistent state\n
\n
* now adam and henrik get each other's events\n
The transformed event from henrik must be moved over one to account for the change adam made.\n\nHenrik's changes didn't affect the positions before his change, so adam's event is unchanged\n
everything now looks great on both sides\n
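A toy version of the transformation step, assuming single-character deletes and a made-up starting string (the actual word on the slides is not in these notes). It does not handle two inserts at the same position or two deletes of the same character; a real implementation needs priorities and operation logs for those cases.

```javascript
// Operations are hypothetical plain objects:
//   { type: "insert", pos, text }  or  { type: "delete", pos }  (one character)
function apply(doc, op) {
  if (op.type === "insert") {
    return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
  }
  return doc.slice(0, op.pos) + doc.slice(op.pos + 1);
}

// Rewrite `incoming` so it can be applied after `applied`, a concurrent
// local operation that has already changed the document.
function transform(incoming, applied) {
  let pos = incoming.pos;
  if (applied.type === "insert" && applied.pos <= pos) pos += applied.text.length;
  if (applied.type === "delete" && applied.pos < pos) pos -= 1;
  return { ...incoming, pos };
}

const start = "helo worlds";
const adamOp = { type: "insert", pos: 3, text: "l" }; // add the missing "l"
const henrikOp = { type: "delete", pos: 10 }; // drop the stray "s"

// Applying henrik's edit untransformed on adam's side deletes the wrong character.
const naive = apply(apply(start, adamOp), henrikOp);
console.log(naive); // "hello worls"

// Each side applies its own edit, then the transformed remote edit.
const adamView = apply(apply(start, adamOp), transform(henrikOp, adamOp));
const henrikView = apply(apply(start, henrikOp), transform(adamOp, henrikOp));
console.log(adamView); // "hello world"
console.log(henrikView); // "hello world"
```

Henrik's delete shifts right by one on adam's side because adam inserted earlier in the string, while adam's insert is unchanged on henrik's side.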
* i implemented this as an example in my xmpp book in under 200 lines of javascript\n\n* you keep track of the vector clocks, priorities, and logs of editing requests and previously executed edits in order to make the transformations. you also need to define a set of operations like add/delete in the most trivial case.\n\n\n
* so far we've all seen and created amazing things using real-time techniques\n\n* the future looks very bright as we get more creative and ambitious\n
* unfortunately, we've accepted a task that is among the hardest\n\n* we must rethink search\n\n* we must find ways to more efficiently use the network, both in size and architecture (multiplexing, peer to peer)\n\n* we will need to overcome (as always) limitations of browsers\n\n* we must rethink algorithms we've always taken for granted\n\n* and most difficult, we must find practical ways to manage the shared state at the heart of our apps\n