Facebook Commenting goes Live!

I know I'm a bit late with this post; Facebook launched this feature a couple of weeks back. But I have always been amazed at how they did it, and the Facebook Engineering page recently published a post about it. I was caught up with my academics and couldn't write about it sooner. Nevertheless, here I am :)
So what is Live Commenting? If you are a regular Facebook user, you must have noticed recently that comments get posted faster. Previously you had to click 'Comment' and wait a while for your comment to show up (especially on slow connections). Now it appears instantly, although it is actually registered on the site only after a slight delay that usually goes unnoticed. The other feature is that comments update live, as in a chat, so you can keep commenting as if you were in a conversation. That's really fast for a social networking site.

So what's the technology behind it? As they revealed, a group of engineers first experimented with polling, but with such a massive user base it would have been impossible to keep up. So they went with push technology instead. I would rather quote Facebook's notes than explain the process myself. Here they are.

Pushing vs. Polling Data
Initially we investigated a poll-based approach. For every page that had comment-able content, the page would periodically send a request to check whether new comments had arrived. By increasing the polling frequency, we could approximate a real-time feel. Unfortunately, simple experimentation led us to quickly conclude that this approach would not scale. Because humans are so sensitive to latency in real-time communications, creating a truly serendipitous commenting experience requires comments to arrive as quickly as humanly and electronically possible. In a poll-based approach this would mean a polling interval of less than five seconds (and that would still feel slow!), which would very easily overload our servers.
So we needed a push-based approach. To be able to push information about comments to viewers, we need to know who may be viewing the piece of content that each new comment pertains to. Because we serve 100 million pieces of content per minute, we needed a system that could keep track of this "who's looking at what" information, but also handle the incredible rate at which this information changed. 
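(A quick aside of my own, not part of Facebook's notes.) To get a rough feel for why polling loses out, here is a back-of-the-envelope sketch. The function names are hypothetical and every number is a made-up placeholder, chosen only to show the shape of the trade-off; only the five-second interval comes from the text above.

```python
def polling_requests_per_second(open_pages: int, poll_interval_s: float) -> float:
    """Requests/s the servers see when every open page polls on a fixed interval."""
    return open_pages / poll_interval_s


def push_messages_per_second(comments_per_second: float, avg_viewers_per_item: float) -> float:
    """Messages/s sent when updates go out only for comments that actually happen."""
    return comments_per_second * avg_viewers_per_item


# Hypothetical figures (NOT Facebook's real numbers).
OPEN_PAGES = 1_000_000          # pages currently open in browsers
POLL_INTERVAL_S = 5.0           # the interval mentioned in the notes
COMMENTS_PER_SECOND = 1_000.0   # assumed commenting rate
AVG_VIEWERS_PER_ITEM = 20.0     # assumed viewers per commented item

poll_load = polling_requests_per_second(OPEN_PAGES, POLL_INTERVAL_S)
push_load = push_messages_per_second(COMMENTS_PER_SECOND, AVG_VIEWERS_PER_ITEM)

print(f"polling: {poll_load:,.0f} requests/s")
print(f"push:    {push_load:,.0f} messages/s")
```

The point is that polling cost grows with the number of open pages and shrinks only if you poll less often, whereas push cost follows the number of comments actually written.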
Write Locally, Read Globally
Storing these one-to-one, viewer-to-content associations in a database is relatively easy. Keeping up with 16 million new associations per second is not. Up until this point, Facebook engineering had built up infrastructure optimized for many more reads than writes. But now we had flipped the game. Every page load now requires multiple writes (one for each piece of content being displayed). Each write of a comment requires a read (to figure out the recipients of an update). We realized that we were building something that was fundamentally backwards from most of our other systems.
At Facebook, traditionally, writes are applied to one database and asynchronously replicated to databases across all regions. This makes sense as the write rate is normally much lower than the read rate (users consume content much more than they produce).  A good way to think of this approach is "read locally, write globally".
Because of our unique situation, we settled on the completely opposite approach: "write locally, read globally." This meant deploying distributed storage tiers that only handled writes locally, then less frequently collecting information from across all of our data centers to produce the final result. For example, when a user loads his News Feed through a request to our data center in Virginia, the system writes to a storage tier in the same data center, recording the fact that the user is now viewing certain pieces of content so that we can push them new comments. When someone enters a comment, we fetch the viewership information from all of our data centers across the country, combine the information, then push the updates out. In practice, this means we have to perform multiple cross-country reads for every comment produced. But it works because our commenting rate is significantly lower than our viewing rate. Reading globally saves us from having to replicate a high volume of writes across data centers, saving expensive, long-distance bandwidth.
Source : facebook.com/engineering
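
To make the "write locally, read globally" idea concrete, here is a tiny in-memory sketch of my own. It is only an illustration under simplifying assumptions: the DataCenter class, the recipients_for_comment function and the region name "other_region" are hypothetical, and Facebook's real storage tiers are certainly nothing this simple.

```python
from collections import defaultdict


class DataCenter:
    """One region's local store of "who's looking at what" (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.viewers_by_content = defaultdict(set)

    def record_view(self, user_id, content_id):
        # Write locally: a page load only touches the data center that served it.
        self.viewers_by_content[content_id].add(user_id)

    def viewers_of(self, content_id):
        # Return a copy so callers cannot mutate the local store.
        return set(self.viewers_by_content.get(content_id, set()))


def recipients_for_comment(data_centers, content_id):
    # Read globally: one read per data center, merged into a single set.
    viewers = set()
    for dc in data_centers:
        viewers |= dc.viewers_of(content_id)
    return viewers


virginia = DataCenter("virginia")
other_region = DataCenter("other_region")

# Page loads are recorded only where the request landed.
virginia.record_view("alice", "post:1")
other_region.record_view("bob", "post:1")
other_region.record_view("carol", "post:2")

# A new comment on post:1 triggers reads across all data centers.
recipients = recipients_for_comment([virginia, other_region], "post:1")
print(sorted(recipients))
```

Each page view costs one cheap local write, and only the much rarer comment pays for the reads across data centers, which is the trade-off the notes describe.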
