Facebook is the “social network.” People have been
“facebooking” each other for about 7 years now, making
Facebook the most used social network, with over 500 million
users worldwide. 50% of active users log on to Facebook on
any given day. The average user has 130 friends. People spend over
700 billion minutes per month on Facebook. There are over
900 million objects that people interact with (pages, groups,
events and community pages).
INTRODUCTION
Thrift is an interface definition language and binary
communication protocol.
It is used as a remote procedure call (RPC) framework and was
developed at Facebook for "scalable cross-language services
development".
It combines a software stack with a code generation engine to
build services that work efficiently across
C#, C++, Java, Perl, PHP, Python, Ruby and Smalltalk.
It is now an open source project hosted by the Apache Software
Foundation.
THRIFT
Scribe (log server) is a server for aggregating log data
streamed in real time from many other servers. It is useful for
logging a wide array of data. It is built on top of Thrift.
Cassandra is a database management system designed to
handle large amounts of data spread out across many servers.
It powers Facebook’s Inbox Search feature and provides a
structured key-value store with eventual consistency.
HipHop for PHP is a source code transformer for PHP script
code and was created to save server resources. HipHop
transforms PHP source code into optimized C++. After doing
this, it uses g++ to compile it to machine code.
The Back End
The primary idea behind Thrift is that it consists of a language-neutral
stack, which is implemented across various programming
languages, and an associated code generation engine, which
transforms a simple interface and data definition language into
client and server remote procedure call libraries.
Thrift is designed to be as simple as possible for developers,
who can define all the necessary data structures and interfaces
for a complex service in a single short file.
This file is called the Thrift Interface Definition Language file, or Thrift
IDL file.
The developers identified some important features while
evaluating the technical challenges of cross-language interactions
in a networked environment.
Thrift Design Features
Transport:
Each language must have a common interface to bidirectional raw
data transport. Consider a scenario with 2 servers, one
deployed in Java and the other in Python. A typical
service written in Java should be able to send its raw data
to a common interface that is understood by the server
running Python, and vice versa. The transport layer should be able to
carry the raw data between the two ends. The specifics of how this
transport is implemented shouldn’t matter to the service developer. The
same application code should be able to run against TCP stream sockets,
raw data in memory or files on disk.
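The idea of one transport interface with interchangeable back ends can be sketched roughly as follows. This is only an illustration in Python: the class names `Transport`, `MemoryTransport` and `FileTransport` and the function `echo_roundtrip` are hypothetical and are not taken from the Thrift libraries.

```python
import io
import os
import tempfile
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical common interface to a bidirectional raw byte stream."""

    @abstractmethod
    def write(self, data: bytes) -> None:
        ...

    @abstractmethod
    def read(self, size: int) -> bytes:
        ...


class MemoryTransport(Transport):
    """Keeps the raw bytes in memory."""

    def __init__(self) -> None:
        self._buffer = io.BytesIO()
        self._read_pos = 0

    def write(self, data: bytes) -> None:
        self._buffer.seek(0, io.SEEK_END)  # always append at the end
        self._buffer.write(data)

    def read(self, size: int) -> bytes:
        self._buffer.seek(self._read_pos)
        chunk = self._buffer.read(size)
        self._read_pos += len(chunk)
        return chunk


class FileTransport(Transport):
    """Keeps the raw bytes in a file on disk."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._read_pos = 0

    def write(self, data: bytes) -> None:
        with open(self._path, "ab") as f:
            f.write(data)

    def read(self, size: int) -> bytes:
        with open(self._path, "rb") as f:
            f.seek(self._read_pos)
            chunk = f.read(size)
        self._read_pos += len(chunk)
        return chunk


def echo_roundtrip(transport: Transport, message: bytes) -> bytes:
    """Application code: it only sees the Transport interface."""
    transport.write(message)
    return transport.read(len(message))


def demo() -> list:
    """Run the same application code against both transports."""
    results = [echo_roundtrip(MemoryTransport(), b"hello")]
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "transport.bin")
        results.append(echo_roundtrip(FileTransport(path), b"hello"))
    return results
```

The function `echo_roundtrip` never needs to know which transport it was given. A socket-backed transport could implement the same two methods without any change to the application code.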
Protocol:
In order to transport the raw data, it has to be encoded into a
particular format such as binary or XML. The transport layer therefore uses
a particular protocol to encode and decode the data. Again, the
application developer does not need to be concerned with this. The developer only cares
whether the data can be read and written in some deterministic manner.
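As a rough illustration of deterministic encoding and decoding, the sketch below writes a 32-bit integer and a length-prefixed string in a simple binary layout. The function names are hypothetical and the layout is an assumption chosen for clarity. It is not presented as Thrift's own wire format.

```python
import struct


def encode_i32(value: int) -> bytes:
    """Encode a signed 32-bit integer as 4 big-endian bytes."""
    return struct.pack(">i", value)


def decode_i32(data: bytes, offset: int = 0):
    """Return (value, next_offset)."""
    (value,) = struct.unpack_from(">i", data, offset)
    return value, offset + 4


def encode_string(value: str) -> bytes:
    """Encode a string as a 4-byte length followed by its UTF-8 bytes."""
    payload = value.encode("utf-8")
    return encode_i32(len(payload)) + payload


def decode_string(data: bytes, offset: int = 0):
    """Return (string, next_offset)."""
    length, start = decode_i32(data, offset)
    end = start + length
    return data[start:end].decode("utf-8"), end


# Write two values back to back, then read them in the same order.
message = encode_i32(130) + encode_string("friends")
count, position = decode_i32(message)
label, position = decode_string(message, position)
```

Because the reader and the writer agree on the layout, the same bytes always decode to the same values, whichever language produced them.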
Types
Versioning:
For services to be robust, they must be able to evolve from their
present version. They should incorporate new features, and in
order to do this the data types involved in the service should
provide a mechanism to add or delete fields of an object, or alter
the argument list of a function, without any interruption in
service. This is called versioning.
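One common way to get this behaviour is to tag every field with a numeric identifier, so that a reader can skip fields it does not know and fall back to defaults for fields that are absent. The sketch below assumes such a scheme. The function names and the byte layout are hypothetical and only meant to show the principle.

```python
import struct
from typing import Dict, Set


def encode_fields(fields: Dict[int, bytes]) -> bytes:
    """Write each field as: 2-byte field id, 4-byte length, payload."""
    out = bytearray()
    for field_id, payload in fields.items():
        out += struct.pack(">hi", field_id, len(payload))
        out += payload
    return bytes(out)


def decode_fields(data: bytes, known_ids: Set[int]) -> Dict[int, bytes]:
    """Read all fields, keeping only the ids this reader understands."""
    result = {}
    offset = 0
    while offset < len(data):
        field_id, length = struct.unpack_from(">hi", data, offset)
        offset += 6  # size of the (id, length) header
        payload = data[offset:offset + length]
        offset += length
        if field_id in known_ids:
            result[field_id] = payload
        # Unknown ids are skipped, so old readers survive new fields.
    return result


# A newer writer adds field 3; an older reader only knows fields 1 and 2.
new_message = encode_fields({1: b"alice", 2: b"130", 3: b"new-field"})
seen_by_old_reader = decode_fields(new_message, known_ids={1, 2})

# An older writer omits field 3; a newer reader falls back to a default.
old_message = encode_fields({1: b"alice", 2: b"130"})
seen_by_new_reader = decode_fields(old_message, known_ids={1, 2, 3})
field_3 = seen_by_new_reader.get(3, b"default")
```

Both directions keep working, which is what allows a field to be added or removed without interrupting the service.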
Processors:
Processors are the components that process the data streams
and carry out remote procedure calls.
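A processor can be pictured as a small dispatcher: it reads a request from the input stream, looks up the named method on a handler, calls it, and writes the result back. The sketch below is a simplified assumption of that flow. It uses JSON for readability, and `Processor` and `CalculatorHandler` are hypothetical names.

```python
import json


class CalculatorHandler:
    """Hypothetical service implementation written by the developer."""

    def add(self, a: int, b: int) -> int:
        return a + b


class Processor:
    """Decodes a request, dispatches it to the handler, encodes the reply."""

    def __init__(self, handler) -> None:
        self._handler = handler

    def process(self, request: bytes) -> bytes:
        call = json.loads(request)
        name = call["method"]
        # Refuse private names so only the public service methods are callable.
        method = None if name.startswith("_") else getattr(self._handler, name, None)
        if not callable(method):
            return json.dumps({"error": "unknown method"}).encode("utf-8")
        result = method(*call["args"])
        return json.dumps({"result": result}).encode("utf-8")


processor = Processor(CalculatorHandler())
reply = json.loads(processor.process(b'{"method": "add", "args": [2, 3]}'))
```

In a generated RPC library the decoding and dispatch code is produced by the code generator, so the developer only writes the handler.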
Thrift has been employed in a large number of applications at
Facebook, including search, logging, mobile, ads and the
developer platform. Two specific usages are discussed below.
Search
Logging
Facebook Thrift Services
Facebook serves 570 billion page views per month.
There are more photos on Facebook than on all other photo sites
combined.
More than 3 billion photos are uploaded every month.
Facebook’s systems serve 1.2 million photos per second.
More than 25 billion pieces of content (status updates,
comments, etc.) are shared every month.
Facebook has more than 30,000 servers (and this number is
from last year!).
Facebook’s scaling challenge
Linux & Apache
PHP
Memcache
Haystack
BigPipe
How Does Facebook Work?
There are more than 20 billion uploaded photos on Facebook, and
each one is saved in four different resolutions, resulting in more
than 80 billion photos.
And it’s not just about being able to handle billions of photos,
performance is critical. Facebook serves around 1.2 million
photos per second.
Haystack is Facebook’s high-performance photo storage/retrieval
system, a highly scalable object store used to serve Facebook’s
immense amount of photos.
Strictly speaking, Haystack is an object store, so it doesn’t
necessarily have to store photos.
Haystack stores photo data inside 10 GB buckets, with 1 MB of
metadata for every GB stored.
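The figures above can be checked with a few lines of arithmetic. The only assumption added here is that 1 GB is counted as 1024 MB when computing the metadata ratio.

```python
# Photos: every upload is stored in four resolutions.
RESOLUTIONS_PER_PHOTO = 4
uploaded_photos = 20_000_000_000
stored_photos = uploaded_photos * RESOLUTIONS_PER_PHOTO  # 80 billion

# Buckets: 1 MB of metadata for every GB of photo data.
BUCKET_SIZE_GB = 10
METADATA_MB_PER_GB = 1
metadata_mb_per_bucket = BUCKET_SIZE_GB * METADATA_MB_PER_GB  # 10 MB

# Share of a bucket taken by metadata (assuming 1 GB = 1024 MB).
metadata_ratio = metadata_mb_per_bucket / (BUCKET_SIZE_GB * 1024)
```

The metadata is roughly a thousandth of the data it describes, so it is small relative to the photos themselves.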
Haystack
Pipelining web pages for high performance
BigPipe is a dynamic web page serving system that Facebook has
developed.
Facebook uses it to serve each web page in sections (called
“pagelets”) for optimal performance.
BigPipe is a fundamental redesign of the dynamic web page
serving system. The general idea is to pipeline pagelets through
several execution stages inside web servers and browsers.
BigPipe breaks the page generation process into several stages.
The first three stages are executed by the web server, and the last
four stages are executed by the browser.
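The server side of this idea can be sketched as a generator that first sends a page skeleton with empty placeholders and then sends each pagelet as soon as it has been rendered. This is a simplified assumption of the approach, not Facebook's implementation. `render_page` and the JavaScript function name `fill_pagelet` are hypothetical.

```python
import json


def render_page(pagelets):
    """Yield the page in chunks: skeleton first, then one chunk per pagelet.

    `pagelets` is a list of (name, render_function) pairs.
    """
    placeholders = "".join(f'<div id="{name}"></div>' for name, _ in pagelets)
    yield f"<html><body>{placeholders}"
    for name, render in pagelets:
        # Each pagelet is rendered and flushed independently of the others.
        payload = json.dumps({"id": name, "html": render()})
        yield f"<script>fill_pagelet({payload});</script>"
    yield "</body></html>"


chunks = list(render_page([
    ("news", lambda: "<p>news feed</p>"),
    ("chat", lambda: "<p>chat bar</p>"),
]))
```

A web server would flush each chunk to the browser as it is produced, so the browser can start working on early pagelets while later ones are still being generated.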
BIGPIPE
Free & open source, high-performance, distributed
memory object caching system
Memcached is an in-memory key-value store for small chunks
of arbitrary data (strings, objects) from results of database calls,
API calls, or page rendering.
The system uses a client–server architecture. The clients
populate the key-value store and query it.
The servers keep the values in RAM; if a server runs out of
RAM, it discards the oldest values.
Clients can read each other's cached data.
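The behaviour described above, a key-value store held in RAM that discards old values when it is full, can be sketched with a small in-process cache. This is only an illustration: `TinyCache` is a hypothetical class, it limits the number of items rather than bytes of RAM, and the real Memcached is a separate network server.

```python
from collections import OrderedDict


class TinyCache:
    """In-memory key-value store that evicts the least recently used item."""

    def __init__(self, max_items: int) -> None:
        self._max_items = max_items
        self._items = OrderedDict()

    def set(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)  # refresh an existing key
        self._items[key] = value
        if len(self._items) > self._max_items:
            self._items.popitem(last=False)  # drop the oldest entry

    def get(self, key):
        if key not in self._items:
            return None  # cache miss: the caller falls back to the database
        self._items.move_to_end(key)  # mark as recently used
        return self._items[key]


cache = TinyCache(max_items=2)
cache.set("user:1", "Alice")
cache.set("user:2", "Bob")
cache.get("user:1")            # touch user:1 so user:2 becomes the oldest
cache.set("user:3", "Carol")   # cache is full, so user:2 is discarded
```

After these calls, `user:1` and `user:3` are still cached while `user:2` has been evicted.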
MEMCACHE
Facebook has a system, Gatekeeper, that lets it run different
code for different sets of users.
This lets Facebook do gradual releases of new features,
activate certain features only for Facebook employees, etc.
Gatekeeper also lets Facebook do something called “dark
launches”, which is to activate elements of a certain
feature behind the scenes before it goes live.
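A feature gate of this kind can be sketched as a function that decides, per user and per feature, whether the new code path is active. The rule format, the function names and the hashing scheme below are all assumptions made for illustration. Nothing here describes how Gatekeeper itself is built.

```python
import hashlib


def in_rollout(user_id: int, feature: str, percent: int) -> bool:
    """Place each user in a stable bucket from 0 to 99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def is_enabled(user_id: int, feature: str, rules: dict, employees: set) -> bool:
    """Decide whether `feature` is active for `user_id`."""
    rule = rules.get(feature)
    if rule is None:
        return False  # unknown features stay off
    if rule.get("employees_only"):
        return user_id in employees
    return in_rollout(user_id, feature, rule.get("percent", 0))


# Hypothetical rules: one gradual release, one employees-only feature.
rules = {
    "new_messages": {"percent": 10},
    "internal_tools": {"employees_only": True},
}
employees = {7}
```

Raising `percent` step by step gives a gradual release, and because the bucket is derived from a hash, a given user keeps the same answer between requests.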
Gradual releases and dark launches
The Facebook Platform provides a set of APIs and tools
which enable 3rd party developers to integrate with the
"open graph".
Graph API is the core of Facebook Platform, enabling
developers to read and write data to Facebook.
Facebook Platform
The Graph API presents a simple, consistent view of the
Facebook social graph, uniformly representing objects in the
graph (e.g., people, photos, events, and pages) and the
connections between them (e.g., friend relationships,
shared content, and photo tags).
It is a RESTful API for accessing data on the Facebook graph.
Every object in the social graph has a unique ID. You can
access the properties of an object by requesting
https://graph.facebook.com/ID
Alternatively, people and pages with usernames can be
accessed using their username as an ID. All responses are
JSON objects.
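The URL scheme above can be sketched in a few lines. The example only builds the request URL and parses a response body. It makes no network call, and the sample response is invented for illustration. The helper names `object_url` and `parse_object` are hypothetical.

```python
import json
from urllib.parse import quote

GRAPH_BASE = "https://graph.facebook.com"


def object_url(object_id) -> str:
    """Build the URL for an object, given its numeric ID or username."""
    return f"{GRAPH_BASE}/{quote(str(object_id), safe='')}"


def parse_object(body: str) -> dict:
    """Parse a response body, which is expected to be a JSON object."""
    obj = json.loads(body)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    return obj


url_by_id = object_url(12345)
url_by_username = object_url("example.page")

# Invented sample body, standing in for what a request might return.
sample_body = '{"id": "12345", "name": "Example Page"}'
properties = parse_object(sample_body)
```

A real client would fetch the URL over HTTPS and pass the response text to `parse_object`.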
The Graph API
FBML is a variant-evolved subset of HTML with some elements
removed.
It allows Facebook application developers to customize the "look
and feel" of their applications, to a limited extent.
It is the specification of how to encode content so that
Facebook's servers can read and publish it.
FBML plays an important role in building applications. FBML is used
to tap into various Facebook elements when building applications.
It operates a lot like HTML and gives the ability to do various tasks
with ease, such as:
sending a user e-mail
embedding Flash video
creating a dashboard
posting on a wall
Facebook Markup Language
Facebook also allows the use of regular HTML tags, such as <a
href="#"></a>, which is used to generate a hyperlink. Facebook also allows
the use of many more HTML tags for building applications.
FBML
The new Messages interweaves your chats, texts and emails.
It’s a central place to control all of your private
communication, both on and off Facebook.
Simply put, it can be a single inbox for all of your messages,
no matter how you choose to send them.
A facebook.com Email Address
SMS From Facebook
Chat History
Facebook’s New Messages
Facebook Connect is a set of APIs from Facebook that enable
Facebook members to log onto third-party websites,
applications, mobile devices and gaming systems with their
Facebook identity.
Facebook Connect
Unlike other social networks such as Friendster, MySpace,
and Twitter, all of which have run into serious scalability issues
at different points during their growth, Facebook has been mostly
reliable throughout its rise.
In practice, Facebook uses JavaScript heavily and relies on its own
in-house PHP wrapper called XHP, HipHop (which optimizes
PHP), and many more technologies.
A lot of technologies have been developed by Facebook in-house
to serve its own needs, for example Cassandra.
RELIABILITY
Thrift generates both the server and client interfaces for a given
service, and in a consistent manner. Client calls will be more
consistent.
Related to the above: Thrift's RPC-like behavior means that you get
type safety.
Thrift supports various protocols, not just HTTP. If you are
dealing with large volumes of service calls, or have bandwidth
requirements, the client/server can transparently switch to more
efficient transports.
Thrift is a mature piece of software, well tested and widely used.
Advantages of Thrift:
Thrift is poorly documented.
It is more work to get started on the client side when the
clients are directly building the calling code. It's less work for
the service owner if they are building libraries for clients.
It is yet another dependency.
Disadvantages: