2. Facebook is the “social networking” site. People have been
“facebooking” each other for about 7 years now, making
Facebook the most used social network, with over 500 million
users worldwide.
50% of active users log on to Facebook on any given day.
The average user has 130 friends.
People spend over 700 billion minutes per month on Facebook.
There are over 900 million objects that people interact with
(pages, groups, events and community pages).
INTRODUCTION
3. Thrift is an interface definition language and binary
communication protocol.
It is used as a remote procedure call (RPC) framework and was
developed at Facebook for "scalable cross-language services
development".
It combines a software stack with a code generation engine to
build services that work efficiently across
C#, C++, Java, Perl, PHP, Python, Ruby and Smalltalk.
It is now an open source project hosted by the Apache Software
Foundation.
THRIFT
4. Scribe (log server) is a server for aggregating log data
streamed in real time from many other servers. It is useful for
logging a wide array of data, and it is built on top of Thrift.
Cassandra is a database management system designed to
handle large amounts of data spread out across many servers.
It powers Facebook’s Inbox Search feature and provides a
structured key-value store with eventual consistency.
HipHop for PHP is a source code transformer for PHP script
code and was created to save server resources. HipHop
transforms PHP source code into optimized C++. After doing
this, it uses g++ to compile it to machine code.
The Back End
6. The primary idea behind Thrift is that it consists of a
language-neutral stack, implemented across various programming
languages, and an associated code generation engine that
transforms a simple interface and data definition language into
client and server remote procedure call libraries.
Thrift is designed to be as simple as possible for developers,
who can define all the necessary data structures and interfaces
for a complex service in a single short file.
This file is called the Thrift Interface Definition Language file,
or Thrift IDL file.
The developers identified some important features while
evaluating the technical challenges of cross language interactions
in a networked environment.
Thrift Design Features
8. Transport:
Each language must have a common interface to bidirectional raw
data transport. Consider a scenario with two servers, one deployed in
Java and the other in Python. A service written in Java should be able to
send raw data through a common interface that the Python server
understands, and vice versa. The transport layer should be able to carry
the raw data between the two ends. The specifics of how a given
transport is implemented should not matter to the service developer. The
same application code should be able to run against TCP stream sockets,
raw data in memory or files on disk.
Protocol:
To be transported, raw data has to be encoded in a particular
format such as binary or XML. The transport layer therefore uses a
protocol to encode and decode the data. Again, the application developer
does not need to be concerned with the details, only with whether the data
can be read and written in a deterministic manner.
Types
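The separation between transport and protocol described above can be sketched in Python. This is a minimal illustration, not Thrift's own library: the class names `StreamTransport` and `SimpleBinaryProtocol`, the helper `round_trip`, and the length-prefixed string encoding are all assumptions made for this example. The point is only that the same encoding code runs unchanged against an in-memory buffer or a file on disk.

```python
import io
import struct
import tempfile


class StreamTransport:
    """Hypothetical raw byte transport wrapping any binary file-like object."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, data: bytes) -> None:
        self._stream.write(data)

    def read(self, size: int) -> bytes:
        data = self._stream.read(size)
        if len(data) != size:
            raise EOFError("transport ended before the full value was read")
        return data

    def rewind(self) -> None:
        # Go back to the start so that written data can be read again.
        self._stream.seek(0)


class SimpleBinaryProtocol:
    """Hypothetical binary encoding layered on top of a transport."""

    def __init__(self, transport):
        self._transport = transport

    def write_i32(self, value: int) -> None:
        self._transport.write(struct.pack(">i", value))

    def read_i32(self) -> int:
        return struct.unpack(">i", self._transport.read(4))[0]

    def write_string(self, text: str) -> None:
        raw = text.encode("utf-8")
        self.write_i32(len(raw))  # length prefix, then the bytes
        self._transport.write(raw)

    def read_string(self) -> str:
        length = self.read_i32()
        return self._transport.read(length).decode("utf-8")


def round_trip(stream):
    """Write two values, then read them back through the same layers."""
    transport = StreamTransport(stream)
    protocol = SimpleBinaryProtocol(transport)
    protocol.write_i32(130)
    protocol.write_string("hello")
    transport.rewind()
    return protocol.read_i32(), protocol.read_string()


if __name__ == "__main__":
    print(round_trip(io.BytesIO()))        # in-memory transport
    with tempfile.TemporaryFile() as handle:
        print(round_trip(handle))          # file-on-disk transport
```

Swapping the encoding (for example to a text format) would only require a different protocol class; the application code in `round_trip` would not need to know which transport sits underneath.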
9. Versioning:
To be robust, services must be able to evolve beyond their
present version. They should be able to incorporate new features,
so the data types involved in the service should
provide a mechanism to add or remove fields of an object, or alter
the argument list of a function, without any interruption in
service. This is called versioning.
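One common way to get this property is to tag every field with a numeric identifier, so that a reader can skip identifiers it does not know and fall back to defaults for fields that are absent. The sketch below assumes that approach and uses a deliberately simplified byte layout (a 16-bit field id followed by a 32-bit integer, with id 0 as a stop marker). The layout, the function names and the two schemas are hypothetical and are not Thrift's actual wire format.

```python
import struct

# Hypothetical schemas: field id -> (field name, default value).
OLD_SCHEMA = {1: ("user_id", 0)}
NEW_SCHEMA = {1: ("user_id", 0), 2: ("friend_count", 0)}


def encode_fields(fields):
    """Encode {field_id: int_value} as (id, value) pairs plus a stop marker."""
    out = b""
    for field_id, value in sorted(fields.items()):
        out += struct.pack(">hi", field_id, value)
    return out + struct.pack(">h", 0)  # field id 0 marks the end


def decode_fields(data, schema):
    """Decode into a dict, ignoring unknown ids and defaulting missing ones."""
    result = {name: default for name, default in schema.values()}
    offset = 0
    while True:
        (field_id,) = struct.unpack_from(">h", data, offset)
        offset += 2
        if field_id == 0:
            return result
        (value,) = struct.unpack_from(">i", data, offset)
        offset += 4
        if field_id in schema:
            result[schema[field_id][0]] = value
        # Unknown field ids are skipped, so old readers keep working.


if __name__ == "__main__":
    new_message = encode_fields({1: 42, 2: 130})
    old_message = encode_fields({1: 42})
    print(decode_fields(new_message, OLD_SCHEMA))  # old reader, new writer
    print(decode_fields(old_message, NEW_SCHEMA))  # new reader, old writer
```

In this sketch an old reader simply drops the added `friend_count` field, and a new reader fills it with its default when talking to an old writer, so neither side has to be upgraded at the same moment.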
Processors:
Processors process the data streams and carry out the remote
procedure calls.
Cont..
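The role of a processor can be illustrated with a small dispatcher: it reads one request from an input stream, invokes the matching method on a handler object, and writes the reply to an output stream. This is only a sketch under stated assumptions. The JSON-lines message format, the `Processor` and `CalculatorHandler` classes and the `call` helper are invented for this example and do not reflect the code that Thrift generates.

```python
import io
import json


class CalculatorHandler:
    """Hypothetical service implementation written by the developer."""

    def add(self, a, b):
        return a + b


class Processor:
    """Reads a request, dispatches it to the handler, writes the reply."""

    def __init__(self, handler):
        self._handler = handler

    def process(self, in_stream, out_stream):
        request = json.loads(in_stream.readline())
        name = request["method"]
        # Refuse private names so only the public service methods are callable.
        method = None if name.startswith("_") else getattr(self._handler, name, None)
        if callable(method):
            reply = {"result": method(*request["args"])}
        else:
            reply = {"error": "unknown method: " + name}
        out_stream.write(json.dumps(reply) + "\n")


def call(processor, method, *args):
    """Client-side helper: encode a call, run the processor, decode the reply."""
    request = io.StringIO(json.dumps({"method": method, "args": list(args)}) + "\n")
    response = io.StringIO()
    processor.process(request, response)
    return json.loads(response.getvalue())


if __name__ == "__main__":
    processor = Processor(CalculatorHandler())
    print(call(processor, "add", 2, 3))
    print(call(processor, "subtract", 2, 3))
```

The handler contains only business logic, while the processor owns the reading, dispatching and writing, which is the split the slide describes.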
10. Thrift has been employed in a large number of applications at
Facebook, including search, logging, mobile, ads and the
developer platform. Two specific usages are discussed below.
Search
Logging
Facebook Thrift Services
11. Facebook serves 570 billion page views per month.
There are more photos on Facebook than on all other photo sites
combined.
More than 3 billion photos are uploaded every month.
Facebook’s systems serve 1.2 million photos per second.
More than 25 billion pieces of content (status updates,
comments, etc.) are shared every month.
Facebook has more than 30,000 servers (and this number is
from last year!).
Facebook’s scaling challenge
12. Linux & Apache
PHP
Memcache
Haystack
BigPipe
How Does Facebook Work?
13. There are more than 20 billion uploaded photos on Facebook, and
each one is saved in four different resolutions, resulting in more
than 80 billion photos.
Handling billions of photos is not enough; performance is also
critical. Facebook serves around 1.2 million photos per second.
Haystack is Facebook’s high-performance photo storage/retrieval
system, a highly scalable object store used to serve Facebook’s
immense number of photos.
Strictly speaking, Haystack is an object store, so it doesn’t
necessarily have to store photos.
Haystack stores photo data inside 10 GB buckets, with 1 MB of
metadata for every GB stored.
Haystack
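The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers given above. The function names are made up for the example, and the conversion of 1 GB to 1024 MB is an assumption, since the slide does not say which convention it uses.

```python
UPLOADED_PHOTOS = 20_000_000_000   # "more than 20 billion uploaded photos"
RESOLUTIONS_PER_PHOTO = 4          # each photo is saved in four resolutions
BUCKET_SIZE_GB = 10                # size of one Haystack bucket
METADATA_MB_PER_GB = 1             # metadata kept per GB of photo data
MB_PER_GB = 1024                   # assumption: binary gigabytes


def stored_images(uploaded, resolutions):
    """Total number of image files kept on disk."""
    return uploaded * resolutions


def metadata_per_bucket_mb(bucket_gb, metadata_mb_per_gb):
    """Metadata needed for one full bucket, in MB."""
    return bucket_gb * metadata_mb_per_gb


def metadata_overhead(metadata_mb_per_gb, mb_per_gb):
    """Metadata as a fraction of the data it describes."""
    return metadata_mb_per_gb / mb_per_gb


if __name__ == "__main__":
    print(stored_images(UPLOADED_PHOTOS, RESOLUTIONS_PER_PHOTO))
    print(metadata_per_bucket_mb(BUCKET_SIZE_GB, METADATA_MB_PER_GB))
    print(f"{metadata_overhead(METADATA_MB_PER_GB, MB_PER_GB):.4%}")
```

Under these assumptions a full bucket carries about 10 MB of metadata, roughly a tenth of a percent of the data it indexes.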
14. Pipelining web pages for high performance
BigPipe is a dynamic web page serving system that Facebook has
developed.
Facebook uses it to serve each web page in sections (called
“pagelets”) for optimal performance.
BigPipe is a fundamental redesign of the dynamic web page
serving system. The general idea is to pipeline pagelets through
several execution stages inside web servers and browsers.
BigPipe breaks the page generation process into seven stages.
The first three stages are executed by the web server, and the last
four stages are executed by the browser.
BIGPIPE
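The idea of serving a page in pagelets can be sketched with a Python generator that first emits a page skeleton with empty placeholders and then emits one chunk per pagelet as each is rendered. This is an illustration only: the `stream_page` function, the placeholder markup and the client-side `fill` function named in the output are hypothetical, and the pagelets are rendered one after another here, which may differ from how the real system schedules them.

```python
import json


def stream_page(pagelets):
    """Yield a page in chunks: skeleton first, then one chunk per pagelet.

    `pagelets` maps a pagelet id to a zero-argument function that returns
    the pagelet's HTML content as a string.
    """
    placeholders = "".join(f'<div id="{pid}"></div>' for pid in pagelets)
    # The skeleton can be sent before any pagelet has been generated.
    yield f"<html><body>{placeholders}"
    for pid, render in pagelets.items():
        # `fill` stands for a hypothetical browser-side function that
        # inserts the content into the matching placeholder.
        yield f'<script>fill("{pid}", {json.dumps(render())})</script>'
    yield "</body></html>"


if __name__ == "__main__":
    page = {"feed": lambda: "news", "chat": lambda: "online"}
    for chunk in stream_page(page):
        print(chunk)
```

Because each chunk is produced as soon as its pagelet is ready, the browser can begin working on early sections while the server is still generating later ones.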
16. Memcached is a free and open source, high-performance,
distributed memory object caching system.
It is an in-memory key-value store for small chunks
of arbitrary data (strings, objects) from the results of database calls,
API calls, or page rendering.
The system uses a client–server architecture. The servers maintain
a key-value associative array; the clients populate this array and
query it.
The servers keep the values in RAM; if a server runs out of
RAM, it discards the oldest values.
Clients can read each other's cached data.
MEMCACHE
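The behavior described above, a key-value store that drops its oldest entries when it is full, can be sketched in a few lines. This is not the Memcached client API: `TinyCache` and `cached_call` are hypothetical names, the cache lives inside one process rather than on separate servers, and capacity is counted in items rather than bytes of RAM to keep the example short. Eviction here is least-recently-used.

```python
from collections import OrderedDict


class TinyCache:
    """In-process stand-in for a cache server with a fixed item capacity."""

    def __init__(self, max_items):
        self._max_items = max_items
        self._items = OrderedDict()

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)          # newest entry goes to the end
        while len(self._items) > self._max_items:
            self._items.popitem(last=False)   # discard the oldest entry

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)          # reading refreshes the entry
        return self._items[key]


def cached_call(cache, key, compute):
    """Return the cached value, computing and storing it on a miss."""
    value = cache.get(key)
    if value is None:
        value = compute()                     # e.g. an expensive database call
        cache.set(key, value)
    return value


if __name__ == "__main__":
    cache = TinyCache(max_items=2)
    cache.set("a", 1)
    cache.set("b", 2)
    cache.get("a")          # "a" is now more recent than "b"
    cache.set("c", 3)       # capacity exceeded, "b" is discarded
    print(cache.get("a"), cache.get("b"), cache.get("c"))
```

`cached_call` shows the usual pattern for results of database calls or page rendering: look in the cache first and only do the expensive work on a miss.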
17. Facebook has a system called Gatekeeper that lets it run
different code for different sets of users.
This lets Facebook do gradual releases of new features,
activate certain features only for Facebook employees, etc.
Gatekeeper also lets Facebook do something called “dark
launches”, which means activating elements of a certain
feature behind the scenes before it goes live.
Gradual releases and dark launches
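A gate of this kind can be sketched as a function that decides, per user, whether a feature is on. The slides do not describe how Gatekeeper is implemented, so everything below is an assumption made for illustration: the names `bucket` and `is_enabled`, the idea of hashing the feature name with the user id into one of 100 buckets, and the employee allow-list.

```python
import hashlib


def bucket(feature, user_id):
    """Map a (feature, user) pair to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature, user_id, *, rollout_percent, employee_ids=frozenset()):
    """Decide whether `feature` is active for this user.

    Employees always see the feature; other users see it once their bucket
    falls below the current rollout percentage.
    """
    if user_id in employee_ids:
        return True
    return bucket(feature, user_id) < rollout_percent


if __name__ == "__main__":
    employees = frozenset({7})
    # Employee-only stage: 0% rollout, but user 7 is on the allow-list.
    print(is_enabled("new_inbox", 7, rollout_percent=0, employee_ids=employees))
    print(is_enabled("new_inbox", 8, rollout_percent=0, employee_ids=employees))
    # Gradual release: count how many of 1000 users are in at 10%.
    print(sum(is_enabled("new_inbox", uid, rollout_percent=10) for uid in range(1000)))
```

Because the bucket depends only on the feature name and the user id, a user who has the feature at 10% keeps it when the rollout grows to 50%, which is what makes the release gradual rather than random on every request.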
18. The Facebook Platform provides a set of APIs and tools
which enable third-party developers to integrate with the
"open graph".
The Graph API is the core of the Facebook Platform, enabling
developers to read and write data to Facebook.
Facebook Platform
19. The Graph API presents a simple, consistent view of the
Facebook social graph, uniformly representing objects in the
graph (e.g., people, photos, events, and pages) and the
connections between them (e.g., friend relationships,
shared content, and photo tags).
It is a RESTful API for accessing data on the Facebook graph.
Every object in the social graph has a unique ID. You can
access the properties of an object by requesting:
https://graph.facebook.com/ID
Alternatively, people and pages with usernames can be
accessed using their username as an ID. All responses are
JSON objects.
The Graph API
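The addressing scheme above can be sketched without making any network request. The example below only builds the URL form given on this slide and parses a response body. The helper names `graph_url` and `parse_object`, the ID `12345`, the username `example.user` and the sample JSON are all made up for illustration and are not real Graph API data.

```python
import json
from urllib.parse import quote

GRAPH_ROOT = "https://graph.facebook.com/"


def graph_url(object_id):
    """Build the URL for an object, given its numeric ID or username."""
    return GRAPH_ROOT + quote(str(object_id), safe="")


def parse_object(body):
    """Decode a JSON response body into a dictionary of properties."""
    return json.loads(body)


if __name__ == "__main__":
    print(graph_url(12345))
    print(graph_url("example.user"))

    # Hypothetical response body, written by hand for this example.
    sample_body = '{"id": "12345", "name": "Example Page"}'
    properties = parse_object(sample_body)
    print(properties["id"], properties["name"])
```

Since every object is reached the same way and every response is a JSON object, a client needs only these two small steps regardless of whether the object is a person, a photo, an event or a page.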
20. FBML is an evolved subset of HTML with some elements
removed.
It allows Facebook application developers to customize the "look
and feel" of their applications, to a limited extent.
It is the specification of how to encode content so that
Facebook's servers can read and publish it.
FBML plays an important role in building applications. It is used
to tap into various Facebook elements when building applications.
It operates a lot like HTML and makes it easy to do various tasks,
such as:
sending a user an e-mail
embedding Flash video
creating a dashboard
posting on a wall
Facebook Markup Language
21. Facebook also allows the use of regular HTML tags, such as <a
href="#"></a>, which is used to generate a hyperlink. Facebook also allows
the use of many more HTML tags for building applications.
FBML
22. The new Messages interweaves your chats, texts and emails.
It’s a central place to control all of your private
communication, both on and off Facebook.
Simply put, it can be a single inbox for all of your messages,
no matter how you choose to send them.
A facebook.com Email Address
SMS From Facebook
Chat History
Facebook’s New Messages
23. Facebook Connect is a set of APIs from Facebook that enable
Facebook members to log onto third-party websites,
applications, mobile devices and gaming systems with their
Facebook identity.
Facebook Connect
24. Unlike other social networks such as Friendster, MySpace
and Twitter, all of which have run into serious scalability issues
at different points during their growth, Facebook has been mostly
reliable throughout its rise.
Facebook uses JavaScript heavily and relies on its own
in-house PHP wrapper called XHP, HipHop (which optimizes
PHP), and many more technologies.
Many technologies have been developed by Facebook in-house
to serve its own needs, for example Cassandra.
RELIABILITY
25. Thrift generates both the server and client interfaces for a given
service, and does so in a consistent manner, so client calls will be
more consistent.
Related to the above: Thrift's RPC-like behavior means that you get
type safety.
Thrift supports various protocols, not just HTTP. If you are
dealing with large volumes of service calls, or have bandwidth
requirements, the client and server can transparently switch to more
efficient transports.
Thrift is a mature piece of software, well tested and widely used.
Advantages of Thrift:
26. Thrift is poorly documented.
It is more work to get started on the client side when the
clients are building the calling code directly. It is less work for
the service owner if they are building libraries for clients.
It is yet another dependency.
Disadvantages: