Facebook is a social networking website where users can post comments, share photographs, post links to news or other interesting content on the web, chat live, and watch short-form video. You can even order food on Facebook if that's what you want to do. Shared content can be made publicly accessible, or it can be shared only among a select group of friends or family, or with a single person.
2. THE “SOCIAL MEDIA” REVOLUTION A STUDY AND ANALYSIS OF THE
PHENOMENON
Ahmad Yar
BS Computer Science
Bahauddin Zakariya University Multan (BZU), Sahiwal Campus.
Email: ahmadyark1@gmail.com
Mobile: +92303 9464551
3. What are Distributed Systems?
A distributed system is one in which hardware or software components located
at networked computers communicate and coordinate their actions only by passing messages.
A distributed system is a piece of software that ensures that:
A collection of independent computers appears to its users as a single
coherent system.
Two aspects:
Independent computers
Single system
The World Wide Web (WWW) is the biggest example of a distributed system.
4. What is Facebook?
A portal for social networking
Interact with friends
Share photos and/or videos
Community organizing
Email and instant messaging
Various forms of interpersonal communication
Operated and privately owned by Facebook, Inc.
5. Who Created Facebook?
Mark Zuckerberg created Facebook while at Harvard University in
2004 with roommate Dustin and fellow Computer Science major Chris.
Initially created for college students
Then moved to include high school students
Now open to anyone over the age of 13
Mark Zuckerberg, 23, founded Facebook while studying psychology at Harvard University.
A keen computer programmer, Mr Zuckerberg had already developed a number of social-
networking websites.
Coursematch
Facemash
6. Idea & Creation of Facebook
Divya Narendra
Cameron and Tyler Winklevoss
In February 2004 Mr Zuckerberg launched "The facebook", as it was originally known; the
name taken from the sheets of paper distributed to freshmen, profiling students and staff.
Within 24 hours, 1,200 Harvard students had signed up, and after one month, over half of
the undergraduate population had a profile.
The network was promptly extended to other Boston universities, the Ivy League and
eventually all US universities. It became Facebook.com in August 2005 after the address
was purchased for $200,000. US high schools could sign up from September 2005, then
it began to spread worldwide, reaching UK universities the following month.
7. Growth of Facebook
Social Network   Feb. 2008    Feb. 2009    Growth
Facebook         20,043,000   65,704,000   +228%
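The growth figure can be checked from the two user counts. The short Python sketch below is only an illustration; the function name percent_growth is a hypothetical helper, not something from the original slides.

```python
def percent_growth(before: int, after: int) -> float:
    """Relative change from `before` to `after`, expressed as a percentage."""
    return (after - before) / before * 100


# User counts taken from the table above.
feb_2008 = 20_043_000
feb_2009 = 65_704_000

growth = percent_growth(feb_2008, feb_2009)
print(f"{growth:+.0f}%")  # rounds to the +228% shown in the table
```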
8. Facebook Architecture
Front End
LAMP:
Linux, Apache, MySQL, PHP & BigPipe
Great Documentation
Large Community
Why LAMP?
Easy to learn, huge community, lots of
Framework used by Facebook
9. • LINUX is a computer operating system kernel.
• It’s open source, very customizable, and good for security.
• Facebook runs the Apache HTTP Server on the Linux operating system.
In many ways, Linux is similar to other operating systems you may have used before,
such as Windows, OS X, or iOS.
Like other operating systems, Linux has a graphical interface, and the types of software
you are accustomed to using on other operating systems, such as word processing
applications, have Linux equivalents. In many cases, the software’s creator may have
made a Linux version of the same program you use on other systems. If you can use
a computer or other electronic device, you can use Linux.
Linux & Apache
10. • APACHE is also free and is the most popular open source
web server in use.
Facebook's messaging system was recently added to the application with the support of
Apache HBase, a database-like layer built on Hadoop and designed to support
billions of messages per day. HBase was evaluated against the application’s requirements for
consistency, availability, partition tolerance, data model and scalability.
HBase supports Facebook's capacity of billions of messages, which can be increased with minimal
overhead and no downtime. It offers high write throughput; efficient, low-latency
strong consistency semantics within a data center; efficient random
reads from disk; high availability and disaster recovery; fault
isolation; and atomic read-modify-write primitives.
Linux & Apache
11. • PHP is a dynamically typed/interpreted scripting language.
• Facebook uses PHP because it is a good web programming
language with extensive support and an active developer
community, and it is good for rapid iteration.
The Facebook SDK (software development kit) for PHP is a library with powerful features
that enable PHP developers to easily integrate Facebook Login and make requests to
the Graph API.
It also plays well with the Facebook SDK for JavaScript to give the front-end user the
best possible user experience. But it doesn't end there: the Facebook SDK for PHP
makes it easy to upload photos and videos and send batch requests to the Graph API,
among other things.
PHP & Bigpipe
12. Pipelining
• BigPipe is a dynamic web page serving system developed by Facebook.
The general idea is to pipeline the generation of page sections through
several execution stages inside web servers and browsers.
A conventional page request goes through these stages:
The browser sends an HTTP request to the web server.
The web server parses the request, pulls data from the storage tier, then formulates
an HTML document and sends it to the client in an HTTP response.
The HTTP response is transferred over the internet to the browser.
The browser parses the response from the web server, constructs a tree
representation of the HTML document, and downloads the
CSS and JavaScript resources referenced by the document.
After downloading the JavaScript resources,
the browser parses and executes them.
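The pipelining idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Facebook's implementation: the function render_pipelined, the pagelet names, and the client-side callback onPageletArrive are all hypothetical. The server sends a page skeleton first, then sends each section as soon as it has been generated, so the browser can start working before the whole page is ready.

```python
import json
from typing import Callable, Dict, Iterator


def render_pipelined(pagelets: Dict[str, Callable[[], str]]) -> Iterator[str]:
    """Yield the response in chunks: the skeleton first, then one chunk per section."""
    # Stage 1: send a skeleton with empty placeholders right away.
    placeholders = "".join(f'<div id="{name}"></div>' for name in pagelets)
    yield f"<html><body>{placeholders}"

    # Stage 2: build each section and send it as soon as it is ready.
    # onPageletArrive is a hypothetical client-side function that fills the placeholder.
    for name, build in pagelets.items():
        payload = json.dumps({"id": name, "html": build()})
        yield f"<script>onPageletArrive({payload})</script>"

    yield "</body></html>"


chunks = list(render_pipelined({
    "newsfeed": lambda: "<p>stories</p>",
    "chat": lambda: "<p>friends online</p>",
}))
for chunk in chunks:
    print(chunk)
```

In a real server each yielded chunk would be flushed to the network immediately, instead of being collected in a list as done here for demonstration.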
BigPipe
13. HIP-HOP
• PHP compiler.
• Developed by Facebook
• PHP is slow to execute; HipHop was created to minimize server
resource usage.
• Converts PHP scripts into optimized C++ code.
14. • The back end consists of the application servers.
• Application servers are responsible for answering all queries and taking all the writes
into the system.
• Facebook’s backend services are written in a variety of different programming
languages including C++, Java, Python, and Erlang.
Back End
• Haystack
• SCRIBE
• MySQL
• Memcache
• Cassandra
• Storing
15. Haystack
• Haystack is an object store that is designed for
sharing photos on Facebook, where data is
written once, read often, never modified, and
rarely deleted.
• Efficient storage of billions of photos.
• Highly scalable.
• Uses extensive caching in its main memory.
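The "write once, read often, never modified" pattern can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the Haystack implementation: the class TinyObjectStore and its methods are hypothetical, and an in-memory buffer stands in for a large file on disk. Objects are appended to one volume, and an in-memory index records where each one lives, so a read needs a single seek and read.

```python
import io
from typing import Dict, Tuple


class TinyObjectStore:
    """Append-only volume plus an in-memory index of (offset, size) per key."""

    def __init__(self) -> None:
        self._volume = io.BytesIO()  # stands in for one large file on disk
        self._index: Dict[str, Tuple[int, int]] = {}

    def put(self, key: str, data: bytes) -> None:
        if key in self._index:
            raise KeyError(f"{key!r} already stored; objects are never modified")
        # Always append at the end of the volume.
        offset = self._volume.seek(0, io.SEEK_END)
        self._volume.write(data)
        self._index[key] = (offset, len(data))

    def get(self, key: str) -> bytes:
        # The index lookup happens in memory, so the only I/O reads the object itself.
        offset, size = self._index[key]
        self._volume.seek(offset)
        return self._volume.read(size)


store = TinyObjectStore()
store.put("photo-1", b"first photo bytes")
store.put("photo-2", b"second photo bytes")
print(store.get("photo-1"))
```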
The new photo infrastructure merges the photo serving tier and storage tier into one
physical tier. It implements an HTTP-based photo server which stores photos in a
generic object store called Haystack. The main requirement for the new tier was to
eliminate any unnecessary metadata overhead for photo read operations, so that
each read I/O operation was only reading actual photo data (instead of file system
metadata). Haystack can be broken down into these functional layers:
• HTTP server
• Photo Store
• Haystack Object Store
• File system
• Storage
16. SCRIBE
• Simple data model
• Scalable distributed logging framework
• Useful for logging a wide array of data
• Built on top of Thrift
17. SCRIBE
Scribe is a server for aggregating log data streamed in real-time from a large
number of servers. It was designed to be scalable, extensible without client-side
modification, and robust to failure of the network or any specific machine.
Scribe was developed at Facebook and released as open source in 2008. Scribe servers are arranged in a
directed graph, with each server knowing only about the next server in the graph. This network topology
allows for adding extra layers of fan-in as a system grows, and batching messages before sending them
between datacenters, without having any code that explicitly needs to understand datacenter topology,
only a simple configuration.
Scribe is designed to consider reliability but to not require heavyweight protocols and expansive disk
usage. Scribe spools data to disk on any node to handle intermittent connectivity or node failure, but doesn't
sync a log file for every message. This creates a possibility of a small amount of data loss in the event of
a crash or catastrophic hardware failure. However, this degree of reliability is suitable for most
Facebook use cases.
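The store-and-forward behaviour described above can be sketched as follows. This is a simplified Python illustration, not Scribe's code or API: the class LogNode and its attributes are hypothetical, and an in-memory queue stands in for the on-disk spool. Each node knows only the next node in the graph and keeps messages locally while that node is unreachable.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Message = Tuple[str, str]  # (category, text)


class LogNode:
    """A log server that knows only the next server in the graph."""

    def __init__(self, name: str, next_node: Optional["LogNode"] = None) -> None:
        self.name = name
        self.next_node = next_node          # None marks the final destination
        self.up = True                      # simulated reachability
        self.spool: Deque[Message] = deque()  # stands in for the on-disk spool
        self.stored: List[Message] = []     # filled only at the final destination

    def log(self, category: str, text: str) -> None:
        self.spool.append((category, text))
        self.flush()

    def flush(self) -> None:
        """Forward spooled messages in order; keep them if downstream is unreachable."""
        if self.next_node is None:
            self.stored.extend(self.spool)
            self.spool.clear()
            return
        if not self.next_node.up:
            return
        while self.spool:
            self.next_node.log(*self.spool.popleft())


central = LogNode("central")
aggregator = LogNode("aggregator", central)
web = LogNode("web1", aggregator)

aggregator.up = False
web.log("clicks", "a")   # spooled on web1 because the aggregator is unreachable
aggregator.up = True
web.log("clicks", "b")   # both messages now flow through to the central node
print(central.stored)
```

Adding another layer of fan-in only requires pointing nodes at a different next node, which matches the configuration-only approach described above.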
18. • Facebook utilizes MySQL because of its speed and reliability.
• Thousands of MySQL servers
• Users randomly distributed across these servers
• Relational aspect of DB is not used
• No joins: they are logically difficult because data is distributed randomly
• Primarily key-value store
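A key-value store spread across many servers can be sketched as below. This is an illustrative Python model, not Facebook's setup: the class ShardedStore is hypothetical, plain dictionaries stand in for MySQL servers, and a hash of the user ID is an assumed way to get an even, random-looking spread that can still be looked up later.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedStore:
    """Key-value store whose keys are spread across several shards."""

    def __init__(self, shard_count: int) -> None:
        # Each dictionary stands in for one database server.
        self._shards: List[Dict[str, str]] = [{} for _ in range(shard_count)]

    def shard_for(self, user_id: str) -> int:
        # A stable hash gives the same shard for the same user on every call.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def put(self, user_id: str, value: str) -> None:
        self._shards[self.shard_for(user_id)][user_id] = value

    def get(self, user_id: str) -> Optional[str]:
        return self._shards[self.shard_for(user_id)].get(user_id)


users = ShardedStore(shard_count=4)
users.put("user:1001", "Alice")
users.put("user:1002", "Bob")
print(users.get("user:1001"), users.shard_for("user:1001"))
```

Because two related users may live on different shards, a join across them cannot be done inside one database, which is why lookups are by key only.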
Memcache
• Protects the main database from high read demands
from users.
• Memcache is a memory caching system that is used
to speed up dynamic database driven websites (like
Facebook)
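How a cache protects the database from read load can be sketched as below. This is a minimal Python illustration of a look-aside cache, assuming dictionaries in place of Memcached and MySQL; the class CachedReader and its counter are hypothetical names used only for the example.

```python
from typing import Dict, Optional


class CachedReader:
    """Look-aside cache: check the cache first, fall back to the database on a miss."""

    def __init__(self, database: Dict[str, str]) -> None:
        self._database = database          # stands in for the main database
        self._cache: Dict[str, str] = {}   # stands in for the memory cache
        self.database_reads = 0            # counts how often the database is hit

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]
        self.database_reads += 1
        value = self._database.get(key)
        if value is not None:
            self._cache[key] = value       # fill the cache for later readers
        return value

    def update(self, key: str, value: str) -> None:
        self._database[key] = value
        self._cache.pop(key, None)         # invalidate so readers do not see stale data


reader = CachedReader({"profile:42": "Ahmad"})
reader.get("profile:42")
reader.get("profile:42")
print(reader.database_reads)  # the second read was served from the cache
```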
Memory Management using Memcached
MySQL
19. Cassandra is a database management system designed to
handle large amounts of data spread out across many
servers. It powers Facebook’s Inbox Search feature and
provides a structured key-value store with eventual
consistency.
Storing
Apache Hadoop is being used in three broad types of systems:
• as a warehouse for web analytics
• as storage for a distributed database
• and for MySQL database backups.
Cassandra
Cassandra
20. Fault Tolerance
Ability of a system to continue functioning in the event of a partial failure.
Although the system continues to function, overall performance may be affected.
Two main reasons for the occurrence of a fault:
1) Hardware or software failure. 2) Unauthorized access.
Why do we need fault tolerance?
Fault tolerance is needed in order to provide three main features to distributed systems.
1) Reliability - Focuses on continuous service without any interruptions.
2) Availability - Concerned with the readiness of the system.
3) Security - Prevents any unauthorized access.
21. Phases In Fault Tolerance
• Implementation of a fault tolerance technique depends on the design, configuration
and application of a distributed system.
• Designers have suggested some general principles that are commonly followed:
1) Fault Detection
2) Fault Diagnosis
3) Evidence Generation
4) Assessment
5) Recovery
22. Fault Detection
• Constantly monitoring the performance and comparing it with the
expected outcome.
• A fault is reported if there is a deviation from the expected
outcome.
Fault Diagnosis
• Done to understand the nature of the fault and its possible root
cause.
Evidence Generation
• A report is generated based on the outcome of the fault diagnosis.
Assessment
• Understanding the extent of the damage caused by the faulty
component.
• Done by examining the flow of information that has passed
from the faulty component to the rest of the system.
• A virtual boundary is created.
Recovery
• Making the system fault free and restoring it to a consistent
state, through forward recovery or backward recovery.
23. Fault Tolerance Techniques
Replication
• Creating multiple copies (replicas) of data items and storing them at different sites.
• The main idea is to increase availability, so that if a node fails at one site, the data can
be accessed from a different site.
• Has its limitations too, such as maintaining data consistency and choosing the degree of replication.
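Replication for availability can be sketched as below. This is a toy Python model under stated assumptions: the classes Site and ReplicatedStore are hypothetical, sites are plain objects in one process, and a write is simply copied to every site that is currently up.

```python
from typing import Dict, List


class Site:
    """One storage site holding a full copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: Dict[str, str] = {}


class ReplicatedStore:
    def __init__(self, sites: List[Site]) -> None:
        self._sites = sites

    def write(self, key: str, value: str) -> None:
        # Copy the value to every site that is currently reachable.
        for site in self._sites:
            if site.up:
                site.data[key] = value

    def read(self, key: str) -> str:
        # Serve the read from the first reachable site that has the key.
        for site in self._sites:
            if site.up and key in site.data:
                return site.data[key]
        raise RuntimeError(f"no available replica holds {key!r}")


sites = [Site("site-a"), Site("site-b"), Site("site-c")]
store = ReplicatedStore(sites)
store.write("photo:7", "sunset.jpg")

sites[0].up = False            # one site fails
print(store.read("photo:7"))   # the data is still served from another site
```

The sketch also hints at the consistency limitation: a site that is down during a write misses the update and holds stale data until it is repaired.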
24. LIMITATIONS
Replication
• Difficult to manage as the number of replicas or copies increases.
• Consistency and the degree of replication are major issues.
Check Pointing
• Loss of computation
• Checkpoint length, checkpoint frequency and storage are major issues.
25. • A situation in which two or more persons access the same record at the same time is
called concurrency.
• Concurrency control ensures that correct results of parallel operations are generated.
Concurrency
Why concurrency control?
• Concurrency control is needed because many things can go wrong when operations run in parallel.
• Each transaction by itself can be correct, but running them concurrently can cause problems such as:
• The lost update problem
• The dirty read problem
• The incorrect summary problem
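The lost update problem can be shown with a short Python sketch. This is an illustration only, with hypothetical names such as Account and deposit. The first part writes out an unlucky interleaving of two transactions step by step so the result is repeatable; the second part uses a lock so that each read-modify-write happens as one unit.

```python
import threading

# Part 1: an unsafe interleaving, spelled out step by step.
balance = {"value": 100}
read_by_t1 = balance["value"]          # transaction 1 reads 100
read_by_t2 = balance["value"]          # transaction 2 also reads 100
balance["value"] = read_by_t1 + 50     # transaction 1 writes 150
balance["value"] = read_by_t2 + 30     # transaction 2 writes 130; the +50 is lost
lost_update_result = balance["value"]


# Part 2: the same two deposits, protected by a lock.
class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount: int) -> None:
        # Only one thread at a time may read, add and write back.
        with self._lock:
            self.balance = self.balance + amount


account = Account(100)
threads = [threading.Thread(target=account.deposit, args=(amount,)) for amount in (50, 30)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
safe_result = account.balance

print(lost_update_result, safe_result)
```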
26. • Facebook has worked hard on concurrent programming. Facebook is now sharing its
newest analysis tool: RacerD, its open source race detector.
• RacerD is part of Infer, the static analyzer Facebook open-sourced in 2015; RacerD itself was open-sourced in 2017.
• Dedicated to identifying source code bugs.
• RacerD statically analyzes Java code to detect potential concurrency bugs. This
analysis does not attempt to prove the absence of concurrency issues, rather, it
searches for a high-confidence class of data races.
• RacerD doesn’t try to check all code for concurrency issues.
• There are two signals that RacerD looks for:
1. Explicitly annotating a class/method
2. Using a lock via the synchronized keyword.
RacerD
27. • Scalability is an attribute that describes the ability of a process, network, software or
organization to grow and manage increased demand. A system, business or software
that is described as scalable has an advantage because it is more adaptable to the
changing needs or demands of its users or clients.
Scalability
28. Facebook’s scaling challenge
Before we get into the details, here are a few factoids to give you an idea of the scaling challenge
that Facebook has to deal with:
• Facebook serves 570 billion page views per month (according to Google Ad Planner).
• There are more photos on Facebook than all other photo sites combined (including sites like
Flickr).
• More than 3 billion photos are uploaded every month.
• Facebook’s systems serve 1.2 million photos per second. This doesn’t include the images
served by Facebook’s CDN.
• More than 25 billion pieces of content (status updates, comments,
etc.) are shared every month.
• Facebook has more than 30,000 servers (and this number is from
last year).
1-LAMP 2-PHP 3-Linux 4-MySQL 5-Memcached
6-HIPHOP 7-HAYSTACK 8-BIGPIPE 9-CASSANDRA 10-SCRIBE
11-HADOOP & HIVE 12-THRIFT
Software that helps Facebook scale
29. Here’s a look at Facebook’s rapidly growing data center campuses around the
world:
Prineville, Oregon : 2.15 million square feet of data center space in Prineville by
2021.
Altoona, Iowa : 2.5 million square feet of data center space.
The campus features three data centers of between 468,000 SF and 496,000 SF. In 2016
the company added a 100,000 SF cold storage facility.
Clonee, Ireland : 621,000 square feet of data center space.
Fort Worth, Texas : 2.5 million square feet of data center space.
30. Los Lunas, New Mexico : Sept. 2016; nearly 3 million square feet of data
center space.
Papillion, Nebraska : March 2018; 2.6 million square feet of space.
New Albany, Ohio : Facebook is investing $750 million in a 900,000 square foot data
center in New Albany, an Ohio town that also hosts a cloud computing data center for
Amazon Web Services.
Henrico County, Virginia : Facebook is spending $750 million to build a 970,000 square
foot data center.
Newton County, Georgia : February 2017; Facebook is investing about $750 million in
the facility in Newton County, about 40 miles east of downtown Atlanta, where it is building
two data centers spanning 970,000 square feet. The buildings will be fully operational
in 2020.
31. Openness
Openness means being open in terms of sharing information so employees
know what’s going on, and crucially, feel heard. But it also means being, and
expecting, an openness to different ways of working, different styles, different
opinions, and, critically, feedback. It means openness to change.
In a distributed system, openness is whether the system can be extended in various ways without disrupting
existing systems and services
• Hardware extensions
• Adding peripherals, memory, communication interfaces
• Software extensions
• Operating System features
• Communication protocols
32. Openness is supported by:
• Public interfaces
• Standardized communication protocols
1. Be Personal:
Don’t try to be something you’re not, or someone
else. Just be yourself. That includes being
vulnerable and honest. If something isn’t working, or is
worrying you, share it. If you’ve struggled with
something that’s relevant and learned a lesson or
two along the way, share it. Sharing your own
perspective on an event, a trend, or a challenge
makes you more relatable and builds trust.
Share a story.
“We can tag others and it is a much more elegant
way to have a conversation, versus the email
conversations that we were having a lot of times.” Stacie Sherer, SVP Corporate
Communications, Weight Watchers.
Openness key aspects
33. 2. Internal before external:
Just about everything should be shared internally before it’s shared externally. It gives us
the opportunity to get feedback, prepare for public feedback, and to refine and practice
our broader messages before going to the public.
3. Feedback:
Root your programs in feedback and use data to support wherever possible. Often the
feedback helps you figure out what point you’re trying to make. And be clear about what
kind of feedback you want, where and how you want it shared, and what you’ve learned
or what changes you’ve made from the feedback.
Feedback also helps all people get better together. Without it, people can see the
problems and become complacent or jaded if they don’t think their opinion matters or that
their insight can make a difference.
34. Transparency
Concealment (Hiding) from the user and the application programmer
of the separation of the components of a distributed system
Access Transparency - Local and remote resources are accessed in
same way
Location Transparency - Users are unaware of the location of
resources
Migration Transparency - Resources can migrate without name
change
Replication Transparency -
Users are unaware of the existence of multiple copies (replicas) of resources
Failure Transparency - Users are unaware of the failure of individual
components
Concurrency Transparency - Users are unaware of sharing
resources with others
35. Facebook released its latest Transparency report, where the social network
shares information on government requests for user data, noting that these
requests had increased globally by around 4 percent compared to the first half of
2017, though U.S. government-initiated requests stayed roughly the same. In
addition, the company added a new report to accompany the usual Transparency
report, focused on detailing how and why Facebook takes action on enforcing its
Community Standards, specifically in the areas of graphic violence, sexual
activity, terrorist propaganda, hate speech, spam and fake accounts.
Facebook notes that this is very much a work in progress and that it will likely
improve its methodology over time.
Government requests for account data increased globally by around 4%
compared to the first half of 2017, increasing from 78,890 to 82,341 requests. In
the US, government requests remained roughly even at 32,742, of which 62%
included a non-disclosure order prohibiting Facebook from notifying the user,
which is up from 57% during the first half of 2017.
36. During the second half of 2017, the number of pieces of content we restricted
based on local law fell from 28,036 to 14,294. Last cycle’s figures had been
increased primarily by content restrictions in Mexico related to the video of a
tragic school shooting.
There were 46 disruptions of Facebook services in 12 countries in the second
half of 2017, compared to 52 disruptions in nine countries in the first half. We
continue to be deeply concerned by internet disruptions, which prevent people
from communicating with family and friends and also threaten the growth of small
businesses.
The report also includes data covering the volume and nature of copyright,
trademark and counterfeit reports we received, as well as the amount of content
affected by those reports. During this period, on Facebook and Instagram we
took down 2,776,665 pieces of content based on 373,934 copyright reports,
222,226 pieces of content based on 61,172 trademark reports and 459,176
pieces of content based on 28,680 counterfeit reports.