SlideShare uma empresa Scribd logo
1 de 28
Baixar para ler offline
The Evolution of the Professional
Graph at LinkedIn

        Chris Conrad                     Igor Perisic
Senior Engineering Manager,   Sr. Director of Engineering, SNA
        Social Graph
LinkedIn
•  The site officially launched on May 5, 2003. At the end of the first
   month in operation, LinkedIn had a total of 4,500 members in the
   network.
•  As of January 9, 2013, LinkedIn operates the world’s largest
   professional network on the Internet with more than 200 million
   members in over 200 countries and territories.
•  As of September 30, 2012, LinkedIn counts executives from all
   2012 Fortune 500 companies as members; its corporate talent
   solutions are used by 85 of the Fortune 100 companies.
•  As of the school year ending May 2012, there are over 20 million
   students and recent college graduates on LinkedIn. They are
   LinkedIn's fastest-growing demographic.
In the beginning…
The Cloud
•  Cloud is the original name of our graph engine
•  Responsible for read scaling graph queries (and it used to do
   search, too)
•  Stored 4 primary sets of data:


                       Cloud

                         Member         Network
                          Data           Cache



                                        Group
                        Connections
                                      Membership
What was wrong?
•  Large memory footprint
   –  Network cache used simple but inefficient data structures
   –  The size and density of the graph was increasing

•  Garbage Collector woes
   –  Large JVM heap caused long GC pauses
   –  Long GC pauses reduces availability resulting in site outages
C++ Graph
•  First project: migrate the network cache to a new data structure to
   reduce memory usage
•  Second project: implement a C++ JNI library to move the graph
   data off heap
•  Result: Drastic reduction in JVM heap utilization


            Cloud

                         Java Heap            libGraphJNI.so

                Member           Network
                 Data             Cache
                                              Connections

                        Group
                      Membership
Several million users later
New Problems
•  Growth
   –  The size and density of the graph was increasing
   –  We were running out of memory
   –  We were running out of CPU cycles
   –  Proliferation of services increased the overhead of maintaining client side
      software load balancer
   –  As of September 30, 2012, LinkedIn has 3,177 full-time employees located
      around the world. LinkedIn started off 2012 with about 2,100 full-time
      employees worldwide, up from around 1,000 at the beginning of 2011 and
      about 500 at the beginning of 2010.

•  C++ code had a much higher maintenance cost
   –  Coredumps are much less friendly than a NullPointerException
   –  LinkedIn didn’thave the expertise or infrastructure to support C++
      development
Split cloud
•  cloud-session: Move the load balancing logic into a service we
   control
•  rgraph: Extract the C++ graph into its own service


                 cloud-session




         Cloud                              rgraph

                      Java Heap                  libGraphJNI.so

             Member           Network
              Data             Cache
                                                     Connections

                    Group
                  Membership
New problems, same as the old
•  rgraph instances still had a large memory footprint
    –  The density of the graph was increasing
    –  We were running out of memory
    –  We were running out of CPU cycles

•  cloud-session’s software load balancer implementation was
   essentially a single point of failure
Distribute the Graph
•  Introduce Norbert a new cluster management system
•  Partition the graph data
•  Partition the network cache service

                cloud-session
                                             dgraph


                                              Connections
              Cloud

               Java Heap                        Group
                                              Membership
                  Member
                   Data




                                  Network Cache
                                     Service
Mission Accomplished
So now what?
My Connections
Common Connections
My Network
How am I connected?
What is the professional graph?
•  LinkedIn connections
•  Current and past co-workers
•  University colleagues and alumni
•  Group members
•  And what about geography, industry and skill overlap?
New requirements
•  Members aren’t the only type of node in the professional graph
•  LinkedIn connections aren’t the only type of edge in the
   profession graph
•  We already supported groups and group membership
Making changes was hard
•  Code was rigid
   –  Data was stored using class hierarchies, introducing data types was
      prohibitively slow
   –  Queries were built by combining object instances

•  BDBJE
•  Everything was back in the heap
   –  Garbage collection time was starting to go up
   –  GC pauses no longer caused outages, but flapping introduced high developer
      and operational overhead
Graph as a Service
•  Custom persistence engine
   –  Log structured
   –  Memory mapped files keeps data out of the Java heap
   –  Data described using DDL like schema

•  Custom SQL like query language
   –  Query language understands DDL
   –  Text based language reduces code changes
Graph Queries
•  Company(:id)[CompanyFollowers]


•  Member(:id)[MemberToMember{CreatedAt > :t}]


•  Member(:id)[topN(MemberToMember, Score, 10)]
What do we have in common?
How am I connected?
What’s next?
•  Online schema migration
•  Automated repartitioning and data migration
•  Automated provisioning
•  Hierarchical data partitioning
•  Monitoring and statistics
•  Query optimization
•  Query fragment caching
•  Result set caching
•  Query parallelization
•  Very large data set handling
•  …
And we’re still growing

                                     200M+            2/sec
                                      63% non U.S.


                                                     25th
                                                     Most visit website worldwide
                               90
                                                     (Comscore 6-12)



                          55                         >2.6M
                                                     Company pages



                                                     85%
                    32

               17
           8
 2    4                                              Fortune 100 Companies use
                                                     LinkedIn to hire
2004 2005 2006 2007 2008 2009 2010 2011
          LinkedIn Members (Millions)
We’re Hiring
•  http://studentcareers.linkedin.com
•  Or email me at cconrad@linkedin.com
Q&A

Mais conteúdo relacionado

Mais procurados

Responsive Web Design ~ Best Practices for Maximizing ROI
Responsive Web Design ~ Best Practices for Maximizing ROIResponsive Web Design ~ Best Practices for Maximizing ROI
Responsive Web Design ~ Best Practices for Maximizing ROIJuan Carlos Duron
 
xRM - as an Evolution of CRM
xRM - as an Evolution of CRMxRM - as an Evolution of CRM
xRM - as an Evolution of CRMCatherine Eibner
 
SPEVO13 - IW509 - Records Management and Search
SPEVO13 - IW509 - Records Management and SearchSPEVO13 - IW509 - Records Management and Search
SPEVO13 - IW509 - Records Management and SearchJohn F. Holliday
 
A Focus on Salesforce1 Platform: Customizing and Multi-org Architecture
A Focus on Salesforce1 Platform: Customizing and Multi-org ArchitectureA Focus on Salesforce1 Platform: Customizing and Multi-org Architecture
A Focus on Salesforce1 Platform: Customizing and Multi-org ArchitectureSalesforce.org
 
What SQL DBAs need to know about SharePoint-Indianapolis 2013
What SQL DBAs need to know about SharePoint-Indianapolis 2013What SQL DBAs need to know about SharePoint-Indianapolis 2013
What SQL DBAs need to know about SharePoint-Indianapolis 2013J.D. Wade
 
Django and Neo4j - Domain modeling that kicks ass
Django and Neo4j - Domain modeling that kicks assDjango and Neo4j - Domain modeling that kicks ass
Django and Neo4j - Domain modeling that kicks assTobias Lindaaker
 
Common Data Model - A Business Database!
Common Data Model - A Business Database!Common Data Model - A Business Database!
Common Data Model - A Business Database!Pedro Azevedo
 
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...European SharePoint Conference
 
Deep Web: Databases on the Web
Deep Web: Databases on the WebDeep Web: Databases on the Web
Deep Web: Databases on the WebDenis Shestakov
 
From Legacy Web Application To SharePoint - a case study
From Legacy Web Application To SharePoint - a case studyFrom Legacy Web Application To SharePoint - a case study
From Legacy Web Application To SharePoint - a case studyElizabeth Szabo
 
Logical architecture considerations for SharePoint 2013
Logical architecture considerations for SharePoint 2013Logical architecture considerations for SharePoint 2013
Logical architecture considerations for SharePoint 2013Dinusha Kumarasiri
 
Extreme SSAS- SQL 2011
Extreme SSAS- SQL 2011Extreme SSAS- SQL 2011
Extreme SSAS- SQL 2011Itay Braun
 
System Update (2011 CrossRef Workshops)
System Update (2011 CrossRef Workshops)System Update (2011 CrossRef Workshops)
System Update (2011 CrossRef Workshops)Crossref
 
SharePoint 2013 Platform Options - office 365, Azure, On premise
SharePoint 2013 Platform Options - office 365, Azure, On premiseSharePoint 2013 Platform Options - office 365, Azure, On premise
SharePoint 2013 Platform Options - office 365, Azure, On premiseDavid J Rosenthal
 
Big data insights with Red Hat JBoss Data Virtualization
Big data insights with Red Hat JBoss Data VirtualizationBig data insights with Red Hat JBoss Data Virtualization
Big data insights with Red Hat JBoss Data VirtualizationKenneth Peeples
 
Myth Busters II: BI Tools and Data Virtualization are Interchangeable
Myth Busters II: BI Tools and Data Virtualization are InterchangeableMyth Busters II: BI Tools and Data Virtualization are Interchangeable
Myth Busters II: BI Tools and Data Virtualization are InterchangeableDenodo
 
Enterprise Content Management Migration Best Practices Feat Migrations From...
Enterprise Content Management Migration Best Practices   Feat Migrations From...Enterprise Content Management Migration Best Practices   Feat Migrations From...
Enterprise Content Management Migration Best Practices Feat Migrations From...Alfresco Software
 

Mais procurados (20)

Responsive Web Design ~ Best Practices for Maximizing ROI
Responsive Web Design ~ Best Practices for Maximizing ROIResponsive Web Design ~ Best Practices for Maximizing ROI
Responsive Web Design ~ Best Practices for Maximizing ROI
 
xRM - as an Evolution of CRM
xRM - as an Evolution of CRMxRM - as an Evolution of CRM
xRM - as an Evolution of CRM
 
expressor Studio 3.0
expressor Studio 3.0expressor Studio 3.0
expressor Studio 3.0
 
SPEVO13 - IW509 - Records Management and Search
SPEVO13 - IW509 - Records Management and SearchSPEVO13 - IW509 - Records Management and Search
SPEVO13 - IW509 - Records Management and Search
 
A Focus on Salesforce1 Platform: Customizing and Multi-org Architecture
A Focus on Salesforce1 Platform: Customizing and Multi-org ArchitectureA Focus on Salesforce1 Platform: Customizing and Multi-org Architecture
A Focus on Salesforce1 Platform: Customizing and Multi-org Architecture
 
What SQL DBAs need to know about SharePoint-Indianapolis 2013
What SQL DBAs need to know about SharePoint-Indianapolis 2013What SQL DBAs need to know about SharePoint-Indianapolis 2013
What SQL DBAs need to know about SharePoint-Indianapolis 2013
 
Django and Neo4j - Domain modeling that kicks ass
Django and Neo4j - Domain modeling that kicks assDjango and Neo4j - Domain modeling that kicks ass
Django and Neo4j - Domain modeling that kicks ass
 
Common Data Model - A Business Database!
Common Data Model - A Business Database!Common Data Model - A Business Database!
Common Data Model - A Business Database!
 
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
SQL Server and SharePoint - Best Practices presented by Steffen Krause, Micro...
 
Deep Web: Databases on the Web
Deep Web: Databases on the WebDeep Web: Databases on the Web
Deep Web: Databases on the Web
 
From Legacy Web Application To SharePoint - a case study
From Legacy Web Application To SharePoint - a case studyFrom Legacy Web Application To SharePoint - a case study
From Legacy Web Application To SharePoint - a case study
 
Bots & Teams: el poder de Grayskull
Bots & Teams: el poder de GrayskullBots & Teams: el poder de Grayskull
Bots & Teams: el poder de Grayskull
 
Power BI Interview Questions
Power BI Interview QuestionsPower BI Interview Questions
Power BI Interview Questions
 
Logical architecture considerations for SharePoint 2013
Logical architecture considerations for SharePoint 2013Logical architecture considerations for SharePoint 2013
Logical architecture considerations for SharePoint 2013
 
Extreme SSAS- SQL 2011
Extreme SSAS- SQL 2011Extreme SSAS- SQL 2011
Extreme SSAS- SQL 2011
 
System Update (2011 CrossRef Workshops)
System Update (2011 CrossRef Workshops)System Update (2011 CrossRef Workshops)
System Update (2011 CrossRef Workshops)
 
SharePoint 2013 Platform Options - office 365, Azure, On premise
SharePoint 2013 Platform Options - office 365, Azure, On premiseSharePoint 2013 Platform Options - office 365, Azure, On premise
SharePoint 2013 Platform Options - office 365, Azure, On premise
 
Big data insights with Red Hat JBoss Data Virtualization
Big data insights with Red Hat JBoss Data VirtualizationBig data insights with Red Hat JBoss Data Virtualization
Big data insights with Red Hat JBoss Data Virtualization
 
Myth Busters II: BI Tools and Data Virtualization are Interchangeable
Myth Busters II: BI Tools and Data Virtualization are InterchangeableMyth Busters II: BI Tools and Data Virtualization are Interchangeable
Myth Busters II: BI Tools and Data Virtualization are Interchangeable
 
Enterprise Content Management Migration Best Practices Feat Migrations From...
Enterprise Content Management Migration Best Practices   Feat Migrations From...Enterprise Content Management Migration Best Practices   Feat Migrations From...
Enterprise Content Management Migration Best Practices Feat Migrations From...
 

Semelhante a LinkedIn Graph Presentation

Datastax - Why Your RDBMS fails at scale
Datastax - Why Your RDBMS fails at scaleDatastax - Why Your RDBMS fails at scale
Datastax - Why Your RDBMS fails at scaleRuth Mills
 
Scaling LinkedIn - A Brief History
Scaling LinkedIn - A Brief HistoryScaling LinkedIn - A Brief History
Scaling LinkedIn - A Brief HistoryJosh Clemm
 
Enabling Telco to Build and Run Modern Applications
Enabling Telco to Build and Run Modern Applications Enabling Telco to Build and Run Modern Applications
Enabling Telco to Build and Run Modern Applications Tugdual Grall
 
Cloud Computing and Virtualization Overview by Amr Ali
Cloud Computing and Virtualization Overview by Amr AliCloud Computing and Virtualization Overview by Amr Ali
Cloud Computing and Virtualization Overview by Amr AliAmr Ali
 
Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Adrian Cockcroft
 
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J ConferenceNodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J ConferenceDeepak Chandramouli
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabasesAdi Challa
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDBFoundationDB
 
Metail and Elastic MapReduce
Metail and Elastic MapReduceMetail and Elastic MapReduce
Metail and Elastic MapReduceGareth Rogers
 
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo Aquino
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo AquinoFInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo Aquino
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo AquinoHugo Aquino
 
MongoDB Atlas - eHarmony’s New Message Store
MongoDB Atlas - eHarmony’s New Message StoreMongoDB Atlas - eHarmony’s New Message Store
MongoDB Atlas - eHarmony’s New Message StoreEvan Rodd
 
MongoDB Atlas - eHarmony’s New Message Store
MongoDB Atlas - eHarmony’s New Message StoreMongoDB Atlas - eHarmony’s New Message Store
MongoDB Atlas - eHarmony’s New Message StoreMongoDB
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyComsysto Reply GmbH
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyComsysto Reply GmbH
 

Semelhante a LinkedIn Graph Presentation (20)

Datastax - Why Your RDBMS fails at scale
Datastax - Why Your RDBMS fails at scaleDatastax - Why Your RDBMS fails at scale
Datastax - Why Your RDBMS fails at scale
 
Scaling LinkedIn - A Brief History
Scaling LinkedIn - A Brief HistoryScaling LinkedIn - A Brief History
Scaling LinkedIn - A Brief History
 
Enabling Telco to Build and Run Modern Applications
Enabling Telco to Build and Run Modern Applications Enabling Telco to Build and Run Modern Applications
Enabling Telco to Build and Run Modern Applications
 
Cloud Computing and Virtualization Overview by Amr Ali
Cloud Computing and Virtualization Overview by Amr AliCloud Computing and Virtualization Overview by Amr Ali
Cloud Computing and Virtualization Overview by Amr Ali
 
Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3)
 
Clouds
CloudsClouds
Clouds
 
Clouds
CloudsClouds
Clouds
 
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J ConferenceNodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
 
Making the Cloud a Known Entity
Making the Cloud a Known EntityMaking the Cloud a Known Entity
Making the Cloud a Known Entity
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
 
A Mashup with Backbone
A Mashup with BackboneA Mashup with Backbone
A Mashup with Backbone
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
 
Metail and Elastic MapReduce
Metail and Elastic MapReduceMetail and Elastic MapReduce
Metail and Elastic MapReduce
 
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo Aquino
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo AquinoFInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo Aquino
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo Aquino
 
MongoDB Atlas - eHarmony’s New Message Store
MongoDB Atlas - eHarmony’s New Message StoreMongoDB Atlas - eHarmony’s New Message Store
MongoDB Atlas - eHarmony’s New Message Store
 
MongoDB Atlas - eHarmony’s New Message Store
MongoDB Atlas - eHarmony’s New Message StoreMongoDB Atlas - eHarmony’s New Message Store
MongoDB Atlas - eHarmony’s New Message Store
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and Consistently
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and Consistently
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 

Mais de Amy W. Tang

Data Infrastructure at LinkedIn
Data Infrastructure at LinkedInData Infrastructure at LinkedIn
Data Infrastructure at LinkedInAmy W. Tang
 
LinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data ApplicationLinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data ApplicationAmy W. Tang
 
Building a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedInBuilding a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedInAmy W. Tang
 
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)Amy W. Tang
 
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)Amy W. Tang
 
Building Distributed Systems Using Helix
Building Distributed Systems Using HelixBuilding Distributed Systems Using Helix
Building Distributed Systems Using HelixAmy W. Tang
 
Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn Amy W. Tang
 
Voldemort on Solid State Drives
Voldemort on Solid State DrivesVoldemort on Solid State Drives
Voldemort on Solid State DrivesAmy W. Tang
 
Untangling Cluster Management with Helix
Untangling Cluster Management with HelixUntangling Cluster Management with Helix
Untangling Cluster Management with HelixAmy W. Tang
 
All Aboard the Databus
All Aboard the DatabusAll Aboard the Databus
All Aboard the DatabusAmy W. Tang
 
Introduction to Databus
Introduction to DatabusIntroduction to Databus
Introduction to DatabusAmy W. Tang
 
A Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedIn
A Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedInA Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedIn
A Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedInAmy W. Tang
 

Mais de Amy W. Tang (12)

Data Infrastructure at LinkedIn
Data Infrastructure at LinkedInData Infrastructure at LinkedIn
Data Infrastructure at LinkedIn
 
LinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data ApplicationLinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data Application
 
Building a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedInBuilding a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedIn
 
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
 
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
 
Building Distributed Systems Using Helix
Building Distributed Systems Using HelixBuilding Distributed Systems Using Helix
Building Distributed Systems Using Helix
 
Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn
 
Voldemort on Solid State Drives
Voldemort on Solid State DrivesVoldemort on Solid State Drives
Voldemort on Solid State Drives
 
Untangling Cluster Management with Helix
Untangling Cluster Management with HelixUntangling Cluster Management with Helix
Untangling Cluster Management with Helix
 
All Aboard the Databus
All Aboard the DatabusAll Aboard the Databus
All Aboard the Databus
 
Introduction to Databus
Introduction to DatabusIntroduction to Databus
Introduction to Databus
 
A Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedIn
A Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedInA Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedIn
A Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedIn
 

Último

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

LinkedIn Graph Presentation

  • 1. The Evolution of the Professional Graph at LinkedIn Chris Conrad Igor Perisic Senior Engineering Manager, Sr. Director of Engineering, SNA Social Graph
  • 2. LinkedIn •  The site officially launched on May 5, 2003. At the end of the first month in operation, LinkedIn had a total of 4,500 members in the network. •  As of January 9, 2013, LinkedIn operates the world’s largest professional network on the Internet with more than 200 million members in over 200 countries and territories. •  As of September 30, 2012, LinkedIn counts executives from all 2012 Fortune 500 companies as members; its corporate talent solutions are used by 85 of the Fortune 100 companies. •  As of the school year ending May 2012, there are over 20 million students and recent college graduates on LinkedIn. They are LinkedIn's fastest-growing demographic.
  • 4. The Cloud •  Cloud is the original name of our graph engine •  Responsible for read scaling graph queries (and it used to do search, too) •  Stored 4 primary sets of data: Cloud Member Network Data Cache Group Connections Membership
  • 5. What was wrong? •  Large memory footprint –  Network cache used simple but inefficient data structures –  The size and density of the graph was increasing •  Garbage Collector woes –  Large JVM heap caused long GC pauses –  Long GC pauses reduces availability resulting in site outages
  • 6. C++ Graph •  First project: migrate the network cache to a new data structure to reduce memory usage •  Second project: implement a C++ JNI library to move the graph data off heap •  Result: Drastic reduction in JVM heap utilization Cloud Java Heap libGraphJNI.so Member Network Data Cache Connections Group Membership
  • 8. New Problems •  Growth –  The size and density of the graph was increasing –  We were running out of memory –  We were running out of CPU cycles –  Proliferation of services increased the overhead of maintaining client side software load balancer –  As of September 30, 2012, LinkedIn has 3,177 full-time employees located around the world. LinkedIn started off 2012 with about 2,100 full-time employees worldwide, up from around 1,000 at the beginning of 2011 and about 500 at the beginning of 2010. •  C++ code had a much higher maintenance cost –  Coredumps are much less friendly than a NullPointerException –  LinkedIn didn’thave the expertise or infrastructure to support C++ development
  • 9. Split cloud •  cloud-session: Move the load balancing logic into a service we control •  rgraph: Extract the C++ graph into its own service cloud-session Cloud rgraph Java Heap libGraphJNI.so Member Network Data Cache Connections Group Membership
  • 10. New problems, same as the old •  rgraph instances still had a large memory footprint –  The density of the graph was increasing –  We were running out of memory –  We were running out of CPU cycles •  cloud-session’s software load balancer implementation was essentially a single point of failure
  • 11. Distribute the Graph •  Introduce Norbert a new cluster management system •  Partition the graph data •  Partition the network cache service cloud-session dgraph Connections Cloud Java Heap Group Membership Member Data Network Cache Service
  • 17. How am I connected?
  • 18. What is the professional graph? •  LinkedIn connections •  Current and past co-workers •  University colleagues and alumni •  Group members •  And what about geography, industry and skill overlap?
  • 19. New requirements •  Members aren’t the only type of node in the professional graph •  LinkedIn connections aren’t the only type of edge in the profession graph •  We already supported groups and group membership
  • 20. Making changes was hard •  Code was rigid –  Data was stored using class hierarchies, introducing data types was prohibitively slow –  Queries were built by combining object instances •  BDBJE •  Everything was back in the heap –  Garbage collection time was starting to go up –  GC pauses no longer caused outages, but flapping introduced high developer and operational overhead
  • 21. Graph as a Service •  Custom persistence engine –  Log structured –  Memory mapped files keeps data out of the Java heap –  Data described using DDL like schema •  Custom SQL like query language –  Query language understands DDL –  Text based language reduces code changes
  • 22. Graph Queries •  Company(:id)[CompanyFollowers] •  Member(:id)[MemberToMember{CreatedAt > :t}] •  Member(:id)[topN(MemberToMember, Score, 10)]
  • 23. What do we have in common?
  • 24. How am I connected?
  • 25. What’s next? •  Online schema migration •  Automated repartitioning and data migration •  Automated provisioning •  Hierarchical data partitioning •  Monitoring and statistics •  Query optimization •  Query fragment caching •  Result set caching •  Query parallelization •  Very large data set handling •  …
  • 26. And we’re still growing 200M+ 2/sec 63% non U.S. 25th Most visit website worldwide 90 (Comscore 6-12) 55 >2.6M Company pages 85% 32 17 8 2 4 Fortune 100 Companies use LinkedIn to hire 2004 2005 2006 2007 2008 2009 2010 2011 LinkedIn Members (Millions)
  • 28. Q&A