LinkedIn Graph Presentation

The Evolution of the Professional Graph at LinkedIn

Chris Conrad, Senior Engineering Manager, Social Graph
Igor Perisic, Sr. Director of Engineering, SNA

LinkedIn
•  The site officially launched on May 5, 2003. At the end of the first
month in operation, LinkedIn had a total of 4,500 members in the
network.
•  As of January 9, 2013, LinkedIn operates the world’s largest
professional network on the Internet with more than 200 million
members in over 200 countries and territories.
•  As of September 30, 2012, LinkedIn counts executives from all
2012 Fortune 500 companies as members; its corporate talent
solutions are used by 85 of the Fortune 100 companies.
•  As of the school year ending May 2012, there are over 20 million
students and recent college graduates on LinkedIn. They are
LinkedIn's fastest-growing demographic.

The Cloud
•  Cloud is the original name of our graph engine
•  Responsible for read scaling graph queries (and it used to do
search, too)
•  Stored 4 primary sets of data:
–  Member Data
–  Network Cache
–  Connections
–  Group Membership

What was wrong?
•  Large memory footprint
–  Network cache used simple but inefficient data structures
–  The size and density of the graph were increasing

•  Garbage Collector woes
–  Large JVM heap caused long GC pauses
–  Long GC pauses reduced availability, resulting in site outages

C++ Graph
•  First project: migrate the network cache to a new data structure to
reduce memory usage
•  Second project: implement a C++ JNI library to move the graph
data off heap
•  Result: Drastic reduction in JVM heap utilization
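
The idea behind the first project, replacing a simple but memory-hungry structure with a compact one, can be illustrated with a small sketch. The deck does not describe the actual data structures, so the layout below (a compressed sparse row adjacency built from two flat integer arrays) and every name in it are assumptions chosen for illustration only.

```python
from array import array


def build_csr(adjacency):
    """Pack {member_id: [connection_ids]} into two flat integer arrays.

    Assumes member ids are dense integers 0..n-1 (an illustrative simplification).
    """
    n = len(adjacency)
    offsets = array("q", [0] * (n + 1))  # offsets[m]..offsets[m+1] delimits member m
    targets = array("q")                 # all connection ids, back to back
    for member in range(n):
        targets.extend(sorted(adjacency.get(member, ())))
        offsets[member + 1] = len(targets)
    return offsets, targets


def connections(offsets, targets, member):
    """Return the slice of `targets` that belongs to `member`."""
    return targets[offsets[member]:offsets[member + 1]]


# A tiny graph in the "simple" form: one Python list per member.
simple_form = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
offsets, targets = build_csr(simple_form)
```

Each connection costs one fixed-width integer in `targets` instead of a separate object per entry, which is the kind of saving a packed layout aims for.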

[Diagram: Cloud, split between the Java heap and libGraphJNI.so; components shown: Member Data, Network Cache, Connections, Group Membership]

New Problems
•  Growth
–  The size and density of the graph were increasing
–  We were running out of memory
–  We were running out of CPU cycles
–  Proliferation of services increased the overhead of maintaining the
client-side software load balancer
–  As of September 30, 2012, LinkedIn has 3,177 full-time employees located
around the world. LinkedIn started off 2012 with about 2,100 full-time
employees worldwide, up from around 1,000 at the beginning of 2011 and
about 500 at the beginning of 2010.

•  C++ code had a much higher maintenance cost
–  Coredumps are much less friendly than a NullPointerException
–  LinkedIn didn’t have the expertise or infrastructure to support C++
development

Split cloud
•  cloud-session: Move the load balancing logic into a service we
control
•  rgraph: Extract the C++ graph into its own service

[Diagram: cloud-session, Cloud (Java heap) and rgraph (libGraphJNI.so); components shown: Member Data, Network Cache, Connections, Group Membership]

New problems, same as the old
•  rgraph instances still had a large memory footprint
–  The density of the graph was increasing
–  We were running out of memory
–  We were running out of CPU cycles

•  cloud-session’s software load balancer implementation was
essentially a single point of failure

Distribute the Graph
•  Introduce Norbert, a new cluster management system
•  Partition the graph data
•  Partition the network cache service
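
A minimal sketch of what partitioning the graph data could look like. The deck does not say how members are assigned to partitions or how requests are routed, so the modulo hashing, the per-partition replicas, the round-robin replica choice, and all names below are assumptions; nothing here reflects Norbert's actual API.

```python
import itertools


class PartitionedGraph:
    """Hypothetical in-process model of a graph split across partitions."""

    def __init__(self, num_partitions, replicas_per_partition):
        self.num_partitions = num_partitions
        # partitions[p][r] is the adjacency map held by replica r of partition p.
        self.partitions = [
            [{} for _ in range(replicas_per_partition)]
            for _ in range(num_partitions)
        ]
        # One round-robin cursor per partition, used to spread reads over replicas.
        self._cursors = [
            itertools.cycle(range(replicas_per_partition))
            for _ in range(num_partitions)
        ]

    def partition_for(self, member_id):
        # Assumed scheme: member id modulo the partition count.
        return member_id % self.num_partitions

    def add_connection(self, a, b):
        # Store each direction on every replica of the owning partition.
        for src, dst in ((a, b), (b, a)):
            for replica in self.partitions[self.partition_for(src)]:
                replica.setdefault(src, set()).add(dst)

    def connections(self, member_id):
        p = self.partition_for(member_id)
        replica = self.partitions[p][next(self._cursors[p])]
        return set(replica.get(member_id, ()))


graph = PartitionedGraph(num_partitions=4, replicas_per_partition=2)
graph.add_connection(1, 6)
graph.add_connection(1, 9)
```

With this layout a single member's connections live on one partition, so memory and CPU per node shrink as partitions are added, and losing one replica does not make a partition unavailable.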

[Diagram: cloud-session, dgraph (Connections), Cloud (Java heap: Group Membership, Member Data), and the Network Cache Service]

What is the professional graph?
•  LinkedIn connections
•  Current and past co-workers
•  University colleagues and alumni
•  Group members
•  And what about geography, industry and skill overlap?

New requirements
•  Members aren’t the only type of node in the professional graph
•  LinkedIn connections aren’t the only type of edge in the
professional graph
•  We already supported groups and group membership

Making changes was hard
•  Code was rigid
–  Data was stored using class hierarchies; introducing new data types was
prohibitively slow
–  Queries were built by combining object instances

•  BDB JE (Berkeley DB Java Edition)
•  Everything was back in the heap
–  Garbage collection time was starting to go up
–  GC pauses no longer caused outages, but flapping introduced high developer
and operational overhead

Graph as a Service
•  Custom persistence engine
–  Log structured
–  Memory-mapped files keep data out of the Java heap
–  Data described using a DDL-like schema

•  Custom SQL-like query language
–  Query language understands the DDL
–  Text-based language reduces code changes
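
The combination of a log-structured store and memory-mapped reads can be sketched in a few lines. This is a Python analogy rather than the Java engine the deck describes: the fixed-size record layout, the file name, and the function names are all assumptions. The point it illustrates is that writes only ever append, and reads go through a mapping managed by the operating system instead of through objects held by the runtime.

```python
import mmap
import os
import struct
import tempfile

# Assumed record layout: source id, destination id, created-at timestamp.
RECORD = struct.Struct("<qqq")


def append_edge(path, src, dst, created_at):
    """Append one fixed-size record; existing bytes are never rewritten."""
    with open(path, "ab") as log:
        log.write(RECORD.pack(src, dst, created_at))


def scan_edges(path, src):
    """Memory-map the log and return the records whose source id matches.

    The log must be non-empty, because an empty file cannot be mapped.
    """
    matches = []
    with open(path, "rb") as log:
        with mmap.mmap(log.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for offset in range(0, len(view), RECORD.size):
                record = RECORD.unpack_from(view, offset)
                if record[0] == src:
                    matches.append(record)
    return matches


path = os.path.join(tempfile.mkdtemp(), "edges.log")
append_edge(path, 1, 2, 100)
append_edge(path, 1, 3, 200)
append_edge(path, 2, 1, 100)
found = scan_edges(path, 1)
```

A real engine would add an index rather than scanning the whole log, but the storage pattern (append on write, map on read) is the same.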

Graph Queries
•  Company(:id)[CompanyFollowers]

•  Member(:id)[MemberToMember{CreatedAt > :t}]

•  Member(:id)[topN(MemberToMember, Score, 10)]
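
The deck shows these queries without spelling out their semantics. One plausible reading, assumed here, is that `Type(:id)` picks a start node, `[EdgeType]` follows edges of that type, `{...}` filters on edge attributes, and `topN(EdgeType, Field, n)` keeps the `n` edges with the highest value of `Field`. The sketch below evaluates that reading over an in-memory edge list; the `Edge` class, the helper functions, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    edge_type: str
    source: tuple       # (node type, id), e.g. ("Member", 1)
    target: tuple
    created_at: int = 0
    score: float = 0.0


def follow(edges, node_type, node_id, edge_type, predicate=None):
    """Edges of `edge_type` leaving the given node, optionally filtered."""
    start = (node_type, node_id)
    return [
        e for e in edges
        if e.edge_type == edge_type
        and e.source == start
        and (predicate is None or predicate(e))
    ]


def top_n(matches, key, n):
    """Keep the n matches with the highest key value."""
    return sorted(matches, key=key, reverse=True)[:n]


EDGES = [
    Edge("CompanyFollowers", ("Company", 7), ("Member", 1)),
    Edge("MemberToMember", ("Member", 1), ("Member", 2), created_at=100, score=0.9),
    Edge("MemberToMember", ("Member", 1), ("Member", 3), created_at=300, score=0.4),
    Edge("MemberToMember", ("Member", 1), ("Member", 4), created_at=500, score=0.7),
]

# Company(:id)[CompanyFollowers] with :id = 7
followers = [e.target for e in follow(EDGES, "Company", 7, "CompanyFollowers")]

# Member(:id)[MemberToMember{CreatedAt > :t}] with :id = 1, :t = 200
recent = [
    e.target
    for e in follow(EDGES, "Member", 1, "MemberToMember",
                    lambda e: e.created_at > 200)
]

# Member(:id)[topN(MemberToMember, Score, 10)] with :id = 1
best = [
    e.target
    for e in top_n(follow(EDGES, "Member", 1, "MemberToMember"),
                   key=lambda e: e.score, n=10)
]
```

Because node and edge types are plain strings here, adding a new type means adding data rather than new classes, which mirrors the motivation given for a text-based query language.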

What’s next?
•  Online schema migration
•  Automated repartitioning and data migration
•  Automated provisioning
•  Hierarchical data partitioning
•  Monitoring and statistics
•  Query optimization
•  Query fragment caching
•  Result set caching
•  Query parallelization
•  Very large data set handling
•  …

And we’re still growing

•  200M+ members
•  2/sec
•  63% non U.S.
•  25th most visited website worldwide (Comscore 6-12)
•  >2.6M company pages
•  85% of Fortune 100 companies use LinkedIn to hire

[Chart: LinkedIn Members (Millions), 2004 to 2011; data labels shown: 2, 4, 8, 17, 32, 55, 90]

We’re Hiring
•  http://studentcareers.linkedin.com
•  Or email me at cconrad@linkedin.com
