SDEC2011 Implementing me2day friend suggestion

Implementing
me2day Friend Suggestion

NHN
me2day Infrastructure Development Team
강호성


Contents

• Friend suggestion in SNS
• Friend suggestion algorithms
• Implementing the friend suggestion system
• Conclusion and future work

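I signed up, but I have no friends and I'm bored..
Let me make some me2day friends too!

Four people sent me friend requests out of pity..
and even those were promotional accounts T_T
Oh dear.. poor thing, poor thing! Here you go!
Gave you one metoo.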

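Friend Suggestion in SNS

• A feature that helps users make new friends
• Especially useful for users with few friends

Friend suggestions are showing up
SNS = blog?
Shall I send a friend request?
Now I finally get what makes SNS fun~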

Friend Suggestion in me2day

(Screenshot of the me2day friend suggestion panel, labeled “Suggested people” and “Mutual friends”.)

Next Contents


• Friend of a Friend suggestion
• Close Friend of a Close Friend suggestion

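Friend of a Friend (FOAF) Suggestion

• F(A, B) = True if A and B are friends
• M.F.C(A, C) = the number of mutual friends of A and C (Mutual Friend Count)
• Suggested friends for A = the set of all C such that F(A, B) and F(B, C) hold for some B
• The candidates C are sorted in descending order of M.F.C(A, C)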
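
The FOAF definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the function name `foaf_suggestions` is hypothetical, the friend lists are the small example network from this deck's data modeling slide, and skipping A itself and A's existing friends is an assumption the slide does not state.

```python
from collections import Counter

# Friend lists of the example network (undirected friendships).
FRIENDS = {
    1: {5, 9, 8},
    2: {3, 8},
    3: {5, 8, 9, 2},
    5: {8, 1, 3},
    6: {9, 8},
    8: {1, 5, 3, 2, 6},
    9: {1, 3, 6},
}


def foaf_suggestions(friends, a):
    """Return [(candidate, mutual_friend_count)] for user a, best first."""
    mfc = Counter()
    for b in friends[a]:                 # F(A, B) holds
        for c in friends[b]:             # F(B, C) holds
            # Assumption: A and A's current friends are not suggested.
            if c != a and c not in friends[a]:
                mfc[c] += 1              # B is one mutual friend of A and C
    # Sort by M.F.C descending; ties are broken by user number.
    return sorted(mfc.items(), key=lambda kv: (-kv[1], kv[0]))


print(foaf_suggestions(FRIENDS, 1))      # [(3, 3), (6, 2), (2, 1)]
```

For user 1, user 3 ranks first because all three of user 1's friends (5, 8, 9) are also friends of user 3.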

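(Figure, shown in three animation steps: the user 갈길이멀다 is connected through Friend 1, Friend 2, and Friend 3 to three candidates, IU (아이유), a coworker, and a team leader, which are ranked First, Second, and Third by mutual friend count.)
※ Jilin Chen, (2009), “Recommending People on Social Networking Sites”.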

Close Friend of a Close Friend (CFOACF) Suggestion

• An algorithm that adds intimacy to the FOAF suggestion algorithm
• I(A, B) = intimacy of A → B (the amount of communication from A to B, expressed as a number)

Score of C = Σi I(A, Bi) * I(Bi, C)

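(Figure, shown in three animation steps: each edge of the example graph is labeled with an intimacy value, and the three candidates receive scores of 9, 16, and 20. One step shows a single path with intimacies 5 and 4 giving 5 * 4 = 20.)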


• Process for computing the intimacy value

Favorite friends, comments, metoos, length of friendship

Feedback !!
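
The talk names the inputs to the intimacy value but not the formula. One simple way to combine them is a weighted sum, sketched below. Everything specific here is an assumption for illustration: the `Interaction` fields, the weight values, and the linear form itself. In a setup like this, the "Feedback" step would amount to adjusting the weights based on how users respond to suggestions.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One-directional signals from user A toward friend B (hypothetical fields)."""
    is_favorite: bool       # A marked B as a favorite friend
    comments: int           # comments A left for B
    metoos: int             # metoos A gave to B's posts
    friendship_days: int    # days since A and B became friends


# Hypothetical weights, not taken from the talk.
WEIGHTS = {"favorite": 5.0, "comment": 1.0, "metoo": 0.5, "year": 1.0}


def intimacy(x: Interaction, w=WEIGHTS) -> float:
    """Weighted sum of the four signals; a higher value means a closer friend."""
    return (w["favorite"] * x.is_favorite
            + w["comment"] * x.comments
            + w["metoo"] * x.metoos
            + w["year"] * x.friendship_days / 365)


print(intimacy(Interaction(True, 4, 6, 730)))   # 14.0
```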


Next Contents


• Data modeling suited to a social network
• Considering graph products
• Overcoming performance degradation caused by users with many friends
• Scalability of the friend suggestion system

Data Modeling Suited to a Social Network

• Comparison of social network modeling techniques

<Social Network>
(Figure: a graph whose nodes are users 1, 2, 3, 5, 6, 8, and 9.)

<Relational Model>
user_no   friend_no
1         5
1         9
1         8
2         3
2         8
3         5
3         8
3         9
3         2
...       ...
9         6

<Graph Model>
User Node   List of Friends
1           5-9-8
2           3-8
3           5-8-9-2
5           8-1-3
6           9-8
8           1-5-3-2-6
9           1-3-6


• How to query the relational model

<User, Friend>

Join or re-query


• How to query the graph model

Just follow the reference pointers!
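
To make the two query styles concrete, here is a small sketch that computes "friends of friends" both ways on the example network from the modeling slide. It uses Python's built-in `sqlite3` as a stand-in relational store and a dict of sets as a stand-in for node references; the table name `friend`, the function names, and the choice to store each friendship in both directions are assumptions, and the sketch says nothing about the timings measured in the talk.

```python
import sqlite3

# Undirected friendships of the example network.
EDGES = [(1, 5), (1, 9), (1, 8), (2, 3), (2, 8), (3, 5),
         (3, 8), (3, 9), (5, 8), (6, 9), (6, 8)]

# Relational model: one row per direction in friend(user_no, friend_no).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friend (user_no INTEGER, friend_no INTEGER)")
conn.executemany("INSERT INTO friend VALUES (?, ?)",
                 EDGES + [(b, a) for a, b in EDGES])


def fof_relational(user_no):
    """Friends of friends via a self-join on the friend table."""
    rows = conn.execute(
        "SELECT DISTINCT f2.friend_no FROM friend f1 "
        "JOIN friend f2 ON f2.user_no = f1.friend_no "
        "WHERE f1.user_no = ? AND f2.friend_no <> ?",
        (user_no, user_no))
    return {row[0] for row in rows}


# Graph model: every user node holds its own friend list.
graph = {}
for a, b in EDGES:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def fof_graph(user_no):
    """Friends of friends by following each friend's list directly."""
    result = set()
    for friend in graph[user_no]:
        result |= graph[friend]
    result.discard(user_no)
    return result


print(fof_relational(1) == fof_graph(1))   # True
```

Both functions return the same set; the difference is that the relational version has to match rows through a join (or issue a second query per friend), while the graph version only walks lists it already holds.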



• Experiment: finding the “friends of friends” of a user with 250,000 friends

• Result (Count: 9,060,712)

                 Relational Model   Graph Model
Response Time    5.9 sec            0.15 sec (40X faster)

Considering Graph Products

• Graph Products Features

Comparisons         Graph Framework            Graph Database
Data Durability     Medium-Low                 Medium-High
Cache hit-ratio     100 %                      Depends on workload
Suitable Workload   Batch job                  Real-time job
Products            TinkerPop’s TinkerGraph    Neo Tech’s Neo4j
                    Google’s Pregel            Twitter’s FlockDB
                    MS’s Trinity               Orient Tech’s OrientDB

※ Google, Inc, (2010), “Pregel : A System for Large-Scale Graph Processing”
“http://www.tinkerpop.com/”, “http://www.graph-database.org/”


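• Graph Products Evaluation

(Figure: products plotted by Availability on the vertical axis and Performance on the horizontal axis, each running Low to High. FlockDB and Pregel sit highest on availability, Neo4j and OrientDB in the middle, and TinkerGraph lowest.)

※ “http://markorodriguez.com”, “http://www.orientechnologies.com/”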


• Modeled on TinkerGraph’s implementation
• Implemented replication and failover to improve availability

(Diagram: friendships feed a master batch server and a stand-by batch server, which produce the suggestion results.)

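Overcoming Performance Degradation Caused by Users with Many Friends

• Users with many friends
  • They slow down suggestion computation for themselves and for their friends
  • Their friend counts are growing rapidly

(Chart annotation: 265 X)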


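• Dunbar’s Number
  • The theory that the number of close relationships a person can maintain tops out at 150 to 200

However many friends you have,
your real friends number only 150!~

※ Robin Dunbar, (2010), “How Many Friends Does One Person Need?”.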


• Keep only the top 150 friends by intimacy and remove the rest
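
The pruning step can be sketched as a top-k selection over one user's friend list. This is a minimal illustration that assumes each friend already has a numeric intimacy value; `prune_friends` and `DUNBAR_LIMIT` are hypothetical names.

```python
import heapq

DUNBAR_LIMIT = 150


def prune_friends(intimacy_by_friend, limit=DUNBAR_LIMIT):
    """Keep only the `limit` friends with the highest intimacy values."""
    top = heapq.nlargest(limit, intimacy_by_friend.items(),
                         key=lambda kv: kv[1])
    return dict(top)


# A user with 1,000 friends whose intimacy values are 0..999.
friends = {f"user{i}": i for i in range(1000)}
pruned = prune_friends(friends)
print(len(pruned), min(pruned.values()))   # 150 850
```

`heapq.nlargest` avoids sorting the full list, which matters most for the very users this step targets, those with hundreds of thousands of friends.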



• Result

                                   Relational   Graph (All)   Graph (Top 150)
Response Time                      5.9 sec      0.15 sec      0.0005 sec
Speedup over the column to the left             40X           300X


• Result

                All          Top 150     Reduction
Result Count    9,060,712    14,912      99%
Memory          900 MB       608 MB      32%
Load Time       37.1 sec     28.8 sec    23%

Scalability of the Friend Suggestion System

• Rapid growth of the social network

Batch Server
Batch Server Batch Server

Argh! What if the data outgrows the memory of a single machine?


• Option 1
  • Data that does not fit in memory is stored on SSD
  • Data is kept in memory according to a cache algorithm (LRU, LFU)

Batch Server Batch Server

SSD
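
A rough sketch of Option 1 using LRU, one of the two policies the slide names. The class `SpillingFriendStore` is hypothetical, and a plain dict stands in for the SSD-backed store so the example stays runnable; a real system would read and write an on-disk structure there.

```python
from collections import OrderedDict


class SpillingFriendStore:
    """Keeps recently used friend lists in memory and spills the rest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = OrderedDict()   # user -> friends, least recent first
        self.slow_store = {}          # stand-in for SSD storage (assumption)
        self.slow_reads = 0

    def put(self, user, friends):
        self.memory[user] = friends
        self.memory.move_to_end(user)             # mark as most recently used
        while len(self.memory) > self.capacity:
            old_user, old_friends = self.memory.popitem(last=False)
            self.slow_store[old_user] = old_friends   # spill the LRU entry

    def get(self, user):
        if user in self.memory:
            self.memory.move_to_end(user)
            return self.memory[user]
        friends = self.slow_store[user]           # memory miss: slow read
        self.slow_reads += 1
        self.put(user, friends)                   # bring it back into memory
        return friends


store = SpillingFriendStore(capacity=2)
store.put(1, [5, 9, 8])
store.put(2, [3, 8])
store.put(3, [5, 8, 9, 2])        # user 1 is spilled to the slow store
print(store.get(1), store.slow_reads)   # [5, 9, 8] 1
```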



• Option 2
  • Partition the graph across servers using a classification algorithm
  • On a miss in local memory, consult the Remote Cache Cloud

Batch Server 1 Batch Server 2 Batch Server 3 Batch Server 4

Remote Cache Cloud
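
Option 2 can be sketched as follows. The talk does not say which classification algorithm assigns users to servers, so this example swaps in a plain modulo on the user number; that rule, the class names, and the dict-backed `RemoteCacheCloud` are all assumptions for illustration.

```python
GRAPH = {1: {5, 9, 8}, 2: {3, 8}, 3: {5, 8, 9, 2}, 5: {8, 1, 3},
         6: {9, 8}, 8: {1, 5, 3, 2, 6}, 9: {1, 3, 6}}


class RemoteCacheCloud:
    """Stand-in for a shared remote cache that can serve any user's friends."""

    def __init__(self, graph):
        self.data = dict(graph)
        self.lookups = 0

    def get(self, user):
        self.lookups += 1
        return self.data[user]


class BatchServer:
    def __init__(self, server_id, num_servers, graph, remote):
        self.remote = remote
        # Assumed partitioning rule: user number modulo the server count.
        self.local = {u: f for u, f in graph.items()
                      if u % num_servers == server_id}

    def friends_of(self, user):
        if user in self.local:
            return self.local[user]       # local memory hit
        return self.remote.get(user)      # local miss: go to the remote cache

    def friends_of_friends(self, user):
        result = set()
        for friend in self.friends_of(user):
            result |= self.friends_of(friend)
        result.discard(user)
        return result


remote = RemoteCacheCloud(GRAPH)
servers = [BatchServer(i, 4, GRAPH, remote) for i in range(4)]
print(servers[1].friends_of_friends(1), remote.lookups)
```

Server 1 holds users 1, 5, and 9 under this rule, so expanding user 1 needs only one remote lookup (for user 8). A partitioning scheme that keeps friends on the same server would reduce remote lookups further, which is presumably the point of using a classification algorithm.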


Next Contents


• Conclusion and future work

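Conclusion and Future Work

• The following measures were applied to implement the friend suggestion system
  • Adoption of the graph model
  • Pruning edges using Dunbar’s Number
  • Scalable distributed architecture
• Since launch, about 20% of new friendships have been formed through friend suggestions
• Work on refining the suggestion algorithm and securing scalability is ongoing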

References

• Jilin Chen, (2009), “Recommending People on Social Networking Sites”.
• Robin Dunbar, (2010), “How Many Friends Does One Person Need?”.
• Renzo Angles, (2008), “Survey of Graph Database Models”, ACM Computing Surveys.
• Marko A. Rodriguez, (2010), “Graph Traversal Programming Pattern”.
• R. A. Hanneman, (2001), “Introduction to Social Network Methods”.
• TinkerPop, “TinkerGraph”, http://github.com/tinkerpop/gremlin/wiki/tinkergraph
• Grzegorz Malewicz, (2010), “Pregel : A System for Large-Scale Graph Processing”, Google, Inc.
• Microsoft, “Trinity”, http://research.microsoft.com/en-us/projects/trinity/
• Neo Technology, “Neo4j : the Graph database”, www.neo4j.org
• Twitter, “FlockDB”, https://github.com/twitter/flockdb/blob/master/doc/blog.md
• Orient Technologies, “OrientDB”, http://www.orientechnologies.com/
• Marko A. Rodriguez, “MySQL vs. Neo4j on a Large-Scale Graph Traversal”, http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-large-scale-graph-traversal/
