CUBRID Reference Architecture for Social Networking Service Kieun Park NHN Business Platform Corp. 2011.8
Abstract

The top-ranked Facebook celebrity has 44 million fans, and the top-ranked Twitter user has 11 million followers. There are over 900 million objects on the Facebook site, and people send 140 million tweets per day. Needless to say, numbers like these weigh heavily on the databases behind the services, so best practice in database architecture is important.

Online social networking (OSN) services have proliferated rapidly and changed the way data is stored and served. Social data is an enormous graph of small, tightly interconnected objects. A service page of an OSN is a view of those small objects customized for a specific viewer at a specific time. Typically, the view is an aggregation of events connected by the social graph, which changes constantly with users' real-time interaction. Even though Dunbar's number suggests that the number of people with whom one can maintain stable social relationships is relatively small, about 150, celebrities on OSN sites have so many followers that the social graph becomes huge. These properties of the data lead to new challenges and demand a new database architecture to handle them.

The main considerations of database architecture for OSN are scale-out and performance, with high availability as a mandatory requirement. The main characteristics of OSN services in terms of data are power-law scaling, data feeding frenzy, and Zipfian access distribution. The data being delivered grows exponentially with the popularity of the service. A cost-effective database scale-out architecture therefore matters to the business as much as to the technology.

This presentation shows the CUBRID Reference Architecture for social networking services. The presented architectures are based on best practices developed from real business cases at NHN, the biggest portal service provider in Korea. It also describes features that help meet the database architecture demands of OSN services. For example, an index scan with a top-k sorting technique was developed for fast feed aggregation. The HA, automatic sharding, and clustering features of CUBRID are explained as well. Finally, nStore, a distributed database system based on CUBRID, is introduced. The concept of nStore is similar to Amazon Dynamo, but it differs in that it supports SQL.
I Am
박기은 (Kieun Park)
Service Platform Development Center
NHN Business Platform Corp.
iamyaw@nhn.com
CUBRID Open Source DBMS
nStore Distributed Database System
Contents
Characteristics of online social networking service
Challenges and demands on database architecture
  Business demands and system requirements
  Main considerations of database architecture for OSN service
  Scale-out, performance, and high availability
CUBRID unique features
  Index scan with top-k sorting technique
  High availability feature
  Automatic sharding component
  CUBRID Cluster System
  nStore, a distributed database system based on CUBRID
CUBRID reference architecture for social networking service
  CUBRID Web Reference Architecture
  CUBRID SNS Reference Architecture
Characteristics of online social networking service
Some Infographics about Online Social Networking Service

The history and evolution of OSN took place within the last 10 years.
Source: http://blog.skloog.com/history-social-media-history-social-media-bookmarking/

500 million Facebook users and 106 million Twitter users: social networks with user bases larger than the population of most countries.
Source: http://www.digitalsurgeons.com/facebook-vs-twitter-infographic/

The top-ranked Twitter user, Lady Gaga, has 11 million followers. About 55 million Tweets are sent per day. Twitter gets about 600 million queries every day. (http://twitaholic.com)
Source: http://www.digitalbuzzblog.com/infographic-twitter-statistics-facts-figures/

The most followed person, Eminem, has more than 44 million fans. More than 5 billion pieces of content are shared each week. On Facebook, 2,716,000 messages, 1,587,000 wall posts, and 10,208,000 comments are made every 20 minutes. (http://www.independent.co.uk)
Source: http://www.digitalbuzzblog.com/facebook-statistics-facts-figures-for-2010/
Source: http://www.digitalbuzzblog.com/facebook-statistics-stats-facts-2011/

Have we reached a world of infinite information? In a similar manner to our universe, the Internet is expanding at an incredibly rapid pace, reaching new levels of information storage and content creation every second.
By 2020, roughly 25 × 10^18 (25 quintillion) information containers.
Every minute, 24 hours of video.
The growth gap between the digital content created and the available storage.
Source: http://www.flowtown.com/blog/have-we-reached-a-world-of-infinite-information
Statistics of Facebook and Twitter
Twitter
  140 million: the average number of Tweets people sent per day.
  6,939: the current TPS (Tweets per second) record.
  Source: http://blog.twitter.com/2011/03/numbers.html
Facebook
  More than 750 million active users.
  Over 900 million objects that people interact with (pages, groups, events, and community pages).
  Source: http://www.facebook.com/press/info.php?statistics
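As a back-of-the-envelope illustration of what these Twitter figures mean for a database, the short Python sketch below converts the daily average into a per-second rate and compares it with the quoted TPS record. It uses only the two numbers from the slide; the variable names are illustrative.

```python
# Average versus peak write rate, using only the two Twitter figures quoted above.
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = 140_000_000   # average Tweets per day (from the slide)
peak_tps = 6_939               # TPS record (from the slide)

avg_tps = tweets_per_day / SECONDS_PER_DAY
peak_to_avg = peak_tps / avg_tps

print(f"average: {avg_tps:.0f} tweets/s, peak/average: {peak_to_avg:.1f}x")
# prints "average: 1620 tweets/s, peak/average: 4.3x"
```

The gap between the average and the peak is one reason capacity planning cannot be based on daily totals alone.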
Statistics of Me2Day
Postings per day: 278,461
Total postings: 123,456,727
Total photos: 10,638,089
Online social networking service
Social data is an enormous graph of small objects that are tightly interconnected.
The service page of an OSN is an aggregation of events connected by the social graph, which changes constantly with users' real-time interaction.
Feed Following Works
[Diagram: feed following across three layers]
Application Layer: feeds, following, contents (comment, photo, tag, …), follower, news feeds (personalized feeds)
Content Management Layer: Outbox, Inbox, Delivery & Aggregation Engine
Data Storage Layer: Cache, Database
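The outbox, inbox, and delivery and aggregation engine named on this slide can be pictured with a small Python sketch. It is a toy in-memory model, not the presenter's implementation: the class `FeedService` and its methods are hypothetical names, and the reading of "delivery" as pushing posts into follower inboxes and "aggregation" as pulling from the outboxes of followed users is an assumption.

```python
from collections import defaultdict


class FeedService:
    """Toy outbox/inbox model; every name here is illustrative."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> users following the author
        self.following = defaultdict(set)   # user -> authors the user follows
        self.outbox = defaultdict(list)     # author -> [(timestamp, text)]
        self.inbox = defaultdict(list)      # user -> [(timestamp, author, text)]

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def post(self, author, timestamp, text):
        # Store once in the author's outbox, then deliver (push) a copy
        # to the inbox of every current follower.
        self.outbox[author].append((timestamp, text))
        for user in self.followers[author]:
            self.inbox[user].append((timestamp, author, text))

    def feed_from_inbox(self, user, k):
        # Delivery path: the feed is already materialized in the inbox.
        return sorted(self.inbox[user], reverse=True)[:k]

    def feed_from_outboxes(self, user, k):
        # Aggregation path: collect from the outbox of each followed author.
        events = [
            (timestamp, author, text)
            for author in self.following[user]
            for timestamp, text in self.outbox[author]
        ]
        return sorted(events, reverse=True)[:k]


svc = FeedService()
svc.follow("bob", "alice")
svc.follow("bob", "carol")
svc.post("alice", 1, "comment")
svc.post("carol", 2, "photo")
svc.post("alice", 3, "tag")
newest_two = svc.feed_from_inbox("bob", 2)
```

The push path pays at write time (one inbox write per follower), while the pull path pays at read time (one outbox read per followed author), which is why fan-out matters so much for cost.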
Characteristics of Online Social Networking Service
Power-law scaling growth
  Online social networks have properties of significant clustering, small diameter, and power-law degrees.
Data feeding frenzy
  Followers get personalized feeds that aggregate the streams produced by those they follow.
  The highly variable and somewhat large fan-out of the follows graph makes data feeding difficult to implement and costly to operate.
Zipfian distribution access
  Twitter activity: 5% of users account for 75% of all activity, 10% account for 86% of activity, and the top 30% account for 97.4%.
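To give a feel for how skewed a Zipfian distribution is, the Python sketch below computes the share of total activity produced by the most active fraction of users when a user's activity is proportional to 1 / rank^s. The function name `top_share`, the exponent `s = 1.0`, and the population size are assumptions chosen for illustration; the sketch is not fitted to the Twitter figures quoted above.

```python
def top_share(n_users, fraction, s=1.0):
    """Share of total activity produced by the top `fraction` of users
    when the user at rank r has activity proportional to 1 / r**s."""
    weights = [1.0 / rank ** s for rank in range(1, n_users + 1)]
    top = max(1, int(n_users * fraction))
    return sum(weights[:top]) / sum(weights)


# Hypothetical population of 100,000 users.
for fraction in (0.05, 0.10, 0.30):
    share = top_share(100_000, fraction)
    print(f"top {fraction:.0%} of users -> {share:.1%} of activity")
```

Under this kind of skew a small set of hot users and hot objects receives most of the reads, which is what makes caching and careful data placement worthwhile.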
Challenges and demands on database architecture
Challenges and Demands on Database Architecture
From business demands to technology implementation.
Today social media generates more information in a short period of time than was previously available in the entire world a few generations ago.
Not only the exponential growth of Facebook, Google+, and Twitter, but also the use of ever richer media, such as user-generated video from smartphones, is driving big data.
Source: http://www.itu.int/net/itunews/issues/2010/06/35.aspx

Social media now produces massive amounts of data. Facebook’s network, for instance, consists of 100 million entities generating tens of millions of events per second. Twitter, meanwhile, funnels 140 million public tweets a day. [GigaOM research notes]

With enterprise data volumes moving past terabytes to tens of petabytes and more, business and IT leaders face significant opportunities and challenges from big data. For a large enterprise, big data may be in the petabytes or more; for a small or mid-size enterprise, data volumes that grow into tens of terabytes may become challenging to analyze and manage.

When an application is being designed, software architects need to plan for much greater application load to avoid major redesigns in the future. While scaling out web servers can be done quite easily, properly scaling out database servers is far more challenging.

Managing user-generated social interaction data!
Coping with the explosion in data volume!
Cost-effective scale-out to meet rapidly growing demands!
CUBRID unique features
CUBRID
Free open source is the choice of the modern world
Powerful clean architecture with rich functionality for competitive performance
Enterprise unique features for stability and reliability
Reclaim deleted space
Fast serial data (cached)
LFS (large file support) for database volume

CUBRID
[Timeline, 2006 to 2012]
July 2011: CUBRID 4.0 stable released.
October 2010: CUBRID 3.0 stable released.
October 2009: Official open source community, www.cubrid.org, opened.
September 2009: CUBRID Cluster Project started.
August 2009: CUBRID 2008 R2.0 stable released.
November 2008: CUBRID became an open source project. CUBRID 2008 R1.1 stable was released.
October 2008: First internal release, CUBRID 2008 R1.0.
The development of CUBRID DBMS started.
Feature notes on the timeline:
  Database volume size reduced.
  Multi-range scan and key limit function
  Covered index
  HA monitoring
  Full SQL function support
CUBRID Index Scan with Top-k Sorting Technique
CUBRID does multi-range index scan.
Example: my friends’ newest twenty comments

    SELECT post_no FROM posts
    WHERE id IN (4, 15, 36, …) AND registered_date < 20000
    ORDER BY registered_date DESC LIMIT 20

Single range scan with key filter
  # of leaf pages accessed > # of keys of scan result (Disk I/O ?!)
  Filter out non-matching keys
  Sort after scan
Multi-range scan
  # of leaf pages accessed = # of keys of scan result
  On-the-fly sorting during scan
Index entries (id, registered_date) shown in both panels:
  (4,10001) (4,9999) (4,875) …
  (15,10000) (15,9999) (15,7467) …
  (36,947) (36,120) (36,3) …
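One way to picture a multi-range scan with on-the-fly sorting is a k-way merge over the per-id key ranges that stops after k results. The Python sketch below is only an illustration of that idea and does not describe CUBRID's internal implementation: the dictionary `INDEX` is a hypothetical in-memory stand-in for an index on (id, registered_date) that reuses the sample entries from the slide, and `key_range` and `newest_posts` are illustrative names.

```python
import heapq
from itertools import dropwhile, islice

# Hypothetical stand-in for an index on (id, registered_date DESC),
# filled with the sample entries shown on the slide.
INDEX = {
    4: [10001, 9999, 875],
    15: [10000, 9999, 7467],
    36: [947, 120, 3],
}


def key_range(user_id, before):
    """Yield (id, registered_date) for one id, newest first, with date < before."""
    dates = INDEX.get(user_id, [])
    older = dropwhile(lambda d: d >= before, dates)
    return ((user_id, d) for d in older)


def newest_posts(ids, before, k):
    """Merge the already-sorted per-id ranges and stop after k entries."""
    ranges = [key_range(i, before) for i in ids]
    merged = heapq.merge(*ranges, key=lambda e: e[1], reverse=True)
    return list(islice(merged, k))


top5 = newest_posts([4, 15, 36], before=20000, k=5)
print(top5)
# [(4, 10001), (15, 10000), (4, 9999), (15, 9999), (15, 7467)]
```

Because each range is already ordered by date, the merge reads only as many entries as it returns plus one look-ahead per range, and no separate sort step runs after the scan.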
CUBRID Index Scan with Top-k Sorting Technique

    SELECT * FROM tbl WHERE a IN (2, 4, 5) AND b < 'K' ORDER BY b LIMIT 3;
    SELECT * FROM tbl WHERE a = 2 AND b < 'K' ORDER BY b LIMIT 3;
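To see what the two queries return, they can be run against a small table. The Python sketch below uses the standard-library `sqlite3` module purely as a convenient stand-in, so it shows the query semantics only and says nothing about how CUBRID executes them; the rows and the index name `idx_a_b` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INTEGER, b TEXT)")
conn.execute("CREATE INDEX idx_a_b ON tbl (a, b)")  # composite index on (a, b)

# Hypothetical sample rows.
rows = [(2, "A"), (2, "C"), (2, "M"), (3, "A"),
        (4, "B"), (4, "Z"), (5, "D"), (5, "J")]
conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)

# Three key ranges (a = 2, a = 4, a = 5), one global ordering on b.
multi_range = conn.execute(
    "SELECT * FROM tbl WHERE a IN (2, 4, 5) AND b < 'K' ORDER BY b LIMIT 3"
).fetchall()

# A single key range (a = 2).
single_range = conn.execute(
    "SELECT * FROM tbl WHERE a = 2 AND b < 'K' ORDER BY b LIMIT 3"
).fetchall()

print(multi_range)   # [(2, 'A'), (4, 'B'), (2, 'C')]
print(single_range)  # [(2, 'A'), (2, 'C')]
```

In the first query the three smallest values of b come from different values of a, which is why the ranges have to be merged rather than read one after another.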
CUBRID Test Results
Refer to http://www.cubrid.org/cubrid_mysql_sns_benchmark_test
User groups
  User group 1: users with 50 or fewer friends
  User group 2: users with 51 to 2,000 friends
  User group 3: users with up to tens of thousands of friends
Test cases
  Test case 1: user group 1 only
  Test case 2: user group 2 only
  Test case 3: 40% of user group 1, 50% of user group 2, 10% of user group 3
  Test case 4: 10% of user group 1, 50% of user group 2, 40% of user group 3
CUBRID High Availability Feature
CUBRID HA, a highly fault-resistant DBMS, enables …

CUBRID Features Optimized for Social Networking Services

  • 1. CUBRID Reference Architecture for Social Networking Service Kieun Park NHN Business Platform Corp. 2011.8
  • 2. 46 CUBRID Reference Architecture for Social Networking Service 2 /
  • 3. Abstract 46 CUBRID Reference Architecture for Social Networking Service 3 / The top ranked facebook celebrity has 44 million fans. The top ranked twitter user has 11 million followers. There are over 900 million objects in the facebook site and 140 million tweets people send per day. Needless to say, these facts heavily impact on database they have. Thus, best practice in database architecture is important. Online social networking (OSN) services have rapidly proliferated and changed the way data is stored and served. Social data is an enormous graph of small objects that are tightly interconnected. The service page of OSN is a view of those small objects customized to a specific viewers at a specific time. Typically, the view is aggregation of events connected by social graph which is changing constantly with users' realtime interaction. Even though the Dunbar's number shows that the number of people with whom one gets stable social relationship is relatively small as 150, in OSN site celebs have a large number of followers so that the social graph is very huge. These properties of the data lead to new challenges, and demands new database architecture to handle them. The main considerations of database architecture for OSN are about scale-out and performance in addition to high availability as mandatory. the main characteristics of OSN service in terms of data are power-law scaling, data feeding frenzy and Zipfian distribution access. Data being delivered are exponentially growing according to the popularity of the service. Cost-effective database scale-out architecture is important to business requirement as well as to technical issues. In this presentation, CUBRID Reference Architecture for social networking service will be shown. The presented architectures are based on best practices developed from real business cases of NHN, biggest portal service provider in Korea. 
Described are the helpful features to support the database architecture demands for OSN service. For example, index scan with top-k sorting technique is developed for fast feed aggregation. Also, HA, automatic sharding and clustering features of the CUBRID will be explained. Finally, the nStore, a distributed database system based on the CUBRID, will be introduced. Concept of the nStore is similar to Amazon Dynamo but different in that it support SQL.
  • 4.
  • 9.
  • 10. Contents 46 CUBRID Reference Architecture for Social Networking Service 6 / Characteristics of online social networking service Challenges and demands on database architecture CUBRID features CUBRID reference architecture for social networking service Business demands and system requirements Main considerations of database architecture for OSN service Scale-out, performance, and high availability
  • 11. Contents 46 CUBRID Reference Architecture for Social Networking Service 7 / Characteristics of online social networking service Challenges and demands on database architecture CUBRID unique features CUBRID reference architecture for social networking service Index scan with top-k sorting technique High availability feature Automatic sharding component CUBRID Cluster System nStore, a distributed database system based on the CUBRID
  • 12. Contents 46 CUBRID Reference Architecture for Social Networking Service 8 / Characteristics of online social networking service Challenges and demands on database architecture CUBRID features CUBRID reference architecture for social networking service CUBRID Web Reference Architecture CUBRID SNS Reference Architecture
  • 13. 46 CUBRID Reference Architecture for Social Networking Service 9 / Characteristics of online social networking service
  • 14. Some Infographics about Online Social Networking Service 46 CUBRID Reference Architecture for Social Networking Service 10 / The history and evolution of OSN are made in last 10 years. Source http://blog.skloog.com/history-social-media-history-social-media-bookmarking/
  • 15. Some Infographics about Online Social Networking Service 46 CUBRID Reference Architecture for Social Networking Service 11 / 500 million Facebook users, 106 million Twitter users Social networks with user bases larger than the population of most countries Source http://www.digitalsurgeons.com/facebook-vs-twitter-infographic/
  • 16. Some Infographics about Online Social Networking Service 46 CUBRID Reference Architecture for Social Networking Service 12 / The top ranked twitter user, Lady Gaga, has 11 million followers. About 55 million Tweets per day. Twitter gets about 600 million queries every day. (http://twitaholic.com) Source http://www.digitalbuzzblog.com/infographic-twitter-statistics-facts-figures/
  • 17. Some Infographics about Online Social Networking Service 46 CUBRID Reference Architecture for Social Networking Service 13 / The most followed person, Eminem, has more than 44 million fans. More than 5 billion pieces of content shared each week. 2,716,000 messages, 1,587,000 wall posts, 10,208,000 comments in 20 minutes on Facebook. (http://www.independent.co.uk) Source http://www.digitalbuzzblog.com/facebook-statistics-facts-figures-for-2010/ Source http://www.digitalbuzzblog.com/facebook-statistics-stats-facts-2011/
  • 18. Some Infographics about Online Social Networking Service 46 CUBRID Reference Architecture for Social Networking Service 14 / Have we reached a world of infinite information? In a similar manner to our universe, the Internet is expanding at an incredibly rapid pace, reaching new levels of information storage and content creation every second. By 2020, roughly 25x1018 (quintillion) information containers Every minute, 24 hours of video The growth gap between the digital contents created and the available storage Sourcehttp://www.flowtown.com/blog/have-we-reached-a-world-of-infinite-information
  • 19. Statistics of Facebook and Twitter 46 CUBRID Reference Architecture for Social Networking Service 15 / 140 million; the average number of Tweets people sent per day. 6,939;current TPSrecord. More than 750 million active users. There are over 900 million objects that people interact with (pages, groups, events and community pages) Source http://www.facebook.com/press/info.php?statistics Source http://blog.twitter.com/2011/03/numbers.html
  • 20. Statistics of Me2Day 46 CUBRID Reference Architecture for Social Networking Service 16 / Postings per day: 278,461 Total postings: 123,456,727 Total photos: 10,638,089
  • 21. Online social networking service 46 CUBRID Reference Architecture for Social Networking Service 17 / Social data is an enormous graph of small objects that are tightly interconnected. The service page of OSN is a aggregation of events connected by social graph which is changing constantly with users' realtimeinteraction.
  • 22. Feed Following Works 46 CUBRID Reference Architecture for Social Networking Service 18 / Feeds Following Contents (comment, photo, tag, …) Follower News Feeds (personalized feeds) Application Layer Outbox Inbox Delivery & Aggregation Engine Content Management Layer Cache Database Database Data Storage Layer
  • 23.
  • 24. Followers gets personalized feeds that aggregate streams produced those followed.
  • 25. The highly variable and somewhat large fan-out of the follows graph makes data feeding difficult to implement and costly to operate. Online social networks have properties of significant clustering, small diameter, and power-law degrees. Zipfian distribution access. Data feeding frenzy. Twitter activity: 5% of users account for 75% of all activity, 10% account for 86% of activity, and the top 30% account for 97.4%.
  • 26. Challenges and demands on database architecture
  • 27.
  • 28. Today social media generates more information in a short period of time than was previously available in the entire world a few generations ago.
  • 29. Not only the exponential growth of Facebook, Google+, and Twitter, but also the increasing use of rich media, such as user-generated video from smartphones, is driving big data. Source: http://www.itu.int/net/itunews/issues/2010/06/35.aspx
  • 30. Social media now produces massive amounts of data. Facebook's network, for instance, consists of 100 million entities generating tens of millions of events per second. Twitter, meanwhile, funnels 140 million public tweets a day. [GigaOM research notes] With enterprise data volumes moving past terabytes to tens of petabytes and more, business and IT leaders face significant opportunities and challenges from big data. For a large enterprise, big data may be in the petabytes or more; for a small or mid-size enterprise, data volumes that grow into tens of terabytes may become challenging to analyze and manage. When an application is being designed, software architects need to plan for much greater application load to avoid major redesigns in the future. While scaling out web servers can be done quite easily, properly scaling out database servers is far more challenging. Challenges and Demands on Database Architecture: Managing user-generated social interaction data! Coping with the explosion in data volume! Cost-effective scale-out to meet rapidly growing demands!
  • 31. CUBRID unique features
  • 32. CUBRID: Free open source is the choice of the modern world. Powerful, clean architecture with rich functionality for competitive performance. Unique enterprise features for stability and reliability.
  • 33.
  • 35. Fast serial data (cached)
  • 36.
  • 38. Multi-range scan and key limit function
  • 39.
  • 41. Full SQL function support. Timeline (axis 2006 to 2012): The development of CUBRID DBMS started. October 2008: first internal release, CUBRID 2008 R1.0. November 2008: CUBRID became an open source project, and CUBRID 2008 R1.1 stable was released.
  • 42. CUBRID Index Scan with Top-k Sorting Technique: CUBRID does multi-range index scan. Example, my friends' newest twenty comments: SELECT post_no FROM posts WHERE id IN (4, 15, 36, …) AND registered_date < 20000 ORDER BY registered_date DESC LIMIT 20. Multi-range scan: # of leaf pages accessed = # of keys of scan result; on-the-fly sorting during scan. Single range scan with key filter: # of leaf pages accessed > # of keys of scan result (Disk I/O?!); keys are filtered out, then sorted after the scan. Example index keys: (4,10001) (4,9999) (4,875) … (15,10000) (15,9999) (15,7467) … (36,947) (36,120) (36,3) …
  • 43. CUBRID Index Scan with Top-k Sorting Technique: SELECT * FROM tbl WHERE a IN (2, 4, 5) AND b < 'K' ORDER BY b LIMIT 3; SELECT * FROM tbl WHERE a = 2 AND b < 'K' ORDER BY b LIMIT 3;
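The idea behind a multi-range scan with on-the-fly top-k sorting can be illustrated outside the database. The sketch below is not CUBRID's implementation; it assumes an index keyed on (id, registered_date) whose entries for each id are already in descending date order, and it models that index as a plain dictionary. It merges the per-id ranges lazily and stops after k rows, so only about as many keys are read as are returned. The function names are hypothetical.

```python
import heapq
from itertools import islice

# Hypothetical index on (id, registered_date): for each id, dates are stored
# newest first, mirroring the example keys on the slide.
index = {
    4: [10001, 9999, 875],
    15: [10000, 9999, 7467],
    36: [947, 120, 3],
}


def range_scan(index, key, upper_bound):
    """Yield (registered_date, id) for one key range, newest first."""
    for date in index.get(key, []):
        if date < upper_bound:
            yield (date, key)


def multi_range_top_k(index, ids, upper_bound, k):
    """Merge the already-sorted ranges lazily and stop after k rows."""
    ranges = [range_scan(index, i, upper_bound) for i in ids]
    merged = heapq.merge(*ranges, reverse=True)  # inputs are in descending order
    return list(islice(merged, k))


def sort_after_scan(index, ids, upper_bound, k):
    """Baseline: read every qualifying key, then sort and cut to k rows."""
    rows = [row for i in ids for row in range_scan(index, i, upper_bound)]
    return sorted(rows, reverse=True)[:k]
```

Both functions return the same rows; the difference is that `sort_after_scan` touches every qualifying key before discarding most of them, which is the extra work the slide attributes to the single range scan with key filter.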
  • 44. CUBRID Test Results: Refer to http://www.cubrid.org/cubrid_mysql_sns_benchmark_test. User group 1: users with 50 or fewer friends. User group 2: users with 51 to 2,000 friends. User group 3: users with up to tens of thousands of friends. Test case 1: user group 1 only. Test case 2: user group 2 only. Test case 3: 40% of user group 1, 50% of user group 2, 10% of user group 3. Test case 4: 10% of user group 1, 50% of user group 2, 40% of user group 3.
  • 45.
  • 48. Various access modes (read-write, read-only). [Diagram labels] Application; CUBRID Driver; UPDATE; SELECT; Broker; Active Broker; Backup Broker; automatic switch-over; Read-Write Mode; Read-Only Mode; Database Server; Active Server; Standby-1 Server; Standby-2 Server @ Remote IDC; automatic fail-over/fail-back; Master DB; Slave DB.
  • 49. CUBRID High Availability Feature: The HA feature is based on database replication with a transaction log multiplication technique. Statement-based replication could cause data inconsistency. [Diagram labels] A-Node (Active Server Node); S1-Node (Standby Server Node); S2-Node; Master DB; Slave DB; CUBRID Server; Log Writer; Log Applier; Transaction Log; Replication Log; Log Shipping (synchronous); Log Shipping (asynchronous); Log Applying; Heartbeat; UPDATE; SELECT.
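The routing behaviour suggested by the two HA slides (read-write versus read-only access modes, heartbeats, automatic fail-over) can be sketched as a small state machine. This is a simplified model under stated assumptions, not CUBRID's broker logic: the class, its methods, and the rule that read-only traffic prefers a live standby are all hypothetical, and fail-back and log shipping are not modeled.

```python
class HaRouter:
    """Hypothetical broker-side router; names are illustrative, not CUBRID APIs."""

    def __init__(self, active, standbys):
        self.active = active
        self.standbys = list(standbys)
        self.alive = {node: True for node in [active, *standbys]}

    def heartbeat(self, node, ok):
        # Record the latest heartbeat result; a failed active node triggers fail-over.
        self.alive[node] = ok
        if node == self.active and not ok:
            self._fail_over()

    def _fail_over(self):
        # Promote the first live standby; the old active node is kept as a standby.
        candidate = next((s for s in self.standbys if self.alive[s]), None)
        if candidate is None:
            raise RuntimeError("no live standby to promote")
        self.standbys.remove(candidate)
        self.standbys.append(self.active)
        self.active = candidate

    def route(self, statement, mode="rw"):
        # Assumption: read-only mode serves SELECT from a live standby if one exists;
        # read-write mode always goes to the active node.
        is_read = statement.lstrip().upper().startswith("SELECT")
        if mode == "ro":
            if not is_read:
                raise PermissionError("read-only mode accepts SELECT only")
            for node in self.standbys:
                if self.alive[node]:
                    return node
        return self.active
```

For example, with active node `"A"` and standbys `"S1"` and `"S2"`, an UPDATE is routed to `"A"`; after `heartbeat("A", False)` the router promotes `"S1"` and subsequent writes go there.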
  • 50.
  • 51.
  • 52.
  • 54.
  • 56. Additionally, linear scalability. [Diagram labels] Application: SELECT * FROM gtable WHERE part_key=2 AND …; INSERT INTO gtable …. Broker (load balancing). Global schema / distributed partition. Cluster Server: Node #1: gtable part_01, part_05; Node #2: gtable part_02, part_06; Node #3: gtable part_03, part_07; Node #4: gtable part_04, part_08.
  • 57. CUBRID Cluster System: The global schema is a single representation, or global view, of all nodes, where each node has its own database and schema. Users can access any database through a single schema, regardless of and without knowing the location of the distributed data. Example statements in the diagram: SELECT * FROM contents WHERE auth = (SELECT name FROM author WHERE …); UPDATE local …; SELECT * FROM contents WHERE …; SELECT * FROM info, code WHERE info.id = code.id; INSERT INTO contents…. [Diagram labels] Local Schema User; Global Schema User; Global Schema; tables info, contents, author, code, level, local; Local Schema #1 to #4; Database #1 to #4.
  • 58. CUBRID Cluster: [Diagram labels] Global Schema; Logical View; Physical View; Schema; System Catalog; Data; Index.
  • 59. CUBRID Cluster: The distributed partition maps the global schema onto table partitioning. Partitions reside on different nodes but are accessed through the global schema. Example: SELECT * FROM gtable, info WHERE gtable.part_key=02 AND info.id = gtable.id. [Diagram labels] Global Schema: info; gtable, PARTITION BY HASH (part_key), with partitions part_01 to part_08. Partition data is spread across Database #1 to Database #4.
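How a partition key could be resolved to a node under PARTITION BY HASH can be sketched as follows. The eight partitions and the placement of part_01/part_05 on Node #1 through part_04/part_08 on Node #4 follow the layout on slide 56; the hash function (CRC32 of the key's string form) and the function names are assumptions made for illustration, since the slides do not say how CUBRID Cluster hashes keys.

```python
import zlib

NUM_PARTITIONS = 8
NODES = ["Node #1", "Node #2", "Node #3", "Node #4"]


def partition_of(part_key):
    """Map a partition key to a partition number in 1..NUM_PARTITIONS.

    Assumption: CRC32 stands in for whatever hash the database uses.
    """
    digest = zlib.crc32(str(part_key).encode("utf-8"))
    return digest % NUM_PARTITIONS + 1


def node_of(partition):
    """Place partitions round-robin, matching the layout on slide 56."""
    return NODES[(partition - 1) % len(NODES)]


def route(part_key):
    """Return (partition name, node) for a statement filtered on part_key."""
    partition = partition_of(part_key)
    return f"part_{partition:02d}", node_of(partition)
```

A query with an equality predicate on `part_key` touches a single node, while a query without one has to be sent to all four, which is why the partition key is usually chosen to match the dominant access path (for example, a user id).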
  • 60.
  • 61.
  • 62.
  • 63. nStore, a distributed database system based on CUBRID: [Diagram labels] Application; REST API; nStore; Management Node; Container Server; Container (ckey=iamyaw); Container (ckey=kieun_park); Tables: Table A, Table B, Table C; Global Table G; Indexed Column; Equi-join; Distribution layer; RDBMS.
  • 64. nStore Test Results: Tested using YCSB (http://research.yahoo.com/Web_Information_Management/YCSB). INSERT: 50,000,000 records (1K size). READ: Zipfian distribution. READ w/ compaction: after SSTable compaction (Cassandra, HBase). READ/UPDATE: 50:50 (50,000,000 records DB). READ/INSERT: 50:50 (50,000,000 records DB).
  • 65. CUBRID reference architecture for social networking service
  • 66. CUBRID Web Reference Architecture: [Diagram labels] Small-size web service: Web Server. Mid-size web service: Web Server (User Interface); Web Application Server (Business Logic); Cache Server. RW; RO; DB Sharding; CUBRID HA (master and slave nodes); CUNITOR.
  • 67. Social Networking Service Architecture: [Diagram labels] Web Servers (User Interface); Cache Layer; Web Application Servers (Business Logic); Social Query Engine; Aggregation Engine; Delivery Engine; Search Engine; Recommendation Engine; User Profile DB; Social Relation DB; Analytics DB; Feed Outbox DB; Feed Inbox DB; Search Index.
  • 68. CUBRID SNS Reference Architecture: [Diagram labels] Application servers; Cache server farm; Analytic DB partitioned for OLAP (ETL; CUBRID Cluster with node #1, node #2, … node #n); User profile DB sharded by user-id; Social relation DB sharded by user-id; Inbox/Outbox storage distributed according to user-id; RW and RO brokers; DB Sharding; CUBRID HA (master and slave nodes); nStore w/ CUBRID (management, containers); OAM; monitoring server; CUNITOR.
  • 69. Best Practices: A highly available database architecture is a basic business requirement and no longer a technical barrier. Automatic sharding is an effective way to scale out a DB system that stores relational model data. nStore is a solution for petabyte-scale data, with the benefits of a highly available and scalable distributed store.