Who we are
Agenda: Photobucket, Solbase, Activity Stream
Photobucket Overview
•  Photobucket is the most-visited photo site, with 23.4 million unique visitors (UVs)
•  Over 9 billion photos stored
•  Users upload 4 million images per day
•  Photobucket users spend more time per visit than users of any other photo site: 3.8 minutes on average
•  2.0 million average daily visitors, more daily visits than Flickr and Picasa combined
Sources: (1) comScore, May 2011; (2) internal data
[Chart: unique visitors (UVs) per photo site: 23.4M, 9.9M, 9.5M, 7.9M, 1.6M, 19.7M, 6.0M]
Photobucket Stats
Sources: (1) comScore, May 2011; (2) internal data
What is Solbase? Solbase is an open-source, real-time search platform based on Lucene, Solr, and HBase, built at Photobucket.
Why Solbase?
Summary of what we did
Results
Next Steps
Solbase repos
https://github.com/Photobucket/Solbase
https://github.com/Photobucket/Solbase-Solr
https://github.com/Photobucket/Solbase-Lucene
What is Activity Stream? Activity Stream is a social networking feature built at Photobucket using HBase, Flume, Kestrel, and Camel.
Activity Events
Activity Events Rendered
Delivering Activities
Discussion Overview
Activity Collection
Flume & Kestrel
Fanout Processor & Camel
Query Service
Performance
What is HBase?
Why HBase?
Hadoop/HBase Architecture
HBase Tables
Schema:
{row key 1 {
    column family 1 {
        column 1 {data 1},
        column 2 {data 2}
        …
    }
    ...
}}
{row key 2 {...}}
Example:
{dog:spotty {
    owner {matt {age 41}, linda {age 41}}
    vaccinations {rabies {july 2011}}
}}
{cat:fluffy {
    owner {doug {age 41}, heather {age 41}}
    vaccinations {rabies {june 2011}}
}}
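The nested layout on this slide (row key, then column family, then column, then data) can be mimicked with plain nested dictionaries. The following is only a toy sketch in Python, not the HBase client API: the names `table`, `get_cell`, and `scan_prefix` are hypothetical and exist purely for illustration. It assumes, as HBase does, that rows are kept in sorted row-key order, which is what makes a prefix scan such as "all rows starting with `cat:`" cheap.

```python
from bisect import bisect_left

# Toy in-memory model of the slide's layout (an assumption for illustration):
# row key -> column family -> column -> data.
table = {
    "dog:spotty": {
        "owner": {"matt": "age 41", "linda": "age 41"},
        "vaccinations": {"rabies": "july 2011"},
    },
    "cat:fluffy": {
        "owner": {"doug": "age 41", "heather": "age 41"},
        "vaccinations": {"rabies": "june 2011"},
    },
}


def get_cell(table, row_key, family, column):
    """Return one cell's data, or None if the row, family, or column is missing."""
    return table.get(row_key, {}).get(family, {}).get(column)


def scan_prefix(table, prefix):
    """Yield (row_key, row) pairs whose key starts with prefix, in sorted key order."""
    keys = sorted(table)               # stand-in for HBase's sorted row keys
    start = bisect_left(keys, prefix)  # first key that is >= prefix
    for key in keys[start:]:
        if not key.startswith(prefix):
            break                      # sorted order: no later key can match
        yield key, table[key]


if __name__ == "__main__":
    print(get_cell(table, "dog:spotty", "vaccinations", "rabies"))
    print([key for key, _ in scan_prefix(table, "cat:")])
```

Putting the animal type at the front of the row key, as the slide's example does, means all rows of one type sort next to each other and can be read with a single prefix scan.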
Our Schema Design
HBase Client API
Challenges
HBase Challenges
References
http://www.cloudera.com/resource/hadoop-world-2011-presentation-slides-advanced-hbase-schema-design
http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/
Q&A

Solbase & Real-time Activity

Editor's Notes

  1. We should go over the agenda and introduce each of the presenters.
  2. First, Koh is going to talk about Solbase. That's our real-time search engine, built on top of Lucene, Solr, and HBase. We started presenting Solbase about 9 months ago, and at that time we reported that our standard implementation of Lucene/Solr was no longer scaling to meet our needs, and that our initial tests of Solbase gave us hope that we were going to solve that problem AND dramatically improve performance. In addition, we were updating our search index in real time. Great results, but possibly the bigger news at that time was that we were planning to open source all the code. Tonight Koh is here to deliver on that promise. The next topic we'll cover is another HBase feature developed at PB: our activity stream. It's what you'd probably expect: a social network feature that distributes events about photos and videos in near real time. We've seen a number of presentations on similar features, but rarely do you see any detail on the architecture or lessons learned that would help you build your own. Ron and Josh are going to do exactly that. But before we jump into all that... why do you care? Who is PB?
  3. We're the biggest dedicated photo site on the web and we're right next door.   We have millions of active users and billions of photos.
  4. Here's a quick slide on our size compared to our peers… it's a little old, but you get the idea. We have millions of unique visitors.
  5. Over time those users have contributed half a billion public photos and videos to our search index, and we generate a boatload of social events around all that public media.
  6. Lucene's field cache for sorting and filtering became very problematic for us. Turnaround time for building the entire set of indices was about a day. Every 100 ms improvement in response time equates to approximately 1 extra page view. It was impractical to add a significant number of new docs and data.
  7. In a nutshell, Solbase replaces the indices stored on the local filesystem with a database in HBase. It also overcomes some of Lucene's inherent limitations; one major one we solved is sort/filter.
  8. Ron Here
  9. Ron Here
  10. Ron Here
  11. Ron Here
  12. Kestrel is open source and was developed at Twitter.
  13. Talk about scale and real-time processing speed. Ops per second. One thread pushes 40/s all the way to HBase.
  14. Talk about scale and real-time processing speed. Ops per second. One thread pushes 40/s all the way to HBase.
  15. Josh here. HBase is a distributed, Bigtable-like database built on Hadoop components. It leverages HDFS, Hadoop's distributed file system. Built on Hadoop, it scales to a massive size, virtually limitless. It is used by many large-scale companies: Facebook, Yahoo, Google (through their Bigtable implementation). Ask who has used HBase.
  16. Josh here. HBase is a distributed, Bigtable-like database built on Hadoop components. It leverages HDFS, Hadoop's distributed file system. Built on Hadoop, it scales to a massive size, virtually limitless. It is used by many large-scale companies: Facebook, Yahoo, Google (through their Bigtable implementation). Ask who has used HBase. To fix: 1. Features: column store; key/value store with semi-structured values. 2. Why use HBase? Horizontal scalability; high write throughput; millions of columns, billions of rows.
  17. HBase consists of master nodes with a set of region servers to distribute the data. The master is the gateway interface that directs clients to the proper region server for the requested data. Data is replicated among several data nodes by Hadoop's file system, HDFS. There is "locational affinity" between the region server and the data served.
  18. Each table consists of a row key, a set of defined column families, and an arbitrary number of qualified columns for each family. Keys are stored lexicographically, so range scans between two keys are extremely fast. All data is binary. Interestingly, this is similar to the concept of the inverted index, where the "terms" are lexicographically stored; this is something that we leverage in our implementation.
  19. Mention using lexicographical key to pre-sort data.
  20. Get: single-row access, similar to a SQL query by primary key. Put: single-row update/insert (can be done in batch). Scan: lexicographic range query between two specified keys.
  21. Back to Ron. HBase optimization: scans continue to be fast; large multi-gets have been an issue.
  22. HBase optimization: scans continue to be fast; large multi-gets have been an issue.
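
The notes on lexicographic row keys and the Get/Put/Scan operations can be illustrated with a small sketch. This is a toy in-memory model in Python, not the HBase client API and not Photobucket's actual schema: the `SortedTable` class, the `activity_key` helper, the `user:reversed-timestamp` key layout, and the `event:type` column name are all hypothetical, chosen only to show how a sorted key space lets a range scan return rows already in the wanted order.

```python
import bisect


class SortedTable:
    """Toy stand-in for an HBase table: rows are kept in lexicographic key order."""

    def __init__(self):
        self._keys = []  # sorted list of row keys (bytes)
        self._rows = {}  # row key -> {column: value}

    def put(self, key, columns):
        """Insert or update a single row."""
        if key not in self._rows:
            bisect.insort(self._keys, key)  # keep the key list sorted
            self._rows[key] = {}
        self._rows[key].update(columns)

    def get(self, key):
        """Single-row access by key, like a primary-key lookup."""
        return self._rows.get(key)

    def scan(self, start, stop):
        """Yield rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


MAX_TS = 10**13 - 1  # largest 13-digit value; assumed upper bound for millisecond timestamps


def activity_key(user, ts_millis):
    """Hypothetical key layout: user id, then a zero-padded reversed timestamp.

    Subtracting from MAX_TS makes newer events sort first within a user.
    """
    return f"{user}:{MAX_TS - ts_millis:013d}".encode()


table = SortedTable()
table.put(activity_key("alice", 1000), {b"event:type": b"upload"})
table.put(activity_key("alice", 3000), {b"event:type": b"comment"})
table.put(activity_key("bob", 2000), {b"event:type": b"like"})
table.put(activity_key("alice", 2000), {b"event:type": b"like"})

# ';' is the byte right after ':', so this range covers every "alice:" key.
newest_first = [row[b"event:type"] for _, row in table.scan(b"alice:", b"alice;")]
print(newest_first)
```

Because the ordering is decided when the key is written, the scan needs no sort step at read time, which is the point of pre-sorting data through the key. A real deployment would also have to consider key distribution across region servers, which this sketch ignores.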