SlideShare uma empresa Scribd logo
1 de 38
Real Time Analytics for Big Data
A Twitter Case Study

                                        Uri Cohen
                                   Head of Product
                                        @uri1803
Big Data Predictions



         “Over the next few years we'll see the adoption of scalable
         frameworks and platforms for handling
         streaming, or near real-time, analysis and processing. In the
         same way that Hadoop has been borne out of large-scale web
         applications, these platforms will be driven by the needs of large-
         scale location-aware mobile, social and sensor use.”
         Edd Dumbill, O’REILLY




          ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
                                                                     2
We’re Living in a Real Time World…
 Facebook Real Time                      Real Time User                        Google RT Web Analytics
 Social Analytics                        Engagement




Twitter paid tweet analytics                New Real Time                      Google Real Time Search
                                            Analytics Startups




 3                      ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
3 Flavors to Analytics




    Counting                                Correlating               Research




4              ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Analytics @ Twitter – Counting

 How many
  signups, tweets, retweet
  s for a topic?
 What’s the average
  latency?
 Demographics
       Countries and cities
       Gender
       Age groups
       Device types
       …



5                  ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Analytics @ Twitter – Correlating

 What devices fail at the
  same time?
 What features get user
  hooked?
 What places on the
  globe are “happening”?




6             ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Analytics @ Twitter – Research

 Sentiment analysis
     “Obama is popular”
 Trends
     “People like to tweet
      after watching
      American Idol”
 Spam patterns
     How can you tell when
      a user spams?




7                ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
It’s All about Timing




      “Real time”                      Reasonably Quick                    Batch
    (< few Seconds)                   (seconds - minutes)               (hours/days)




8                ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
It’s All about Timing

              • Event driven / stream processing
              • High resolution – every tweet gets counted



              • Ad-hoc querying          This is what
              • Medium resolution (aggregations) here
                                         we’re
                                                                 to discuss 

              • Long running batch jobs (ETL, map/reduce)
              • Low resolution (trends & patterns)

9         ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Twitter in Numbers (March 2011)



     It takes a week for users to
     send   1 billion Tweets.
                                                      Source: http://blog.twitter.com/2011/03/numbers.html

10          ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Twitter in Numbers (March 2011)


                  On average,
        140 million
     tweets get sent every day.
                                                    Source: http://blog.twitter.com/2011/03/numbers.html

11        ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Twitter in Numbers (March 2011)


         The highest
     throughput to date is
6,939 tweets/sec.
                                                   Source: http://blog.twitter.com/2011/03/numbers.html

12       ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Twitter in Numbers (March 2011)



     460,000 new
       accounts
        are created daily.
                                                   Source: http://blog.twitter.com/2011/03/numbers.html

13       ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Twitter in Numbers




     5% of the users generate
      75% of the content.
                                                            Source: http://www.sysomos.com/insidetwitter/

14        ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Challenge 1 – Computing Reach




15       ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Challenge 1 – Computing Reach
       Tweets                 Followers                     Distinct
                                                            Followers




                                                                        Count
                                                                                Reach




16         ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Challenge 2 – Word Count
       Tweets




                                                                      Count
                                                                               Word:Count




                                                                • Hottest topics
                                                                • URL mentions
                                                                • etc.
17       ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
URL Mentions – Here’s One Use Case




18       ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Analyze the Problem

 (Tens of) thousands of tweets per second to
  process
      Assumption: Need to process in near real time
 Aggregate counters for each word
      A few 10s of thousands of words (or hundreds of
       thousands if we include URLs)
 System needs to linearly scale


19            ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
It’s Really Just an Online Map/Reduce




20        ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Event Driven Flow

               Store
                                                                  Research
               Raw


           Tokenize                                      Filter    Count


           Store
          Counters

21       ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
It’s Not That Simple…
 (Tens) of thousands of tweets to tokenize every
  second, even more words to filter 
  CPU bottleneck
 Tens/hundreds of thousands of counters to
  update  Counters contention
 Tens/hundreds of thousands of counters to
  persist  Database bottleneck
 (Tens) of thousands of tweets to store every
  second in the database  Database bottleneck

22         ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Implementation Challenges

 Twitter is growing by the day,
  how can this scale?
 Fault tolerance.
  What happens if a server fails?
 Consistency.
  Counters should be accurate!



23         ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Solutions, Solutions




24   ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Dealing with the Massive Tweet Stream
 We need a system that could scale linearly
      Event partitioning to the rescue!
      What to partition by?
                                                               Tokenizer1   Filterer 1

                                                               Tokenizer2   Filterer 2

                                                                Tokenizer
                                                                            Filterer 3
                                                                    3



                                                                Tokenizer
                                                                            Filterer n
                                                                    n

25             ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Counters Persistence & Contention
 Why not keep things in memory?
      Treat it as a SoR
                                                                                  IMDG
                                                                       Counter
                           Tokenizer1                    Filterer 1   Updater 1

                                                                       Counter
                           Tokenizer2                    Filterer 2   Updater 2

                            Tokenizer                                  Counter
                                                         Filterer 3
                                3                                     Updater 3




                            Tokenizer                                  Counter
                                                         Filterer n
                                n                                     Updater n

26     ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Counters Persistence & Contention
 Why not keep things in memory (data grid)?
      Question: How to partition counters?
                                                                                  IMDG
                                                                       Counter
                           Tokenizer1                    Filterer 1   Updater 1

                                                                       Counter
                           Tokenizer2                    Filterer 2   Updater 2

                            Tokenizer                                  Counter
                                                         Filterer 3
                                3                                     Updater 3




                            Tokenizer                                  Counter
                                                         Filterer n
                                n                                     Updater n

27     ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Counters Persistence & Contention
 Why not keep things in memory?
      Question: How to partition counters? in
             Facebook keeps 80% of its data
             Memory (Stanford research)                                           IMDG
                                                                       Counter
                           Tokenizer1                    Filterer 1   Updater 1
                     RAM is 100-1000x faster than Disk
                                                Counter
                     (Random seek)
                       Tokenizer2  Filterer 2  Updater 2
                     • Disk: 5 -10ms
                        Tokenizer               Counter
                                   Filterer 3
                     • RAM: ~0.001msec
                            3                  Updater 3




                            Tokenizer                                  Counter
                                                         Filterer n
                                n                                     Updater n

28     ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Database Bottleneck – Storing Raw Tweets
  RDBMS won’t cut it (unless you have $$$$$)
      Hadoop / NoSQL to the rescue
      Use the data grid for batching and as a reliable
       transaction log
                           Persister
                              1

                           Persister
                              2




                           Persister
                              n
29     ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Counter Consistency & Resiliency
 Primary/hot backup setup, sync replication
 Transactional, atomic updates
                                    IMDG
 “On Demand” backups
                                                            P   B


                                                            P   B


                                                            P   B


                                                            P   B


30   ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Counter Consistency & Resiliency
 Primary/hot backup setup, sync replication
 Transactional, atomic updates
                                    IMDG
 “On Demand” backups
                                                            P   B
                                                                P


                                                            P   B


                                                            P   B


                                                            P   B


31   ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Implementing Event Flows
 Use the data grid as event bus
 Stateful objects as transactional events

     Raw   Tokenizer               Tokenized      Filterer                Filtered   Counter




                                  P                                   B



32             ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Putting It All Together
                                IMDG


                            P                    B


                            P                    B


                                                 B
                            P

                                                 B
                            P




33        ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Putting It All Together

     • Use an IMDG IMDG
     • Partition events, counters
     • Collocate eventP handlers with data for low
                              B

       latency and simple scaling
     • Use atomic updates to update counters in
                      P       B

       memory
                      P       B
     • Persist asynchronously to a big data store
                                 P                     B


34             ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Reducing Counter Contention - Optimization
 Store processed                             word1
  tweets locally in                           word2
  each partition
                                              word3
 Use periodic batch
  writes to update                                             1 sec
  counters
 Process batches as
  a whole



35                                 35
            ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Keep Your Eyes Open…




36      ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
References
 Learn and fork the code on github:
     https://github.com/Gigaspaces/rt-analytics
 Detailed blog post
     http://bit.ly/gs-bigdata-analytics
 Twitter in numbers:
     http://blog.twitter.com/2011/03/numbers.html
 Twitter Storm:
     http://bit.ly/twitter-storm
 Apache S4
     http://incubator.apache.org/s4/

37               ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
38   ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved

Mais conteúdo relacionado

Mais procurados

Scaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseScaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseAge Mooij
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big DataForwardSprint
 
Big Data
Big DataBig Data
Big DataNGDATA
 
ROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on HadoopROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on HadoopDataWorks Summit
 
Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for CybersecurityEmpower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for CybersecurityDatabricks
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An OverviewC. Scyphers
 
20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design Patterns20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design PatternsAllen Day, PhD
 
Big data today and tomorrow
Big data today and tomorrowBig data today and tomorrow
Big data today and tomorrowmagda3695
 
Democratizing Machine Learning: Perspective from a scikit-learn Creator
Democratizing Machine Learning: Perspective from a scikit-learn CreatorDemocratizing Machine Learning: Perspective from a scikit-learn Creator
Democratizing Machine Learning: Perspective from a scikit-learn CreatorDatabricks
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataJoey Li
 
IoT: How Data Science Driven Software is Eating the Connected World
IoT: How Data Science Driven Software is Eating the Connected WorldIoT: How Data Science Driven Software is Eating the Connected World
IoT: How Data Science Driven Software is Eating the Connected WorldDataWorks Summit
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureRoman Nikitchenko
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data ScienceDataWorks Summit
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Big Data Spain
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataKaran Desai
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry PerspectiveCloudera, Inc.
 
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Big Data Spain
 
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...Big Data Spain
 

Mais procurados (20)

Scaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseScaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBase
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big Data
 
Big Data
Big DataBig Data
Big Data
 
ROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on HadoopROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on Hadoop
 
Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for CybersecurityEmpower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
 
20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design Patterns20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design Patterns
 
Big data today and tomorrow
Big data today and tomorrowBig data today and tomorrow
Big data today and tomorrow
 
Big Data Tech Stack
Big Data Tech StackBig Data Tech Stack
Big Data Tech Stack
 
Democratizing Machine Learning: Perspective from a scikit-learn Creator
Democratizing Machine Learning: Perspective from a scikit-learn CreatorDemocratizing Machine Learning: Perspective from a scikit-learn Creator
Democratizing Machine Learning: Perspective from a scikit-learn Creator
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
IoT: How Data Science Driven Software is Eating the Connected World
IoT: How Data Science Driven Software is Eating the Connected WorldIoT: How Data Science Driven Software is Eating the Connected World
IoT: How Data Science Driven Software is Eating the Connected World
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructure
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
 
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
 

Destaque

Data Analytics on Twitter Feeds
Data Analytics on Twitter FeedsData Analytics on Twitter Feeds
Data Analytics on Twitter FeedsEu Jin Lok
 
Sentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use casesSentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use casesKarol Chlasta
 
Discovery & Consumption of Analytics Data @Twitter
Discovery & Consumption of Analytics Data @TwitterDiscovery & Consumption of Analytics Data @Twitter
Discovery & Consumption of Analytics Data @TwitterKamran Munshi
 
Twitter Case Study: Just Falafel
Twitter Case Study: Just FalafelTwitter Case Study: Just Falafel
Twitter Case Study: Just FalafelFadi Khater
 
Using Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with HadoopUsing Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with HadoopJames Kebinger
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data ManagementAlbert Bifet
 
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Rich Heimann
 
Twitter Data Analytics
Twitter Data AnalyticsTwitter Data Analytics
Twitter Data Analyticsrupika08
 
Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)
Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)
Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)Kevin Weil
 
The twitter case study 2014 dimensions of strategy
The twitter case study 2014 dimensions of strategyThe twitter case study 2014 dimensions of strategy
The twitter case study 2014 dimensions of strategyJohn Ashcroft
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsAlbert Bifet
 
Twitter Business Model - 8th Wonder of the World
Twitter Business Model - 8th Wonder of the WorldTwitter Business Model - 8th Wonder of the World
Twitter Business Model - 8th Wonder of the WorldP J
 
SAP HANA in Healthcare: Real-Time Big Data Analysis
SAP HANA in Healthcare: Real-Time Big Data AnalysisSAP HANA in Healthcare: Real-Time Big Data Analysis
SAP HANA in Healthcare: Real-Time Big Data AnalysisSAP Technology
 
The Art of Social Media Analysis with Twitter & Python
The Art of Social Media Analysis with Twitter & PythonThe Art of Social Media Analysis with Twitter & Python
The Art of Social Media Analysis with Twitter & PythonKrishna Sankar
 

Destaque (20)

Twitter Big Data
Twitter Big DataTwitter Big Data
Twitter Big Data
 
Analyzing social media with Python and other tools (2/4)
Analyzing social media with Python and other tools (2/4) Analyzing social media with Python and other tools (2/4)
Analyzing social media with Python and other tools (2/4)
 
BDACA1617s2 - Lecture3
BDACA1617s2 - Lecture3BDACA1617s2 - Lecture3
BDACA1617s2 - Lecture3
 
Data Analytics on Twitter Feeds
Data Analytics on Twitter FeedsData Analytics on Twitter Feeds
Data Analytics on Twitter Feeds
 
Sentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use casesSentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use cases
 
Discovery & Consumption of Analytics Data @Twitter
Discovery & Consumption of Analytics Data @TwitterDiscovery & Consumption of Analytics Data @Twitter
Discovery & Consumption of Analytics Data @Twitter
 
The Big 5- Twitter
The Big 5- TwitterThe Big 5- Twitter
The Big 5- Twitter
 
BDACA1617s2 - Lecture 1
BDACA1617s2 - Lecture 1BDACA1617s2 - Lecture 1
BDACA1617s2 - Lecture 1
 
BDACA1617s2 - Tutorial 1
BDACA1617s2 - Tutorial 1BDACA1617s2 - Tutorial 1
BDACA1617s2 - Tutorial 1
 
Twitter Case Study: Just Falafel
Twitter Case Study: Just FalafelTwitter Case Study: Just Falafel
Twitter Case Study: Just Falafel
 
Using Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with HadoopUsing Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with Hadoop
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
 
Twitter Data Analytics
Twitter Data AnalyticsTwitter Data Analytics
Twitter Data Analytics
 
Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)
Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)
Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)
 
The twitter case study 2014 dimensions of strategy
The twitter case study 2014 dimensions of strategyThe twitter case study 2014 dimensions of strategy
The twitter case study 2014 dimensions of strategy
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream Analytics
 
Twitter Business Model - 8th Wonder of the World
Twitter Business Model - 8th Wonder of the WorldTwitter Business Model - 8th Wonder of the World
Twitter Business Model - 8th Wonder of the World
 
SAP HANA in Healthcare: Real-Time Big Data Analysis
SAP HANA in Healthcare: Real-Time Big Data AnalysisSAP HANA in Healthcare: Real-Time Big Data Analysis
SAP HANA in Healthcare: Real-Time Big Data Analysis
 
The Art of Social Media Analysis with Twitter & Python
The Art of Social Media Analysis with Twitter & PythonThe Art of Social Media Analysis with Twitter & Python
The Art of Social Media Analysis with Twitter & Python
 

Semelhante a Real Time Analytics for Twitter with a Scalable Architecture

Big Data in Real Time
Big Data in Real TimeBig Data in Real Time
Big Data in Real TimeRon Zavner
 
Search Analytics Business Value & NoSQL Backend
Search Analytics Business Value & NoSQL BackendSearch Analytics Business Value & NoSQL Backend
Search Analytics Business Value & NoSQL BackendSematext Group, Inc.
 
Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormNati Shalom
 
Big data paris 2011 is cool florian douetteau
Big data paris 2011 is cool florian douetteauBig data paris 2011 is cool florian douetteau
Big data paris 2011 is cool florian douetteauIsCoolEnt
 
VA Smalltalk Update
VA Smalltalk UpdateVA Smalltalk Update
VA Smalltalk UpdateESUG
 
One Million Players Isn't Cool (at Festival of Games 2011)
One Million Players Isn't Cool (at Festival of Games 2011)One Million Players Isn't Cool (at Festival of Games 2011)
One Million Players Isn't Cool (at Festival of Games 2011)Ex Machina
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
 
Microsoft Dynamics Academic Alliance: How to win future of business
Microsoft Dynamics Academic Alliance: How to win future of businessMicrosoft Dynamics Academic Alliance: How to win future of business
Microsoft Dynamics Academic Alliance: How to win future of businessFrederik De Bruyne
 
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...Karthik Ramasamy
 
Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019
Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019
Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019Codemotion
 
Using Big Data to Driving Big Engagement
Using Big Data to Driving Big EngagementUsing Big Data to Driving Big Engagement
Using Big Data to Driving Big EngagementAmazon Web Services
 
Digital Asset Management with Alfresco
Digital Asset Management with AlfrescoDigital Asset Management with Alfresco
Digital Asset Management with Alfrescorivetlogic
 
Big data use cases in the cloud presentation
Big data use cases in the cloud presentationBig data use cases in the cloud presentation
Big data use cases in the cloud presentationTUSHAR GARG
 
Scalable olap with druid
Scalable olap with druidScalable olap with druid
Scalable olap with druidKashif Khan
 
Curing the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerCuring the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerDataWorks Summit
 
Extend the Reach of R to the Enterprise (for useR! 2013)
Extend the Reach of R to the Enterprise (for useR! 2013)Extend the Reach of R to the Enterprise (for useR! 2013)
Extend the Reach of R to the Enterprise (for useR! 2013)Lou Bajuk
 

Semelhante a Real Time Analytics for Twitter with a Scalable Architecture (20)

Big Data in Real Time
Big Data in Real TimeBig Data in Real Time
Big Data in Real Time
 
Search Analytics Business Value & NoSQL Backend
Search Analytics Business Value & NoSQL BackendSearch Analytics Business Value & NoSQL Backend
Search Analytics Business Value & NoSQL Backend
 
Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using Storm
 
Big data paris 2011 is cool florian douetteau
Big data paris 2011 is cool florian douetteauBig data paris 2011 is cool florian douetteau
Big data paris 2011 is cool florian douetteau
 
VA Smalltalk Update
VA Smalltalk UpdateVA Smalltalk Update
VA Smalltalk Update
 
One Million Players Isn't Cool (at Festival of Games 2011)
One Million Players Isn't Cool (at Festival of Games 2011)One Million Players Isn't Cool (at Festival of Games 2011)
One Million Players Isn't Cool (at Festival of Games 2011)
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
Microsoft Dynamics Academic Alliance: How to win future of business
Microsoft Dynamics Academic Alliance: How to win future of businessMicrosoft Dynamics Academic Alliance: How to win future of business
Microsoft Dynamics Academic Alliance: How to win future of business
 
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
 
Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019
Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019
Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019
 
Using Big Data to Driving Big Engagement
Using Big Data to Driving Big EngagementUsing Big Data to Driving Big Engagement
Using Big Data to Driving Big Engagement
 
Digital Asset Management with Alfresco
Digital Asset Management with AlfrescoDigital Asset Management with Alfresco
Digital Asset Management with Alfresco
 
Big Data
Big DataBig Data
Big Data
 
Big data use cases in the cloud presentation
Big data use cases in the cloud presentationBig data use cases in the cloud presentation
Big data use cases in the cloud presentation
 
Scalable olap with druid
Scalable olap with druidScalable olap with druid
Scalable olap with druid
 
Curing the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerCuring the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging Manager
 
Data Mining & Engineering
Data Mining & EngineeringData Mining & Engineering
Data Mining & Engineering
 
Big Data
Big DataBig Data
Big Data
 
Big data by_mcal
Big data by_mcalBig data by_mcal
Big data by_mcal
 
Extend the Reach of R to the Enterprise (for useR! 2013)
Extend the Reach of R to the Enterprise (for useR! 2013)Extend the Reach of R to the Enterprise (for useR! 2013)
Extend the Reach of R to the Enterprise (for useR! 2013)
 

Mais de Uri Cohen

Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...Uri Cohen
 
Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014 Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014 Uri Cohen
 
SSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax LondonSSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax LondonUri Cohen
 
Alef event - going open source
Alef event - going open source Alef event - going open source
Alef event - going open source Uri Cohen
 
GigaSpaces XAP for Financial Services
GigaSpaces XAP for Financial Services GigaSpaces XAP for Financial Services
GigaSpaces XAP for Financial Services Uri Cohen
 
In Memory Data Grids, Demystified!
In Memory Data Grids, Demystified! In Memory Data Grids, Demystified!
In Memory Data Grids, Demystified! Uri Cohen
 
App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14
App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14
App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14Uri Cohen
 
Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14
Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14 Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14
Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14 Uri Cohen
 
Deployment Automation on OpenStack with TOSCA and Cloudify
Deployment Automation on OpenStack with TOSCA and CloudifyDeployment Automation on OpenStack with TOSCA and Cloudify
Deployment Automation on OpenStack with TOSCA and CloudifyUri Cohen
 
Cloud stack collabiration conference - It's the app, stupid!
Cloud stack collabiration conference - It's the app, stupid!Cloud stack collabiration conference - It's the app, stupid!
Cloud stack collabiration conference - It's the app, stupid!Uri Cohen
 
Changing organizational culture - a sweaty usecase
Changing organizational culture - a sweaty usecaseChanging organizational culture - a sweaty usecase
Changing organizational culture - a sweaty usecaseUri Cohen
 
GigaSpaces XAP - Don't Call Me Cache!
GigaSpaces XAP - Don't Call Me Cache!GigaSpaces XAP - Don't Call Me Cache!
GigaSpaces XAP - Don't Call Me Cache!Uri Cohen
 
Oscon 2013 - Lessons from building an open source community
Oscon 2013 - Lessons from building an open source community Oscon 2013 - Lessons from building an open source community
Oscon 2013 - Lessons from building an open source community Uri Cohen
 
Oscon 2013 -Your OSS Project Is now served
Oscon 2013 -Your OSS Project Is now servedOscon 2013 -Your OSS Project Is now served
Oscon 2013 -Your OSS Project Is now servedUri Cohen
 
OpenStack Israel Summit 2013 - It’s the App, Stupid!
OpenStack Israel Summit 2013 - It’s the App, Stupid! OpenStack Israel Summit 2013 - It’s the App, Stupid!
OpenStack Israel Summit 2013 - It’s the App, Stupid! Uri Cohen
 
One Does Not Simply Walk Into Devops
One Does Not Simply Walk Into Devops One Does Not Simply Walk Into Devops
One Does Not Simply Walk Into Devops Uri Cohen
 
MongoDB in the Clouds
MongoDB in the CloudsMongoDB in the Clouds
MongoDB in the CloudsUri Cohen
 
Carrier Paas - CloudStack Collaboration Event 2012
Carrier Paas - CloudStack Collaboration Event 2012Carrier Paas - CloudStack Collaboration Event 2012
Carrier Paas - CloudStack Collaboration Event 2012Uri Cohen
 
Your Apps on the Cloud - What it really takes
Your Apps on the Cloud - What it really takes Your Apps on the Cloud - What it really takes
Your Apps on the Cloud - What it really takes Uri Cohen
 
Cassandra summit - Big Data Apps on the cloud
Cassandra summit - Big Data Apps on the cloud Cassandra summit - Big Data Apps on the cloud
Cassandra summit - Big Data Apps on the cloud Uri Cohen
 

Mais de Uri Cohen (20)

Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...
 
Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014 Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014
 
SSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax LondonSSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax London
 
Alef event - going open source
Alef event - going open source Alef event - going open source
Alef event - going open source
 
GigaSpaces XAP for Financial Services
GigaSpaces XAP for Financial Services GigaSpaces XAP for Financial Services
GigaSpaces XAP for Financial Services
 
In Memory Data Grids, Demystified!
In Memory Data Grids, Demystified! In Memory Data Grids, Demystified!
In Memory Data Grids, Demystified!
 
App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14
App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14
App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14
 
Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14
Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14 Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14
Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14
 
Deployment Automation on OpenStack with TOSCA and Cloudify
Deployment Automation on OpenStack with TOSCA and CloudifyDeployment Automation on OpenStack with TOSCA and Cloudify
Deployment Automation on OpenStack with TOSCA and Cloudify
 
Cloud stack collabiration conference - It's the app, stupid!
Cloud stack collabiration conference - It's the app, stupid!Cloud stack collabiration conference - It's the app, stupid!
Cloud stack collabiration conference - It's the app, stupid!
 
Changing organizational culture - a sweaty usecase
Changing organizational culture - a sweaty usecaseChanging organizational culture - a sweaty usecase
Changing organizational culture - a sweaty usecase
 
GigaSpaces XAP - Don't Call Me Cache!
GigaSpaces XAP - Don't Call Me Cache!GigaSpaces XAP - Don't Call Me Cache!
GigaSpaces XAP - Don't Call Me Cache!
 
Oscon 2013 - Lessons from building an open source community
Oscon 2013 - Lessons from building an open source community Oscon 2013 - Lessons from building an open source community
Oscon 2013 - Lessons from building an open source community
 
Oscon 2013 -Your OSS Project Is now served
Oscon 2013 -Your OSS Project Is now servedOscon 2013 -Your OSS Project Is now served
Oscon 2013 -Your OSS Project Is now served
 
OpenStack Israel Summit 2013 - It’s the App, Stupid!
OpenStack Israel Summit 2013 - It’s the App, Stupid! OpenStack Israel Summit 2013 - It’s the App, Stupid!
OpenStack Israel Summit 2013 - It’s the App, Stupid!
 
One Does Not Simply Walk Into Devops
One Does Not Simply Walk Into Devops One Does Not Simply Walk Into Devops
One Does Not Simply Walk Into Devops
 
MongoDB in the Clouds
MongoDB in the CloudsMongoDB in the Clouds
MongoDB in the Clouds
 
Carrier Paas - CloudStack Collaboration Event 2012
Carrier Paas - CloudStack Collaboration Event 2012Carrier Paas - CloudStack Collaboration Event 2012
Carrier Paas - CloudStack Collaboration Event 2012
 
Your Apps on the Cloud - What it really takes
Your Apps on the Cloud - What it really takes Your Apps on the Cloud - What it really takes
Your Apps on the Cloud - What it really takes
 
Cassandra summit - Big Data Apps on the cloud
Cassandra summit - Big Data Apps on the cloud Cassandra summit - Big Data Apps on the cloud
Cassandra summit - Big Data Apps on the cloud
 

Último

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 

Último (20)

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

Real Time Analytics for Twitter with a Scalable Architecture

  • 1. Real Time Analytics for Big Data A Twitter Case Study Uri Cohen Head of Product @uri1803
  • 2. Big Data Predictions “Over the next few years we'll see the adoption of scalable frameworks and platforms for handling streaming, or near real-time, analysis and processing. In the same way that Hadoop has been borne out of large-scale web applications, these platforms will be driven by the needs of large- scale location-aware mobile, social and sensor use.” Edd Dumbill, O’REILLY ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved 2
  • 3. We’re Living in a Real Time World… Facebook Real Time Real Time User Google RT Web Analytics Social Analytics Engagement Twitter paid tweet analytics New Real Time Google Real Time Search Analytics Startups 3 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 4. 3 Flavors to Analytics Counting Correlating Research 4 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 5. Analytics @ Twitter – Counting  How many signups, tweets, retweet s for a topic?  What’s the average latency?  Demographics  Countries and cities  Gender  Age groups  Device types  … 5 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 6. Analytics @ Twitter – Correlating  What devices fail at the same time?  What features get user hooked?  What places on the globe are “happening”? 6 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 7. Analytics @ Twitter – Research  Sentiment analysis  “Obama is popular”  Trends  “People like to tweet after watching American Idol”  Spam patterns  How can you tell when a user spams? 7 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 8. It’s All about Timing “Real time” Reasonably Quick Batch (< few Seconds) (seconds - minutes) (hours/days) 8 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 9. It’s All about Timing • Event driven / stream processing • High resolution – every tweet gets counted • Ad-hoc querying This is what • Medium resolution (aggregations) here we’re to discuss  • Long running batch jobs (ETL, map/reduce) • Low resolution (trends & patterns) 9 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 10. Twitter in Numbers (March 2011) It takes a week for users to send 1 billion Tweets. Source: http://blog.twitter.com/2011/03/numbers.html 10 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 11. Twitter in Numbers (March 2011) On average, 140 million tweets get sent every day. Source: http://blog.twitter.com/2011/03/numbers.html 11 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 12. Twitter in Numbers (March 2011) The highest throughput to date is 6,939 tweets/sec. Source: http://blog.twitter.com/2011/03/numbers.html 12 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 13. Twitter in Numbers (March 2011) 460,000 new accounts are created daily. Source: http://blog.twitter.com/2011/03/numbers.html 13 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 14. Twitter in Numbers 5% of the users generate 75% of the content. Source: http://www.sysomos.com/insidetwitter/ 14 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 15. Challenge 1 – Computing Reach 15 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 16. Challenge 1 – Computing Reach Tweets Followers Distinct Followers Count Reach 16 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 17. Challenge 2 – Word Count Tweets Count Word:Count • Hottest topics • URL mentions • etc. 17 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 18. URL Mentions – Here’s One Use Case 18 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 19. Analyze the Problem  (Tens of) thousands of tweets per second to process  Assumption: Need to process in near real time  Aggregate counters for each word  A few 10s of thousands of words (or hundreds of thousands if we include URLs)  System needs to linearly scale 19 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 20. It’s Really Just an Online Map/Reduce 20 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 21. Event Driven Flow Store Research Raw Tokenize Filter Count Store Counters 21 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 22. It’s Not That Simple…  (Tens) of thousands of tweets to tokenize every second, even more words to filter  CPU bottleneck  Tens/hundreds of thousands of counters to update  Counters contention  Tens/hundreds of thousands of counters to persist  Database bottleneck  (Tens) of thousands of tweets to store every second in the database  Database bottleneck 22 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 23. Implementation Challenges  Twitter is growing by the day, how can this scale?  Fault tolerance. What happens if a server fails?  Consistency. Counters should be accurate! 23 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 24. Solutions, Solutions 24 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 25. Dealing with the Massive Tweet Stream  We need a system that could scale linearly  Event partitioning to the rescue!  What to partition by? Tokenizer1 Filterer 1 Tokenizer2 Filterer 2 Tokenizer Filterer 3 3 Tokenizer Filterer n n 25 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 26. Counters Persistence & Contention  Why not keep things in memory?  Treat it as a SoR IMDG Counter Tokenizer1 Filterer 1 Updater 1 Counter Tokenizer2 Filterer 2 Updater 2 Tokenizer Counter Filterer 3 3 Updater 3 Tokenizer Counter Filterer n n Updater n 26 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 27. Counters Persistence & Contention  Why not keep things in memory (data grid)?  Question: How to partition counters? IMDG Counter Tokenizer1 Filterer 1 Updater 1 Counter Tokenizer2 Filterer 2 Updater 2 Tokenizer Counter Filterer 3 3 Updater 3 Tokenizer Counter Filterer n n Updater n 27 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 28. Counters Persistence & Contention  Why not keep things in memory?  Question: How to partition counters? in Facebook keeps 80% of its data Memory (Stanford research) IMDG Counter Tokenizer1 Filterer 1 Updater 1 RAM is 100-1000x faster than Disk Counter (Random seek) Tokenizer2 Filterer 2 Updater 2 • Disk: 5 -10ms Tokenizer Counter Filterer 3 • RAM: ~0.001msec 3 Updater 3 Tokenizer Counter Filterer n n Updater n 28 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 29. Database Bottleneck – Storing Raw Tweets  RDBMS won’t cut it (unless you have $$$$$)  Hadoop / NoSQL to the rescue  Use the data grid for batching and as a reliable transaction log Persister 1 Persister 2 Persister n 29 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 30. Counter Consistency & Resiliency  Primary/hot backup setup, sync replication  Transactional, atomic updates IMDG  “On Demand” backups P B P B P B P B 30 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 31. Counter Consistency & Resiliency  Primary/hot backup setup, sync replication  Transactional, atomic updates IMDG  “On Demand” backups P B P P B P B P B 31 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 32. Implementing Event Flows  Use the data grid as event bus  Stateful objects as transactional events Raw Tokenizer Tokenized Filterer Filtered Counter P B 32 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 33. Putting It All Together IMDG P B P B B P B P 33 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 34. Putting It All Together • Use an IMDG IMDG • Partition events, counters • Collocate eventP handlers with data for low B latency and simple scaling • Use atomic updates to update counters in P B memory P B • Persist asynchronously to a big data store P B 34 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 35. Reducing Counter Contention - Optimization  Store processed word1 tweets locally in word2 each partition word3  Use periodic batch writes to update 1 sec counters  Process batches as a whole 35 35 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 36. Keep Your Eyes Open… 36 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 37. References  Learn and fork the code on github: https://github.com/Gigaspaces/rt-analytics  Detailed blog post http://bit.ly/gs-bigdata-analytics  Twitter in numbers: http://blog.twitter.com/2011/03/numbers.html  Twitter Storm: http://bit.ly/twitter-storm  Apache S4 http://incubator.apache.org/s4/ 37 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 38. 38 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved

Notas do Editor

  1. ActiveInsight