SlideShare uma empresa Scribd logo
1 de 21
A Real Time Sentiment Analysis Application using
Hadoop and HBase in the Cloud




Jagane Sundar
Founder, AltoScale Inc.



June 14, 2012                      Hadoop Summit 2012


     AltoScale
AltoScale                               About me


Ø Extensive Knowledge of Hadoop, Cloud Compute and
  Virtualization
Ø Co-founder of AltoScale. We developed the Workbench
Ø Worked on Hadoop Management and Performance at
  Yahoo
Ø Primarily a systems and storage guy – have written TCP
  stacks and NFS Clients, Livebackup for KVM




2
AltoScale                   My Motivation




Ø Build a cool real time big data app in order
 to acquire a deep understanding of Real
 Time Big Data Systems in the cloud




3
AltoScale   What will you get out of this?




Ø See how easy it is to build a highly
 scalable real-time Big Data application
 using a variety of open source tools and
 technologies




4
AltoScale         Real Time Sentiment Analysis




        Ø Easily accessible real time signals
           v Twitter public status updates
           v Blog entries




5
AltoScale           Real Time Sentiment Analysis


Ø Two types of solutions to Real Time Sentiment
  Analysis
    v Keywords known a-priori
      o  Filter tweets by keyword
    v Open ended sentiment analysis (no a-priori
      knowledge of keywords)
      o  Random sample of all public tweets
          •  1 % of public tweets easily available
          •  10% (twitter firehose) may be available for purchase




6
AltoScale
                    Real Time Sentiment Analysis:
                          Application Architecture
                             Hadoop/HBase

                                          Service Node
                     TwitterSampler                         HBase REST Gateway


                          Analyze Sentiment




                                HBase every minute
                                Write a few new rows to




                                                                       Scan HTable
                                                            Hadoop Slave
                                        DataNode, Region Server

                                                               Hadoop Slave
                                                          DataNode, Region Server
                     Master                                      Hadoop Slave
                NN, HBase Master                            DataNode, Region Server

7
AltoScale
                              Real Time Sentiment Analysis:
                             Twitter Streaming API Overview

                                Twitter APIs




        REST APIs                                   Streaming APIs
    (Request/Response)                          (Persistent HTTP Conn)




           Public Streams             User Streams             Site Streams
           (Sample of all             (One User’s             (Multiple Users’
           public updates)              updates)                 updates)
                    filter

                   sample                      We use this API to
                                               collect tweets
8                  firehose
AltoScale
                  Real Time Sentiment Analysis:
                          Time Series Database




    Ø Inspired by TSDB, but does not use TSDB
    Ø Read Benoît “tsuna” Sigoure’s slides from
      HBaseCon 2012




9
AltoScale
                               Real Time Sentiment Analysis:
                                                   in HBase



          Row              NEUTRAL   POSITIVE   NEGATIVE       Sample
                                                               Tweets
obama:2012:06:04:13:34    1          4          0          sdac soasp few


romney:2012:06:04:13:34   2          3          1          Smsm djcn dje
                                                           jdj
davebarry:2012:06:04:13:34 0         9          0          cs dsjw ausj




    10
AltoScale
                 Real Time Sentiment Analysis:
                                   Front Page




11
AltoScale
                 Real Time Sentiment Analysis:
                                 Results Page




12
AltoScale
                       Real Time Sentiment Analysis:
                  Standing on the Shoulders of Giants

Ø Hadoop and HBase, of course
Ø Twitter4j library for getting the twitter stream
Ø Sentiment Analysis
     v https://code.google.com/p/twitter-sentiment-analysis/
     v Weka Library

Ø Tomcat
Ø Jquery, dojo for javascript client




13
AltoScale
                    Real Time Sentiment Analysis:
             Twitter Stream API - TsStatusListener

public static class TsStatusListener implements StatusListener {
       public void onStatus(Status status) {
               Item item = wm.weightedClassify(status.getText());
               int polarity = 0;
               try {
                   polarity = Integer.parseInt(item.getPolarity().trim());
               } catch (NumberFormatException nfe) {
               }
               updateKeywordTrackers(status, polarity);
       }
}
14
AltoScale
                                             Real Time Sentiment Analysis:
                                                         Writing to HBase
private void writeToHBase() {
             Calendar cal = Calendar.getInstance();
             String calStr = String.format("%04d", (cal.get(Calendar.YEAR)))
                           + ":" + String.format("%02d", cal.get(Calendar.MONTH) + 1)
                           + ":" + String.format("%02d", cal.get(Calendar.DAY_OF_MONTH))
                           + ":" + String.format("%02d", cal.get(Calendar.HOUR_OF_DAY))
                           + ":" + String.format("%02d", cal.get(Calendar.MINUTE));
             String rowKey = keyword + ":" + calStr;
             Put put = new Put(rowKey.getBytes());
             put.add(COLFAM1.getBytes(), "NEUTRAL".getBytes(), tracker.getNeutralCount().getBytes());
             put.add(COLFAM1.getBytes(), "POSITIVE".getBytes(), tracker.getPositiveCount().getBytes());
             put.add(COLFAM1.getBytes(), "NEGATIVE".getBytes(), tracker.getNegativeCount().getBytes());

             try {
                           table.put(put);
             } catch (Exception ex) {
                           System.err.println(ex);
             }
}
    15
AltoScale
                                                           Reading from HBase
                                                              Various Options
                         Technologies for Writing HBase Clients

                                              Service Node

Option 1: HBase Client                   Java Client linked to
                                         HBase Client classes




                                               Service Node                                 Service Node


                                                                                        Thrift Client
Option 2: Thrift RPC                     HBase Thrift Gateway
                                                                 Thrift protocol




                                    16                   Service Node


                                         HBase REST Gateway
Option 3: REST API                                               REST (HTTP or HTTPS)
AltoScale
                                                  Reading from HBase
                                  and presenting to the user’s browser
     Hadoop/HBase in the cloud

                                 Service Node
       HBase REST Gateway
                                         REST scan            Tomcat
                                                               Proxy


                                                     Static
                                                     html
                   Scan HTable




                                   Hadoop Slave
                   DataNode, Region Server

                                      Hadoop Slave
                                 DataNode, Region Server
     Master                             Hadoop Slave
NN, HBase Master                   DataNode, Region Server

17
AltoScale                         Tomcat as HTTP Proxy


Ø HBase Stardust REST Server runs on port 8081 and is
  connected to the HBase
Ø The REST server has the capability to scan tables
Ø A javascript webpage is the client
Ø Problem:
     v JavaScript security restrictions do now allow the JavaScript to
        execute REST calls to any server other than the one it was
        loaded from
     v Tomcat is used as a proxy. It serves up:
        o  Static html pages with the javascript client, images etc.
        o  REST requests from the javascript client are proxied to the HBase
           Stardust server running on port 8081
18
AltoScale                  Future Improvements


Ø Elastic HBase in the cloud
Ø At night time, use on VM to receive tweets and write out
  into SequenceFiles in S3
Ø Before business hours, start up HBase, run a MR job to
  process all these SequenceFiles and write into HBase
Ø Cost effective real time HBase application in the cloud




19
AltoScale              Big Data Apps in the Cloud


Ø The Cloud is suitable for Big Data apps which use Big
  Data from the Internet. For example:
     v Twitter Public Status Updates
     v Blog entries
     v Web Crawl data

Ø Big Data apps in the cloud are not useful if all your data
  is generated inside your network
     v Router, Storage device, Authentication device logs
     v Logs from Web Servers located inside your network




20
AltoScale




Ø Questions, Comments, Flames?


       •  Thanks!
       •  Jagane Sundar
       •  jagane@altoscale.com




21

Mais conteúdo relacionado

Mais procurados

Python Pandas
Python PandasPython Pandas
Python PandasSunil OS
 
Language design and translation issues
Language design and translation issuesLanguage design and translation issues
Language design and translation issuesSURBHI SAROHA
 
Dag representation of basic blocks
Dag representation of basic blocksDag representation of basic blocks
Dag representation of basic blocksJothi Lakshmi
 
Transaction management DBMS
Transaction  management DBMSTransaction  management DBMS
Transaction management DBMSMegha Patel
 
Algorithm and pseudocode conventions
Algorithm and pseudocode conventionsAlgorithm and pseudocode conventions
Algorithm and pseudocode conventionssaranyatdr
 
Graph in data structure
Graph in data structureGraph in data structure
Graph in data structureAbrish06
 
Relationship Among Token, Lexeme & Pattern
Relationship Among Token, Lexeme & PatternRelationship Among Token, Lexeme & Pattern
Relationship Among Token, Lexeme & PatternBharat Rathore
 
INTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE AND JSP PROCESSING
INTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE  AND JSP PROCESSINGINTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE  AND JSP PROCESSING
INTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE AND JSP PROCESSINGAaqib Hussain
 
Os lab file c programs
Os lab file c programsOs lab file c programs
Os lab file c programsKandarp Tiwari
 
Intermediate code generation in Compiler Design
Intermediate code generation in Compiler DesignIntermediate code generation in Compiler Design
Intermediate code generation in Compiler DesignKuppusamy P
 
POST’s CORRESPONDENCE PROBLEM
POST’s CORRESPONDENCE PROBLEMPOST’s CORRESPONDENCE PROBLEM
POST’s CORRESPONDENCE PROBLEMRajendran
 
compiler ppt on symbol table
 compiler ppt on symbol table compiler ppt on symbol table
compiler ppt on symbol tablenadarmispapaulraj
 
Python | What is Python | History of Python | Python Tutorial
Python | What is Python | History of Python | Python TutorialPython | What is Python | History of Python | Python Tutorial
Python | What is Python | History of Python | Python TutorialQA TrainingHub
 

Mais procurados (20)

Python Pandas
Python PandasPython Pandas
Python Pandas
 
Process state in OS
Process state in OSProcess state in OS
Process state in OS
 
Language design and translation issues
Language design and translation issuesLanguage design and translation issues
Language design and translation issues
 
Finite Automata
Finite AutomataFinite Automata
Finite Automata
 
Dag representation of basic blocks
Dag representation of basic blocksDag representation of basic blocks
Dag representation of basic blocks
 
B and B+ tree
B and B+ treeB and B+ tree
B and B+ tree
 
Transaction management DBMS
Transaction  management DBMSTransaction  management DBMS
Transaction management DBMS
 
Java Servlets
Java ServletsJava Servlets
Java Servlets
 
Deadlock ppt
Deadlock ppt Deadlock ppt
Deadlock ppt
 
Python basic
Python basicPython basic
Python basic
 
Algorithm and pseudocode conventions
Algorithm and pseudocode conventionsAlgorithm and pseudocode conventions
Algorithm and pseudocode conventions
 
Graph in data structure
Graph in data structureGraph in data structure
Graph in data structure
 
Relationship Among Token, Lexeme & Pattern
Relationship Among Token, Lexeme & PatternRelationship Among Token, Lexeme & Pattern
Relationship Among Token, Lexeme & Pattern
 
INTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE AND JSP PROCESSING
INTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE  AND JSP PROCESSINGINTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE  AND JSP PROCESSING
INTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE AND JSP PROCESSING
 
Os lab file c programs
Os lab file c programsOs lab file c programs
Os lab file c programs
 
Intermediate code generation in Compiler Design
Intermediate code generation in Compiler DesignIntermediate code generation in Compiler Design
Intermediate code generation in Compiler Design
 
POST’s CORRESPONDENCE PROBLEM
POST’s CORRESPONDENCE PROBLEMPOST’s CORRESPONDENCE PROBLEM
POST’s CORRESPONDENCE PROBLEM
 
compiler ppt on symbol table
 compiler ppt on symbol table compiler ppt on symbol table
compiler ppt on symbol table
 
Deductive databases
Deductive databasesDeductive databases
Deductive databases
 
Python | What is Python | History of Python | Python Tutorial
Python | What is Python | History of Python | Python TutorialPython | What is Python | History of Python | Python Tutorial
Python | What is Python | History of Python | Python Tutorial
 

Destaque

Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment AnalysisJaganadh Gopinadhan
 
Sentiment Analysis in Twitter
Sentiment Analysis in TwitterSentiment Analysis in Twitter
Sentiment Analysis in TwitterAyushi Dalmia
 
Social media & sentiment analysis splunk conf2012
Social media & sentiment analysis   splunk conf2012Social media & sentiment analysis   splunk conf2012
Social media & sentiment analysis splunk conf2012Michael Wilde
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSumit Raj
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis worksCJ Jenkins
 
Building a Sentiment Analytics Solution powered by Machine Learning- Webinar QA
Building a Sentiment Analytics Solution powered by Machine Learning- Webinar QABuilding a Sentiment Analytics Solution powered by Machine Learning- Webinar QA
Building a Sentiment Analytics Solution powered by Machine Learning- Webinar QAImpetus Technologies
 
Social media mining and multimedia analysis research and applications
Social media mining and multimedia analysis research and applicationsSocial media mining and multimedia analysis research and applications
Social media mining and multimedia analysis research and applicationsYiannis Kompatsiaris
 
Intro to Algebra II
Intro to Algebra IIIntro to Algebra II
Intro to Algebra IIteamxxlp
 
Packet capture and network traffic analysis
Packet capture and network traffic analysisPacket capture and network traffic analysis
Packet capture and network traffic analysisCARMEN ALCIVAR
 
Top 10 senior administrative officer interview questions and answers
Top 10 senior administrative officer interview questions and answersTop 10 senior administrative officer interview questions and answers
Top 10 senior administrative officer interview questions and answersannababy1245
 
Virtualization In Software Testing
Virtualization In Software TestingVirtualization In Software Testing
Virtualization In Software TestingColloquium
 
Vendor quality management
Vendor quality managementVendor quality management
Vendor quality managementG2Link
 
Video Quality Measurements
Video Quality MeasurementsVideo Quality Measurements
Video Quality MeasurementsYoss Cohen
 
Digital Platform Selection Best Practices
Digital Platform Selection Best PracticesDigital Platform Selection Best Practices
Digital Platform Selection Best Practicesedynamic
 
Hands-On Lab: Let's Build an ITSM Dashboard
Hands-On Lab: Let's Build an ITSM DashboardHands-On Lab: Let's Build an ITSM Dashboard
Hands-On Lab: Let's Build an ITSM DashboardCA Technologies
 
Defining Workplace Safety
Defining Workplace SafetyDefining Workplace Safety
Defining Workplace SafetyBruce Lambert
 
Which test cases to automate
Which test cases to automateWhich test cases to automate
Which test cases to automatesachxn1
 

Destaque (20)

Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment Analysis
 
Sentiment Analysis in Twitter
Sentiment Analysis in TwitterSentiment Analysis in Twitter
Sentiment Analysis in Twitter
 
Social media & sentiment analysis splunk conf2012
Social media & sentiment analysis   splunk conf2012Social media & sentiment analysis   splunk conf2012
Social media & sentiment analysis splunk conf2012
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
Chem Lab Report (1)
Chem Lab Report (1)Chem Lab Report (1)
Chem Lab Report (1)
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis works
 
Building a Sentiment Analytics Solution powered by Machine Learning- Webinar QA
Building a Sentiment Analytics Solution powered by Machine Learning- Webinar QABuilding a Sentiment Analytics Solution powered by Machine Learning- Webinar QA
Building a Sentiment Analytics Solution powered by Machine Learning- Webinar QA
 
Social media mining and multimedia analysis research and applications
Social media mining and multimedia analysis research and applicationsSocial media mining and multimedia analysis research and applications
Social media mining and multimedia analysis research and applications
 
Intro to Algebra II
Intro to Algebra IIIntro to Algebra II
Intro to Algebra II
 
Orbital Notation
Orbital NotationOrbital Notation
Orbital Notation
 
Packet capture and network traffic analysis
Packet capture and network traffic analysisPacket capture and network traffic analysis
Packet capture and network traffic analysis
 
Top 10 senior administrative officer interview questions and answers
Top 10 senior administrative officer interview questions and answersTop 10 senior administrative officer interview questions and answers
Top 10 senior administrative officer interview questions and answers
 
Virtualization In Software Testing
Virtualization In Software TestingVirtualization In Software Testing
Virtualization In Software Testing
 
Vendor quality management
Vendor quality managementVendor quality management
Vendor quality management
 
Video Quality Measurements
Video Quality MeasurementsVideo Quality Measurements
Video Quality Measurements
 
Digital Platform Selection Best Practices
Digital Platform Selection Best PracticesDigital Platform Selection Best Practices
Digital Platform Selection Best Practices
 
Analysis of water pollution presentaion by m.nadeem ashraf
Analysis of water pollution presentaion by m.nadeem ashrafAnalysis of water pollution presentaion by m.nadeem ashraf
Analysis of water pollution presentaion by m.nadeem ashraf
 
Hands-On Lab: Let's Build an ITSM Dashboard
Hands-On Lab: Let's Build an ITSM DashboardHands-On Lab: Let's Build an ITSM Dashboard
Hands-On Lab: Let's Build an ITSM Dashboard
 
Defining Workplace Safety
Defining Workplace SafetyDefining Workplace Safety
Defining Workplace Safety
 
Which test cases to automate
Which test cases to automateWhich test cases to automate
Which test cases to automate
 

Semelhante a Realtime Sentiment Analysis Application Using Hadoop and HBase

Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionCloudera, Inc.
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Hadoop User Group
 
Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Uwe Printz
 
Introduction to the hadoop ecosystem by Uwe Seiler
Introduction to the hadoop ecosystem by Uwe SeilerIntroduction to the hadoop ecosystem by Uwe Seiler
Introduction to the hadoop ecosystem by Uwe SeilerCodemotion
 
Introduction to the Hadoop Ecosystem (SEACON Edition)
Introduction to the Hadoop Ecosystem (SEACON Edition)Introduction to the Hadoop Ecosystem (SEACON Edition)
Introduction to the Hadoop Ecosystem (SEACON Edition)Uwe Printz
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleEvan Chan
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizonArtem Ervits
 
Oct 2012 HUG: Project Panthera: Better Analytics with SQL, MapReduce, and HBase
Oct 2012 HUG: Project Panthera: Better Analytics with SQL, MapReduce, and HBaseOct 2012 HUG: Project Panthera: Better Analytics with SQL, MapReduce, and HBase
Oct 2012 HUG: Project Panthera: Better Analytics with SQL, MapReduce, and HBaseYahoo Developer Network
 
HBase and Hadoop at Urban Airship
HBase and Hadoop at Urban AirshipHBase and Hadoop at Urban Airship
HBase and Hadoop at Urban Airshipdave_revell
 
Serverless by Examples and Case Studies
Serverless by Examples and Case StudiesServerless by Examples and Case Studies
Serverless by Examples and Case StudiesSrushith Repakula
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and SparkMichael Stack
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconYiwei Ma
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统yongboy
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Ashish Narasimham
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Gwen (Chen) Shapira
 
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud DetectionArchitecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detectionhadooparchbook
 

Semelhante a Realtime Sentiment Analysis Application Using Hadoop and HBase (20)

Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
 
Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)
 
Introduction to the hadoop ecosystem by Uwe Seiler
Introduction to the hadoop ecosystem by Uwe SeilerIntroduction to the hadoop ecosystem by Uwe Seiler
Introduction to the hadoop ecosystem by Uwe Seiler
 
Introduction to the Hadoop Ecosystem (SEACON Edition)
Introduction to the Hadoop Ecosystem (SEACON Edition)Introduction to the Hadoop Ecosystem (SEACON Edition)
Introduction to the Hadoop Ecosystem (SEACON Edition)
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
מיכאל
מיכאלמיכאל
מיכאל
 
Mar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBaseMar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBase
 
Oct 2012 HUG: Project Panthera: Better Analytics with SQL, MapReduce, and HBase
Oct 2012 HUG: Project Panthera: Better Analytics with SQL, MapReduce, and HBaseOct 2012 HUG: Project Panthera: Better Analytics with SQL, MapReduce, and HBase
Oct 2012 HUG: Project Panthera: Better Analytics with SQL, MapReduce, and HBase
 
HBase and Hadoop at Urban Airship
HBase and Hadoop at Urban AirshipHBase and Hadoop at Urban Airship
HBase and Hadoop at Urban Airship
 
Serverless by examples and case studies
Serverless by examples and case studiesServerless by examples and case studies
Serverless by examples and case studies
 
Serverless by Examples and Case Studies
Serverless by Examples and Case StudiesServerless by Examples and Case Studies
Serverless by Examples and Case Studies
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016
 
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud DetectionArchitecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detection
 

Mais de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mais de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...itnewsafrica
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 

Último (20)

Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 

Realtime Sentiment Analysis Application Using Hadoop and HBase

  • 1. A Real Time Sentiment Analysis Application using Hadoop and HBase in the Cloud Jagane Sundar Founder, AltoScale Inc. June 14, 2012 Hadoop Summit 2012 AltoScale
  • 2. AltoScale About me Ø Extensive Knowledge of Hadoop, Cloud Compute and Virtualization Ø Co-founder of AltoScale. We developed the Workbench Ø Worked on Hadoop Management and Performance at Yahoo Ø Primarily a systems and storage guy – have written TCP stacks and NFS Clients, Livebackup for KVM 2
  • 3. AltoScale My Motivation Ø Build a cool real time big data app in order to acquire a deep understanding of Real Time Big Data Systems in the cloud 3
  • 4. AltoScale What will you get out of this? Ø See how easy it is to build a highly scalable real-time Big Data application using a variety of open source tools and technologies 4
  • 5. AltoScale Real Time Sentiment Analysis Ø Easily accessible real time signals v Twitter public status updates v Blog entries 5
  • 6. AltoScale Real Time Sentiment Analysis Ø Two types of solutions to Real Time Sentiment Analysis v Keywords known a-priori o  Filter tweets by keyword v Open ended sentiment analysis (no a-priori knowledge of keywords) o  Random sample of all public tweets •  1 % of public tweets easily available •  10% (twitter firehose) may be available for purchase 6
  • 7. AltoScale Real Time Sentiment Analysis: Application Architecture Hadoop/HBase Service Node TwitterSampler HBase REST Gateway Analyze Sentiment HBase every minute Write a few new rows to Scan HTable Hadoop Slave DataNode, Region Server Hadoop Slave DataNode, Region Server Master Hadoop Slave NN, HBase Master DataNode, Region Server 7
  • 8. AltoScale Real Time Sentiment Analysis: Twitter Streaming API Overview Twitter APIs REST APIs Streaming APIs (Request/Response) (Persistent HTTP Conn) Public Streams User Streams Site Streams (Sample of all (One User’s (Multiple Users’ public updates) updates) updates) filter sample We use this API to collect tweets 8 firehose
  • 9. AltoScale Real Time Sentiment Analysis: Time Series Database Ø Inspired by TSDB, but does not use TSDB Ø Read Benoît “tsuna” Sigoure’s slides from HBaseCon 2012 9
  • 10. AltoScale Real Time Sentiment Analysis: in HBase Row NEUTRAL POSITIVE NEGATIVE Sample Tweets obama:2012:06:04:13:34 1 4 0 sdac soasp few romney:2012:06:04:13:34 2 3 1 Smsm djcn dje jdj davebarry:2012:06:04:13:34 0 9 0 cs dsjw ausj 10
  • 11. AltoScale Real Time Sentiment Analysis: Front Page 11
  • 12. AltoScale Real Time Sentiment Analysis: Results Page 12
  • 13. AltoScale Real Time Sentiment Analysis: Standing on the Shoulders of Giants Ø Hadoop and HBase, of course Ø Twitter4j library for getting the twitter stream Ø Sentiment Analysis v https://code.google.com/p/twitter-sentiment-analysis/ v Weka Library Ø Tomcat Ø Jquery, dojo for javascript client 13
  • 14. AltoScale Real Time Sentiment Analysis: Twitter Stream API - TsStatusListener public static class TsStatusListener implements StatusListener { public void onStatus(Status status) { Item item = wm.weightedClassify(status.getText()); int polarity = 0; try { polarity = Integer.parseInt(item.getPolarity().trim()); } catch (NumberFormatException nfe) { } updateKeywordTrackers(status, polarity); } } 14
  • 15. AltoScale Real Time Sentiment Analysis: Writing to HBase private void writeToHBase() { Calendar cal = Calendar.getInstance(); String calStr = String.format("%04d", (cal.get(Calendar.YEAR))) + ":" + String.format("%02d", cal.get(Calendar.MONTH) + 1) + ":" + String.format("%02d", cal.get(Calendar.DAY_OF_MONTH)) + ":" + String.format("%02d", cal.get(Calendar.HOUR_OF_DAY)) + ":" + String.format("%02d", cal.get(Calendar.MINUTE)); String rowKey = keyword + ":" + calStr; Put put = new Put(rowKey.getBytes()); put.add(COLFAM1.getBytes(), "NEUTRAL".getBytes(), tracker.getNeutralCount().getBytes()); put.add(COLFAM1.getBytes(), "POSITIVE".getBytes(), tracker.getPositiveCount().getBytes()); put.add(COLFAM1.getBytes(), "NEGATIVE".getBytes(), tracker.getNegativeCount().getBytes()); try { table.put(put); } catch (Exception ex) { System.err.println(ex); } } 15
  • 16. AltoScale Reading from HBase Various Options Technologies for Writing HBase Clients Service Node Option 1: HBase Client Java Client linked to HBase Client classes Service Node Service Node Thrift Client Option 2: Thrift RPC HBase Thrift Gateway Thrift protocol 16 Service Node HBase REST Gateway Option 3: REST API REST (HTTP or HTTPS)
  • 17. AltoScale Reading from HBase and presenting to the user’s browser Hadoop/HBase in the cloud Service Node HBase REST Gateway REST scan Tomcat Proxy Static html Scan HTable Hadoop Slave DataNode, Region Server Hadoop Slave DataNode, Region Server Master Hadoop Slave NN, HBase Master DataNode, Region Server 17
  • 18. AltoScale Tomcat as HTTP Proxy Ø HBase Stardust REST Server runs on port 8081 and is connected to the HBase Ø The REST server has the capability to scan tables Ø A javascript webpage is the client Ø Problem: v JavaScript security restrictions do now allow the JavaScript to execute REST calls to any server other than the one it was loaded from v Tomcat is used as a proxy. It serves up: o  Static html pages with the javascript client, images etc. o  REST requests from the javascript client are proxied to the HBase Stardust server running on port 8081 18
  • 19. AltoScale Future Improvements Ø Elastic HBase in the cloud Ø At night time, use on VM to receive tweets and write out into SequenceFiles in S3 Ø Before business hours, start up HBase, run a MR job to process all these SequenceFiles and write into HBase Ø Cost effective real time HBase application in the cloud 19
  • 20. AltoScale Big Data Apps in the Cloud Ø The Cloud is suitable for Big Data apps which use Big Data from the Internet. For example: v Twitter Public Status Updates v Blog entries v Web Crawl data Ø Big Data apps in the cloud are not useful if all your data is generated inside your network v Router, Storage device, Authentication device logs v Logs from Web Servers located inside your network 20
  • 21. AltoScale Ø Questions, Comments, Flames? •  Thanks! •  Jagane Sundar •  jagane@altoscale.com 21