SlideShare uma empresa Scribd logo
1 de 23
How Klout is changing the
landscape of social media with
Hadoop and BI
Dave Mariani
VP Engineering, Klout


Denny Lee
Principal Program Manager
Microsoft
Discover and be recognized for how you
          influence the world
Klout’s Big Data makes all this possible


   15 Social Networks Processed Every Day
   120 Terabytes of Data Storage
   200,000 Indexed Users Added Every Day
   140,000,000 Users Indexed Every Day
   1,000,000,000 Social Signals Processed Every Day
   30,000,000,000 API Calls Delivered Every Month
   54,000,000,000 Rows of Data In Klout Data Warehouse
                                                         3
KLOUT DATA ARCHITECTURE
                             THE BEST TOOL FOR THE JOB




                                               Registrations DB
                                                                                Klout.com
                                                   (MySql)
                                                                                (Node.js)


                                                                                 Mobile
                                                   Profile DB                  (ObjectiveC)




                                                                  Klout API
                                                                    (Scala)
                                                    (HBase)
   Signal
 Collectors        Data
                                                                               Partner API
(Java/Scala)   Enhancement
                  Engine                                                        (Mashery)
                              Data Warehouse
                (PIG/Hive)                       Search Index
                                   (Hive)
                                                (Elastic
                                                       Search)




                                                    Streams
                                                  (MongoDB)
                                                                               Monitoring
                                                                                (Nagios)

                                               Serving Stores
                                                                               Dashboards
                                                                                (Tableau)

                                                                              Perks Analyics
                                                                                 (Scala)
                                                  Analytics
                                                   Cubes                      Event Tracker
                                                   (SSAS)
                                                                                 (Scala)
What is Business Intelligence?
• Data Warehousing, OLAP, Dashboards, Reporting
• Ability to slice and dice data in an ad-hoc manner
• Getting the right data to the right people, at the right
  time
• i.e. Now




                                                             5
Why Hadoop + BI?




                                        Hadoop     BI
             Requirement                  &       Query
                                         Hive    Engines
  Capture & store all data               Yes       No
  Support queries against detail data    Yes       No
  Support interactive queries &          No        Yes
  applications
  Support BI & visualization tools       No        Yes




                                                           6
An Example: Klout Event Tracker
                                           1   Perform A|B Testing of User Flows

                                           2   Optimize Registration Funnels




3   Monitor consumer engagement & retention (DAUs & MAUs)

4   Flexibly track and report on user generated events
                                                                                   7
A Flexible, Hierarchical Schema


 Project:              Event:         Property Type:    Property Value:
Collection            Captured           Attribute         Attribute
of Events            User Action           Key              Value




HomePage,                               Source,        Google Search
 Actions,                               Gender,            Male
Mobile iOS                              Location            SF
             +K (Add a topic) event
Event Tracker Architecture                     event_log
                                                tstamp string
                                   {            project string
                                   "project":"plusK", string
                                                event
                                                session_id bigint
                                   "event":"spend",
               insights3:9003/track/{"project":”plu
                                                ks_uid bigint
               sK","event":”spend”,"session_id":"0",
                                   Warehouse
                                                ip string
                                   "ip":"50.68.47.158",
               "ks_uid":123456,”type":”add_topic"}
                                                json_keys array<string>
                                   "kloutId":“123456",
                                                json_values
                                   “cookie_id":”123456",
                                                array<string>
                                   "ref":"http://klout.com/",
                                                json_text string
                                   "type":"add_topic",
Tracker API       Log Process                          Cube
                                                dt string            Klout UI
                                   "time":"1338366015"
   Scala,           Flume                             Analysis        Scala,
                                   }            hr string
  node.JS                                             Services       AJAX UX
          SELECT { [Measures].[Counter], [Measures].[PreviousPeriodCounter]}
          ON COLUMNS,
                                           will be saved in HDFS at:
          NON EMPTY CROSSJOIN (            /logs/events_tracking/2012-05-30/0100
          exists([Date].[Date].[Date].allmembers,
          [Date].[Date].&[2012-05-19T00:00:00]:[Date].[Date].&[2012-06-
          02T00:00:00]),
          [Events].[Event].[Event].allmembers ) DIMENSION PROPERTIES
          MEMBER_CAPTION
          ON ROWS
          FROM [ProductInsight]
          WHERE ({[Projects].[Project].[plusK]})


Instrument          Collect           Persist              Query            Report
                                                                                     9
Hadoop & BI Together:
Query Cube using a Custom App




                                10
A peek into product insight >
A|B test : unsorted vs. Sorted




                                 11
A Peek into
Product Insights >
Projects: Mobile
iOS




                     12
13
Hadoop & BI Together:
Query Cube Using Viz App




                           14
15
16
Hadoop & BI Together:
Query Hive using CLI




                        17
HiveQL Example

SELECT
   get_json_object(json_text,'$.sid') as sid,
   get_json_object(json_text,'$.inc') as inc,
   get_json_object(json_text,'$.status') as status,
   event
FROM bi.event_log
WHERE project='mobile-ios'
   AND dt=20120612
   AND get_json_object(json_text,'$.v')<>'1.5'
   AND (event = 'api_error' OR event = 'api_timeout')
ORDER BY sid;
19
Hadoop & BI Together:
Query Hive using Excel




                         20
21
Why Hadoop + BI?




                                        Hadoop     BI
             Requirement                  &       Query
                                         Hive    Engines
  Capture & store all data               Yes       No
  Support queries against detail data    Yes       No
  Support interactive queries &          No        Yes
  applications
  Support BI & visualization tools       No        Yes




                                                           22
Any Questions?




                 23

Mais conteúdo relacionado

Mais procurados

SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)Sascha Dittmann
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...Rittman Analytics
 
What is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | WhitepaperWhat is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | WhitepaperVasu S
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloudDmitry Tolpeko
 
Building Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta LakesBuilding Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta LakesDatabricks
 
Tarun poladi resume
Tarun poladi resumeTarun poladi resume
Tarun poladi resumeTarun P
 
A Zen Journey to Database Management
A Zen Journey to Database ManagementA Zen Journey to Database Management
A Zen Journey to Database ManagementBasho Technologies
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage CCG
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoMark Kromer
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsMars Lan
 
Big Data with SQL Server
Big Data with SQL ServerBig Data with SQL Server
Big Data with SQL ServerMark Kromer
 
Pentaho Analytics on MongoDB
Pentaho Analytics on MongoDBPentaho Analytics on MongoDB
Pentaho Analytics on MongoDBMark Kromer
 
MongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB
 
Design Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and KijiDesign Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and KijiHBaseCon
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWSGary Stafford
 
Azure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeAzure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeRick van den Bosch
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real WorldMark Kromer
 
How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017Rittman Analytics
 

Mais procurados (20)

SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
 
What is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | WhitepaperWhat is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | Whitepaper
 
An intro to Azure Data Lake
An intro to Azure Data LakeAn intro to Azure Data Lake
An intro to Azure Data Lake
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloud
 
Building Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta LakesBuilding Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta Lakes
 
Lambda-less Stream Processing @Scale in LinkedIn
Lambda-less Stream Processing @Scale in LinkedIn Lambda-less Stream Processing @Scale in LinkedIn
Lambda-less Stream Processing @Scale in LinkedIn
 
Tarun poladi resume
Tarun poladi resumeTarun poladi resume
Tarun poladi resume
 
A Zen Journey to Database Management
A Zen Journey to Database ManagementA Zen Journey to Database Management
A Zen Journey to Database Management
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with Pentaho
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
 
Big Data with SQL Server
Big Data with SQL ServerBig Data with SQL Server
Big Data with SQL Server
 
Pentaho Analytics on MongoDB
Pentaho Analytics on MongoDBPentaho Analytics on MongoDB
Pentaho Analytics on MongoDB
 
MongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big Data
 
Design Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and KijiDesign Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and Kiji
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Azure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeAzure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data Lake
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real World
 
How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017
 

Destaque

Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Regunath B
 
Klout Score - Understanding Influence, True Reach, Amplification, and Network
Klout Score - Understanding Influence, True Reach, Amplification, and NetworkKlout Score - Understanding Influence, True Reach, Amplification, and Network
Klout Score - Understanding Influence, True Reach, Amplification, and NetworkMarcus Nelson
 
Tecnologías Semánticas en la Web de Datos
Tecnologías Semánticas en la Web de DatosTecnologías Semánticas en la Web de Datos
Tecnologías Semánticas en la Web de DatosDatos.gob.es
 
소개서 xtrade(전자무역) system
소개서 xtrade(전자무역) system소개서 xtrade(전자무역) system
소개서 xtrade(전자무역) system춘웅 석
 
소개서 FTA원산지시스템 솔루션
소개서 FTA원산지시스템 솔루션소개서 FTA원산지시스템 솔루션
소개서 FTA원산지시스템 솔루션춘웅 석
 
Nuevo paradigma en catalogación: El modelo FRBR y las RDA
Nuevo paradigma en catalogación: El modelo FRBR y las RDANuevo paradigma en catalogación: El modelo FRBR y las RDA
Nuevo paradigma en catalogación: El modelo FRBR y las RDAUniversidad de Belgrano
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And HiveCloudera, Inc.
 
VIII Encuentros de Centros de Documentación de Arte Contemporáneo en Artium -...
VIII Encuentros de Centros de Documentación de Arte Contemporáneo en Artium -...VIII Encuentros de Centros de Documentación de Arte Contemporáneo en Artium -...
VIII Encuentros de Centros de Documentación de Arte Contemporáneo en Artium -...Artium Vitoria
 
Mongodb 특징 분석
Mongodb 특징 분석Mongodb 특징 분석
Mongodb 특징 분석Daeyong Shin
 
MongoDB Basic Concepts
MongoDB Basic ConceptsMongoDB Basic Concepts
MongoDB Basic ConceptsMongoDB
 
Gestión de redes sociales en la BNE. Ana Carrillo Pozas
Gestión de redes sociales en la BNE. Ana Carrillo PozasGestión de redes sociales en la BNE. Ana Carrillo Pozas
Gestión de redes sociales en la BNE. Ana Carrillo PozasBiblioteca Nacional de España
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Consjohnrjenson
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Oleksiy Panchenko
 

Destaque (15)

Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3
 
Klout Score - Understanding Influence, True Reach, Amplification, and Network
Klout Score - Understanding Influence, True Reach, Amplification, and NetworkKlout Score - Understanding Influence, True Reach, Amplification, and Network
Klout Score - Understanding Influence, True Reach, Amplification, and Network
 
Elasticsearch @ Keboola
Elasticsearch @ KeboolaElasticsearch @ Keboola
Elasticsearch @ Keboola
 
Tecnologías Semánticas en la Web de Datos
Tecnologías Semánticas en la Web de DatosTecnologías Semánticas en la Web de Datos
Tecnologías Semánticas en la Web de Datos
 
소개서 xtrade(전자무역) system
소개서 xtrade(전자무역) system소개서 xtrade(전자무역) system
소개서 xtrade(전자무역) system
 
소개서 FTA원산지시스템 솔루션
소개서 FTA원산지시스템 솔루션소개서 FTA원산지시스템 솔루션
소개서 FTA원산지시스템 솔루션
 
Nuevo paradigma en catalogación: El modelo FRBR y las RDA
Nuevo paradigma en catalogación: El modelo FRBR y las RDANuevo paradigma en catalogación: El modelo FRBR y las RDA
Nuevo paradigma en catalogación: El modelo FRBR y las RDA
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
 
VIII Encuentros de Centros de Documentación de Arte Contemporáneo en Artium -...
VIII Encuentros de Centros de Documentación de Arte Contemporáneo en Artium -...VIII Encuentros de Centros de Documentación de Arte Contemporáneo en Artium -...
VIII Encuentros de Centros de Documentación de Arte Contemporáneo en Artium -...
 
Mongodb 특징 분석
Mongodb 특징 분석Mongodb 특징 분석
Mongodb 특징 분석
 
MongoDB Basic Concepts
MongoDB Basic ConceptsMongoDB Basic Concepts
MongoDB Basic Concepts
 
Gestión de redes sociales en la BNE. Ana Carrillo Pozas
Gestión de redes sociales en la BNE. Ana Carrillo PozasGestión de redes sociales en la BNE. Ana Carrillo Pozas
Gestión de redes sociales en la BNE. Ana Carrillo Pozas
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Cons
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
 
Elasticsearch 簡介
Elasticsearch 簡介Elasticsearch 簡介
Elasticsearch 簡介
 

Semelhante a How Klout is changing the landscape of social media with Hadoop and BI

Klout changing landscape of social media
Klout changing landscape of social mediaKlout changing landscape of social media
Klout changing landscape of social mediaDataWorks Summit
 
Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)Stefan Urbanek
 
Engineering practices in big data storage and processing
Engineering practices in big data storage and processingEngineering practices in big data storage and processing
Engineering practices in big data storage and processingSchubert Zhang
 
SQL on Big Data using Optiq
SQL on Big Data using OptiqSQL on Big Data using Optiq
SQL on Big Data using OptiqJulian Hyde
 
RightScale Webinar: How RightScale Architects Its Databases (for Worldwide Sc...
RightScale Webinar: How RightScale Architects Its Databases (for Worldwide Sc...RightScale Webinar: How RightScale Architects Its Databases (for Worldwide Sc...
RightScale Webinar: How RightScale Architects Its Databases (for Worldwide Sc...RightScale
 
Four Problems You Run into When DIY-ing a “Big Data” Analytics System
Four Problems You Run into When DIY-ing a “Big Data” Analytics SystemFour Problems You Run into When DIY-ing a “Big Data” Analytics System
Four Problems You Run into When DIY-ing a “Big Data” Analytics SystemTreasure Data, Inc.
 
Architecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemArchitecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemYael Garten
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemShirshanka Das
 
Digital Transformation 2021 - hiểu rõ để làm tốt và tăng trưởng
Digital Transformation 2021 - hiểu rõ để làm tốt và tăng trưởngDigital Transformation 2021 - hiểu rõ để làm tốt và tăng trưởng
Digital Transformation 2021 - hiểu rõ để làm tốt và tăng trưởngDuy, Vo Hoang
 
Streaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise AdoptionStreaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise AdoptionDATAVERSITY
 
How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR S...
How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR S...How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR S...
How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR S...RightScale
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time ApplicationsDataWorks Summit
 
Build 2015 – Azure overview
Build 2015 – Azure overviewBuild 2015 – Azure overview
Build 2015 – Azure overviewLars Yde
 
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentationBigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentationBigDataCloud
 
WSO2Con EU 2016: An Introduction to the WSO2 Analytics Platform
WSO2Con EU 2016: An Introduction to the WSO2 Analytics PlatformWSO2Con EU 2016: An Introduction to the WSO2 Analytics Platform
WSO2Con EU 2016: An Introduction to the WSO2 Analytics PlatformWSO2
 
How we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the wayHow we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the wayGrega Kespret
 

Semelhante a How Klout is changing the landscape of social media with Hadoop and BI (20)

Klout changing landscape of social media
Klout changing landscape of social mediaKlout changing landscape of social media
Klout changing landscape of social media
 
Galaxy of bits
Galaxy of bitsGalaxy of bits
Galaxy of bits
 
Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)
 
Engineering practices in big data storage and processing
Engineering practices in big data storage and processingEngineering practices in big data storage and processing
Engineering practices in big data storage and processing
 
SQL on Big Data using Optiq
SQL on Big Data using OptiqSQL on Big Data using Optiq
SQL on Big Data using Optiq
 
RightScale Webinar: How RightScale Architects Its Databases (for Worldwide Sc...
RightScale Webinar: How RightScale Architects Its Databases (for Worldwide Sc...RightScale Webinar: How RightScale Architects Its Databases (for Worldwide Sc...
RightScale Webinar: How RightScale Architects Its Databases (for Worldwide Sc...
 
Four Problems You Run into When DIY-ing a “Big Data” Analytics System
Four Problems You Run into When DIY-ing a “Big Data” Analytics SystemFour Problems You Run into When DIY-ing a “Big Data” Analytics System
Four Problems You Run into When DIY-ing a “Big Data” Analytics System
 
Treasure Data: Big Data Analytics on Heroku
Treasure Data: Big Data Analytics on HerokuTreasure Data: Big Data Analytics on Heroku
Treasure Data: Big Data Analytics on Heroku
 
Architecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemArchitecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystem
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
 
Marketing vs Technology
Marketing vs TechnologyMarketing vs Technology
Marketing vs Technology
 
Digital Transformation 2021 - hiểu rõ để làm tốt và tăng trưởng
Digital Transformation 2021 - hiểu rõ để làm tốt và tăng trưởngDigital Transformation 2021 - hiểu rõ để làm tốt và tăng trưởng
Digital Transformation 2021 - hiểu rõ để làm tốt và tăng trưởng
 
Streaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise AdoptionStreaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise Adoption
 
How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR S...
How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR S...How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR S...
How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR S...
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time Applications
 
Build 2015 – Azure overview
Build 2015 – Azure overviewBuild 2015 – Azure overview
Build 2015 – Azure overview
 
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentationBigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
WSO2Con EU 2016: An Introduction to the WSO2 Analytics Platform
WSO2Con EU 2016: An Introduction to the WSO2 Analytics PlatformWSO2Con EU 2016: An Introduction to the WSO2 Analytics Platform
WSO2Con EU 2016: An Introduction to the WSO2 Analytics Platform
 
How we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the wayHow we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the way
 

Mais de Denny Lee

Azure Cosmos DB: Globally Distributed Multi-Model Database Service
Azure Cosmos DB: Globally Distributed Multi-Model Database ServiceAzure Cosmos DB: Globally Distributed Multi-Model Database Service
Azure Cosmos DB: Globally Distributed Multi-Model Database ServiceDenny Lee
 
Spark to DocumentDB connector
Spark to DocumentDB connectorSpark to DocumentDB connector
Spark to DocumentDB connectorDenny Lee
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDBDenny Lee
 
SQL Server Integration Services Best Practices
SQL Server Integration Services Best PracticesSQL Server Integration Services Best Practices
SQL Server Integration Services Best PracticesDenny Lee
 
SQL Server Reporting Services: IT Best Practices
SQL Server Reporting Services: IT Best PracticesSQL Server Reporting Services: IT Best Practices
SQL Server Reporting Services: IT Best PracticesDenny Lee
 
Introduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop PrimerIntroduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop PrimerDenny Lee
 
Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)
Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)
Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)Denny Lee
 
Yahoo!, Big Data, and Microsoft BI: Bigger and Better Together
Yahoo!, Big Data, and Microsoft BI: Bigger and Better TogetherYahoo!, Big Data, and Microsoft BI: Bigger and Better Together
Yahoo!, Big Data, and Microsoft BI: Bigger and Better TogetherDenny Lee
 
SQL Server Reporting Services Disaster Recovery webinar
SQL Server Reporting Services Disaster Recovery webinarSQL Server Reporting Services Disaster Recovery webinar
SQL Server Reporting Services Disaster Recovery webinarDenny Lee
 
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...Denny Lee
 
Designing, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons LearnedDesigning, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons LearnedDenny Lee
 
SQLCAT - Data and Admin Security
SQLCAT - Data and Admin SecuritySQLCAT - Data and Admin Security
SQLCAT - Data and Admin SecurityDenny Lee
 
SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008
SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008
SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008Denny Lee
 
SQLCAT: A Preview to PowerPivot Server Best Practices
SQLCAT: A Preview to PowerPivot Server Best PracticesSQLCAT: A Preview to PowerPivot Server Best Practices
SQLCAT: A Preview to PowerPivot Server Best PracticesDenny Lee
 
Deploying and Managing PowerPivot for SharePoint
Deploying and Managing PowerPivot for SharePointDeploying and Managing PowerPivot for SharePoint
Deploying and Managing PowerPivot for SharePointDenny Lee
 
SQLCAT: Tier-1 BI in the World of Big Data
SQLCAT: Tier-1 BI in the World of Big DataSQLCAT: Tier-1 BI in the World of Big Data
SQLCAT: Tier-1 BI in the World of Big DataDenny Lee
 
Big Data, Bigger Brains
Big Data, Bigger BrainsBig Data, Bigger Brains
Big Data, Bigger BrainsDenny Lee
 
Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)Denny Lee
 
How Concur uses Big Data to get you to Tableau Conference On Time
How Concur uses Big Data to get you to Tableau Conference On TimeHow Concur uses Big Data to get you to Tableau Conference On Time
How Concur uses Big Data to get you to Tableau Conference On TimeDenny Lee
 
SQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery WebinarSQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery WebinarDenny Lee
 

Mais de Denny Lee (20)

Azure Cosmos DB: Globally Distributed Multi-Model Database Service
Azure Cosmos DB: Globally Distributed Multi-Model Database ServiceAzure Cosmos DB: Globally Distributed Multi-Model Database Service
Azure Cosmos DB: Globally Distributed Multi-Model Database Service
 
Spark to DocumentDB connector
Spark to DocumentDB connectorSpark to DocumentDB connector
Spark to DocumentDB connector
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDB
 
SQL Server Integration Services Best Practices
SQL Server Integration Services Best PracticesSQL Server Integration Services Best Practices
SQL Server Integration Services Best Practices
 
SQL Server Reporting Services: IT Best Practices
SQL Server Reporting Services: IT Best PracticesSQL Server Reporting Services: IT Best Practices
SQL Server Reporting Services: IT Best Practices
 
Introduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop PrimerIntroduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop Primer
 
Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)
Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)
Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)
 
Yahoo!, Big Data, and Microsoft BI: Bigger and Better Together
Yahoo!, Big Data, and Microsoft BI: Bigger and Better TogetherYahoo!, Big Data, and Microsoft BI: Bigger and Better Together
Yahoo!, Big Data, and Microsoft BI: Bigger and Better Together
 
SQL Server Reporting Services Disaster Recovery webinar
SQL Server Reporting Services Disaster Recovery webinarSQL Server Reporting Services Disaster Recovery webinar
SQL Server Reporting Services Disaster Recovery webinar
 
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
 
Designing, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons LearnedDesigning, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons Learned
 
SQLCAT - Data and Admin Security
SQLCAT - Data and Admin SecuritySQLCAT - Data and Admin Security
SQLCAT - Data and Admin Security
 
SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008
SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008
SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008
 
SQLCAT: A Preview to PowerPivot Server Best Practices
SQLCAT: A Preview to PowerPivot Server Best PracticesSQLCAT: A Preview to PowerPivot Server Best Practices
SQLCAT: A Preview to PowerPivot Server Best Practices
 
Deploying and Managing PowerPivot for SharePoint
Deploying and Managing PowerPivot for SharePointDeploying and Managing PowerPivot for SharePoint
Deploying and Managing PowerPivot for SharePoint
 
SQLCAT: Tier-1 BI in the World of Big Data
SQLCAT: Tier-1 BI in the World of Big DataSQLCAT: Tier-1 BI in the World of Big Data
SQLCAT: Tier-1 BI in the World of Big Data
 
Big Data, Bigger Brains
Big Data, Bigger BrainsBig Data, Bigger Brains
Big Data, Bigger Brains
 
Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)
 
How Concur uses Big Data to get you to Tableau Conference On Time
How Concur uses Big Data to get you to Tableau Conference On TimeHow Concur uses Big Data to get you to Tableau Conference On Time
How Concur uses Big Data to get you to Tableau Conference On Time
 
SQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery WebinarSQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery Webinar
 

Último

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Último (20)

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

How Klout is changing the landscape of social media with Hadoop and BI

  • 1. How Klout is changing the landscape of social media with Hadoop and BI Dave Mariani VP Engineering, Klout Denny Lee Principal Program Manager Microsoft
  • 2. Discover and be recognized for how you influence the world
  • 3. Klout’s Big Data makes all this possible 15 Social Networks Processed Every Day 120 Terabytes of Data Storage 200,000 Indexed Users Added Every Day 140,000,000 Users Indexed Every Day 1,000,000,000 Social Signals Processed Every Day 30,000,000,000 API Calls Delivered Every Month 54,000,000,000 Rows of Data In Klout Data Warehouse 3
  • 4. KLOUT DATA ARCHITECTURE THE BEST TOOL FOR THE JOB Registrations DB Klout.com (MySql) (Node.js) Mobile Profile DB (ObjectiveC) Klout API (Scala) (HBase) Signal Collectors Data Partner API (Java/Scala) Enhancement Engine (Mashery) Data Warehouse (PIG/Hive) Search Index (Hive) (Elastic Search) Streams (MongoDB) Monitoring (Nagios) Serving Stores Dashboards (Tableau) Perks Analyics (Scala) Analytics Cubes Event Tracker (SSAS) (Scala)
  • 5. What is Business Intelligence? • Data Warehousing, OLAP, Dashboards, Reporting • Ability to slice and dice data in an ad-hoc manner • Getting the right data to the right people, at the right time • i.e. Now 5
  • 6. Why Hadoop + BI? Hadoop BI Requirement & Query Hive Engines Capture & store all data Yes No Support queries against detail data Yes No Support interactive queries & No Yes applications Support BI & visualization tools No Yes 6
  • 7. An Example: Klout Event Tracker 1 Perform A|B Testing of User Flows 2 Optimize Registration Funnels 3 Monitor consumer engagement & retention (DAUs & MAUs) 4 Flexibly track and report on user generated events 7
  • 8. A Flexible, Hierarchical Schema Project: Event: Property Type: Property Value: Collection Captured Attribute Attribute of Events User Action Key Value HomePage, Source, Google Search Actions, Gender, Male Mobile iOS Location SF +K (Add a topic) event
  • 9. Event Tracker Architecture event_log tstamp string { project string "project":"plusK", string event session_id bigint "event":"spend", insights3:9003/track/{"project":”plu ks_uid bigint sK","event":”spend”,"session_id":"0", Warehouse ip string "ip":"50.68.47.158", "ks_uid":123456,”type":”add_topic"} json_keys array<string> "kloutId":“123456", json_values “cookie_id":”123456", array<string> "ref":"http://klout.com/", json_text string "type":"add_topic", Tracker API Log Process Cube dt string Klout UI "time":"1338366015" Scala, Flume Analysis Scala, } hr string node.JS Services AJAX UX SELECT { [Measures].[Counter], [Measures].[PreviousPeriodCounter]} ON COLUMNS, will be saved in HDFS at: NON EMPTY CROSSJOIN ( /logs/events_tracking/2012-05-30/0100 exists([Date].[Date].[Date].allmembers, [Date].[Date].&[2012-05-19T00:00:00]:[Date].[Date].&[2012-06- 02T00:00:00]), [Events].[Event].[Event].allmembers ) DIMENSION PROPERTIES MEMBER_CAPTION ON ROWS FROM [ProductInsight] WHERE ({[Projects].[Project].[plusK]}) Instrument Collect Persist Query Report 9
  • 10. Hadoop & BI Together: Query Cube using a Custom App 10
  • 11. A peek into product insight > A|B test : unsorted vs. Sorted 11
  • 12. A Peek into Product Insights > Projects: Mobile iOS 12
  • 13. 13
  • 14. Hadoop & BI Together: Query Cube Using Viz App 14
  • 15. 15
  • 16. 16
  • 17. Hadoop & BI Together: Query Hive using CLI 17
  • 18. HiveQL Example SELECT get_json_object(json_text,'$.sid') as sid, get_json_object(json_text,'$.inc') as inc, get_json_object(json_text,'$.status') as status, event FROM bi.event_log WHERE project='mobile-ios' AND dt=20120612 AND get_json_object(json_text,'$.v')<>'1.5' AND (event = 'api_error' OR event = 'api_timeout') ORDER BY sid;
  • 19. 19
  • 20. Hadoop & BI Together: Query Hive using Excel 20
  • 21. 21
  • 22. Why Hadoop + BI? Hadoop BI Requirement & Query Hive Engines Capture & store all data Yes No Support queries against detail data Yes No Support interactive queries & No Yes applications Support BI & visualization tools No Yes 22

Notas do Editor

  1. Copy this from notepad for demo:CREATE TABLE mobile_ios_details_20120612 asSELECT get_json_object(json_text,&apos;$.sid&apos;) as sid, get_json_object(json_text,&apos;$.inc&apos;) as inc, get_json_object(json_text,&apos;$.status&apos;) as status, eventFROM bi.event_logWHERE project=&apos;mobile-ios&apos; AND dt=20120612 AND get_json_object(json_text,&apos;$.v&apos;)&lt;&gt;&apos;1.5&apos; AND (event = &apos;api_error&apos; OR event = &apos;api_timeout&apos;) ORDER BY sid;
  2. 1.Don’t throw data away, leverage Hadoop (track users and events for a/b testing)2. BI tools aggregate data, but we need to reach back to the detail to answer deeper questions (http codes)3. Hadoop != interactive queries (combined proprietary data with detail)4.Use open source, but don’t reinvent the wheel (BI tools are mature, valuable &amp; complementary)Leverage the best tool for the function or job