Quick Housekeeping Rules

• Q&A panel is available if you have any questions during the
 webinar
• There will be time for Q&A at the end
• We will record the webinar for future viewing
• All attendees will receive a copy of the slides and a recording




                                                                 Page 1
         © Hortonworks Inc. 2013
Hadoop, R, and Google Chart Tools
Data Visualization for Application Developers


Jeff Markham
Solution Engineer

jmarkham@hortonworks.com




Agenda
•   Introductions
•   Use Case Description
•   Preparation
•   Demo
•   Review
•   Q&A




Use Case Description
• Visualizing data
  • Tools vs. application development
  • Choosing the technology
      • Hortonworks Data Platform
      • RHadoop
      • Google Charts




Preparation: Install HDP

Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability

[Diagram: HDP components]
• Operational Services (manage & operate at scale)
  – Ambari, Oozie
• Data Services (store, process and access data)
  – Flume, Sqoop, Hive, Pig, HBase, HCatalog, WebHDFS
• Hadoop Core (distributed storage & processing)
  – HDFS, MapReduce, YARN (in 2.0)
• Platform Services (enterprise readiness)
  – HA, DR, Snapshots, Security, …
• Deployment targets
  – OS, Cloud, VM, Appliance




Preparation: Install R
• Install R language

• Install appropriate packages
  – rhdfs
  – rmr2
  – googleVis
  – shiny
  – Dependencies for all above




Preparation
• rmr2
   – Functions to allow for MapReduce in R apps


• rhdfs
   – Functions allowing HDFS access in R apps


• googleVis
   – Use of Google Chart Tools in R apps


• shiny
   – Interactive web apps for R developers
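
Before running the demo it can help to confirm that the four packages are present. The snippet below is a minimal sketch of such a check using only base R; the variable names are illustrative and not part of the original deck.

```r
# Packages named on the slide (rmr2 and rhdfs belong to the RHadoop project).
needed <- c("rhdfs", "rmr2", "googleVis", "shiny")

# installed.packages() lists what is available without loading anything,
# so this check does not require a working Hadoop environment.
installed <- needed %in% rownames(installed.packages())
missing <- needed[!installed]

if (length(missing) > 0) {
  message("Not installed: ", paste(missing, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Checking against the installed-package list rather than calling library() avoids triggering rhdfs start-up code on a machine where Hadoop is not yet configured.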




Demo Walkthrough
              Using Hadoop, R, and Google Chart Tools




Visualization Use Case
• Data from CDC
                – Vital statistics publicly available data
                – 2010 US birth data file




SAMPLE RECORD

                 S    201001     7      2        2               30105
                 2 011 06 1 123               3405 1 06 01      2 2
                 0321     1006 314      2000                   2 222           22
                 2 2 2       122222 11   3 094 1        M 04 200940 39072     3941
                 083                22    2 2 22                        110 110 00
                 0000000 00    000000000 000000 000  000000000000000000011
                 101       1 111       10    1 1 1    111111         11   1 1 11




                                              source: http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm

Visualization Use Case
• Put data into HDFS
                     – Create input directory
                     – Put data into input directory
CREATE HDFS DIR

    > hadoop fs -mkdir /user/jeff/natality

PUT DATA INTO HDFS

    > hadoop fs -put ~/VS2010NATL.DETAILUS.DAT /user/jeff/natality/




Visualization Use Case
• Write R script
           – Specify use of RHadoop packages
           – Initialize HDFS
           – Specify data input and output location

R SCRIPT

    #!/usr/bin/env Rscript

    # Load the RHadoop packages and connect to HDFS
    library(rmr2)
    library(rhdfs)
    hdfs.init()

    # Input file and output directory, relative to the user's HDFS home
    hdfs.data.root = 'natality'
    hdfs.data = file.path(hdfs.data.root, 'VS2010NATL.DETAILUS.DAT')
    hdfs.out.root = hdfs.data.root
    hdfs.out = file.path(hdfs.out.root, 'out')

             ...


Visualization Use Case
• Write R script
           – Write mapper function
           – Write reducer function



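R SCRIPT

    ...

    # Emit (mother's age, 1) for every record; the age sits in columns 89-90
    mapper = function(k, fields) {
      keyval(as.integer(substr(fields, 89, 90)), 1)
    }

    # Sum the 1s for each age to get the number of births per age
    reducer = function(key, vv) {
      keyval(key, sum(as.numeric(vv), na.rm = TRUE))
    }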
             ...
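
To see what the mapper and reducer compute without a Hadoop cluster, the same logic can be simulated in base R. This is only a sketch: the records are synthetic (blank except for columns 89-90), make_record is a hypothetical helper, and tapply stands in for the shuffle and reduce phases that rmr2 would run on the cluster.

```r
# Build synthetic fixed-width records: blanks everywhere except columns
# 89-90, which carry a two-digit age. These are NOT real CDC records.
make_record <- function(age) {
  paste0(strrep(" ", 88), sprintf("%02d", age), strrep(" ", 10))
}
lines <- vapply(c(24, 31, 24), make_record, character(1))

# Map step: one (age, 1) pair per record, as in the slide's mapper.
ages <- as.integer(substr(lines, 89, 90))
ones <- rep(1, length(ages))

# Shuffle + reduce step: group the 1s by age and sum them,
# as in the slide's reducer.
counts <- tapply(ones, ages, sum, na.rm = TRUE)

print(counts)
```

Because the reducer is a plain sum, it is associative and commutative, which is what makes it safe to reuse as a combiner in the job definition.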



Visualization Use Case
• Write R script
           – Write job function




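R SCRIPT

    ...

    # Run the MapReduce job over plain-text input;
    # combine = TRUE reuses the reducer as the combiner
    job = function (input, output) {
      mapreduce(input = input,
                output = output,
                input.format = "text",
                map = mapper,
                reduce = reducer,
                combine = TRUE)
    }

    ...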




Visualization Use Case
• Write R script
           – Write result to HDFS output directory




R SCRIPT

    ...

    # Run the job, read its output back from HDFS, and convert to a data frame
    out = from.dfs(job(hdfs.data, hdfs.out))
    results.df = as.data.frame(out, stringsAsFactors = FALSE)
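
The shape of the result can be illustrated without Hadoop. The sketch below assumes that from.dfs returns a list with key and val components, as rmr2 key/value objects do; the numbers are invented and the column names age and births are illustrative choices, not taken from the deck.

```r
# Hypothetical stand-in for the value returned by from.dfs(): a list with
# 'key' (mother's age) and 'val' (birth count). The numbers are invented.
out <- list(key = c(31L, 24L, 19L), val = c(120, 95, 40))

results.df <- as.data.frame(out, stringsAsFactors = FALSE)

# Reducer output is not guaranteed to arrive sorted, so order by age and
# give the columns descriptive names before charting.
results.df <- results.df[order(results.df$key), ]
names(results.df) <- c("age", "births")
rownames(results.df) <- NULL

print(results.df)
```

Sorting by age first keeps the horizontal axis of a line chart in a sensible order.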




Visualization Use Case
• Create Shiny application

                – Create directory
                – Create ui.R
                – Create server.R
SHINY APP DIR




                 > mkdir ~/my-shiny-app




Visualization Use Case
• Create Shiny application
              – Create ui.R


UI.R SOURCE

    shinyUI(pageWithSidebar(

      # Application title
      headerPanel("2010 US Births"),

      sidebarPanel(. . .),

      mainPanel(
        tabsetPanel(
          tabPanel("Line Chart", htmlOutput("lineChart")),
          tabPanel("Column Chart", htmlOutput("columnChart"))
        )
      )
    ))



Visualization Use Case
• Create Shiny application
                  – Create server.R


SERVER.R SOURCE

    library(googleVis)
    library(shiny)
    library(rmr2)
    library(rhdfs)

    hdfs.init()

    # Read the MapReduce output written by the R script
    hdfs.data.root = 'natality'
    hdfs.data = file.path(hdfs.data.root, 'out')
    df = as.data.frame(from.dfs(hdfs.data))

                    ...




Visualization Use Case
• Create Shiny application
                  – Create server.R



SERVER.R SOURCE

    ...
    shinyServer(function(input, output) {

      output$lineChart <- renderGvis({
        gvisLineChart(df, options=list(
          vAxis="{title:'Number of Births'}",
          hAxis="{title:'Age of Mother'}",
          legend="none"
        ))
      })
    ...




Visualization Use Case
• Run Shiny application

RUN SHINY APP

    > shiny::runApp('~/my-shiny-app')
    Loading required package: shiny

    Welcome to googleVis version 0.4.0

    ...

    HADOOP_CMD=/usr/bin/hadoop

    Be sure to run hdfs.init()

    Listening on port 8100




Visualization Use Case
• View Shiny application




Demo Live
              Using Hadoop, R, and Google Chart Tools




Visualization Use Case
• Architecture recap
  –   Analyze data sets with R on Hadoop
  –   Choose RHadoop packages
  –   Visualize data with Google Chart Tools via googleVis package
  –   Render googleVis output in Shiny applications


• Architecture next steps
  – Integrate Shiny application into existing web apps
  – Create further data models with R




HDP: Enterprise Hadoop Distribution

[Diagram: HDP architecture, repeated from the "Preparation: Install HDP" slide]

Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability




HDP Sandbox




Thank You!


Jeff Markham
Solution Engineer

jmarkham@hortonworks.com




Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Process and Visualize Your Data with Revolution R, Hadoop and GoogleVis

  • 1. Quick Housekeeping Rules • Q&A panel is available if you have any questions during the webinar • There will be time for Q&A at the end • We will record the webinar for future viewing • All attendees will receive a copy of the slides and recording
  • 2. Hadoop, R, and Google Chart Tools: Data Visualization for Application Developers. Jeff Markham, Solution Engineer, jmarkham@hortonworks.com. © Hortonworks Inc. 2013
  • 3. Agenda • Introductions • Use Case Description • Preparation • Demo • Review • Q&A
  • 4. Use Case Description • Visualizing data: tools vs. application development; choosing the technology (Hortonworks Data Platform, RHadoop, Google Charts)
  • 5. Preparation: Install HDP. [Diagram: Hortonworks Data Platform (HDP), Enterprise Hadoop. Layers: Operational Services (Manage & Operate at Scale); Data Services (Store, Process and Access Data); Hadoop Core (Distributed Storage & Processing); Platform Services (Enterprise Readiness: HA, DR, Snapshots, Security, …). Components shown: Ambari, Oozie, Flume, Sqoop, Pig, Hive, HBase, HCatalog, WebHDFS, HDFS, MapReduce, YARN (in 2.0). Base: OS, Cloud, VM, Appliance.] • The ONLY 100% open source and complete distribution • Enterprise grade, proven and tested at scale • Ecosystem endorsed to ensure interoperability
  • 6. Preparation: Install R • Install the R language • Install the appropriate packages: rhdfs, rmr2, googleVis, shiny, and the dependencies for all of the above
  • 7. Preparation • rmr2: functions to allow for MapReduce in R apps • rhdfs: functions allowing HDFS access in R apps • googleVis: use of Google Chart Tools in R apps • shiny: interactive web apps for R developers
  • 8. Demo Walkthrough: Using Hadoop, R, and Google Chart Tools
  • 9. Visualization Use Case • Data from CDC: publicly available vital statistics data; 2010 US birth data file. Source: http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm
      Sample record:
      S 201001 7 2 2 30105 2 011 06 1 123 3405 1 06 01 2 2 0321 1006 314 2000 2 222 22 2 2 2 122222 11 3 094 1 M 04 200940 39072 3941 083 22 2 2 22 110 110 00 0000000 00 000000000 000000 000 000000000000000000011 101 1 111 10 1 1 1 111111 11 1 1 11
  • 10. Visualization Use Case • Put data into HDFS: create the input directory; put the data into the input directory
      Create HDFS dir:
      > hadoop fs -mkdir /user/jeff/natality
      Put data into HDFS:
      > hadoop fs -put ~/VS2010NATL.DETAILUS.DAT /user/jeff/natality/
  • 11. Visualization Use Case • Write R script: specify use of RHadoop packages; initialize HDFS; specify data input and output location
      R script:
      #!/usr/bin/env Rscript
      require('rmr2')
      require('rhdfs')
      hdfs.init()
      hdfs.data.root = 'natality'
      hdfs.data = file.path(hdfs.data.root, 'VS2010NATL.DETAILUS.DAT')
      hdfs.out.root = hdfs.data.root
      hdfs.out = file.path(hdfs.out.root, 'out')
      ...
  • 12. Visualization Use Case • Write R script: write mapper function; write reducer function
      R script:
      ...
      mapper = function(k, fields) {
        keyval(as.integer(substr(fields, 89, 90)), 1)
      }
      reducer = function(key, vv) {
        # count values for each key
        keyval(key, sum(as.numeric(vv), na.rm=TRUE))
      }
      ...
  • 13. Visualization Use Case • Write R script: write job function
      R script:
      ...
      job = function (input, output) {
        mapreduce(input = input,
                  output = output,
                  input.format = "text",
                  map = mapper,
                  reduce = reducer,
                  combine = T)
      }
      ...
  • 14. Visualization Use Case • Write R script: write result to HDFS output directory
      R script:
      ...
      out = from.dfs(job(hdfs.data, hdfs.out))
      results.df = as.data.frame(out, stringsAsFactors=F)
  • 15. Visualization Use Case • Create Shiny application: create directory; create ui.R; create server.R
      Shiny app dir:
      > mkdir ~/my-shiny-app
  • 16. Visualization Use Case • Create Shiny application: create ui.R
      ui.R source:
      shinyUI(pageWithSidebar(
        # Application title
        headerPanel("2010 US Births"),
        sidebarPanel(. . .),
        mainPanel(
          tabsetPanel(
            tabPanel("Line Chart", htmlOutput("lineChart")),
            tabPanel("Column Chart", htmlOutput("columnChart"))
          )
        )
      ))
  • 17. Visualization Use Case • Create Shiny application: create server.R
      server.R source:
      library(googleVis)
      library(shiny)
      library(rmr2)
      library(rhdfs)
      hdfs.init()
      hdfs.data.root = 'natality'
      hdfs.data = file.path(hdfs.data.root, 'out')
      df = as.data.frame(from.dfs(hdfs.data))
      ...
  • 18. Visualization Use Case • Create Shiny application: create server.R
      server.R source:
      ...
      shinyServer(function(input, output) {
        output$lineChart <- renderGvis({
          gvisLineChart(df, options=list(
            vAxis="{title:'Number of Births'}",
            hAxis="{title:'Age of Mother'}",
            legend="none"
          ))
        })
      ...
  • 19. Visualization Use Case • Run Shiny application
      Run Shiny app:
      > shiny::runApp('~/my-shiny-app')
      Loading required package: shiny
      Welcome to googleVis version 0.4.0
      ...
      HADOOP_CMD=/usr/bin/hadoop
      Be sure to run hdfs.init()
      Listening on port 8100
  • 20. Visualization Use Case • View Shiny application
  • 21. Demo Live: Using Hadoop, R, and Google Chart Tools
  • 22. Visualization Use Case • Architecture recap: analyze data sets with R on Hadoop; choose RHadoop packages; visualize data with Google Chart Tools via the googleVis package; render googleVis output in Shiny applications • Architecture next steps: integrate the Shiny application into existing web apps; create further data models with R
  • 23. HDP: Enterprise Hadoop Distribution. [Diagram: Hortonworks Data Platform (HDP), Enterprise Hadoop. Layers: Operational Services (Manage & Operate at Scale); Data Services (Store, Process and Access Data); Hadoop Core (Distributed Storage & Processing); Platform Services (Enterprise Readiness: HA, DR, Snapshots, Security, …). Components shown: Ambari, Oozie, Flume, Sqoop, Pig, Hive, HBase, HCatalog, WebHDFS, HDFS, MapReduce, YARN (in 2.0). Base: OS, Cloud, VM, Appliance.] • The ONLY 100% open source and complete distribution • Enterprise grade, proven and tested at scale • Ecosystem endorsed to ensure interoperability
  • 24. HDP Sandbox
  • 25. Thank You! Jeff Markham, Solution Engineer, jmarkham@hortonworks.com
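
The mapper and reducer on slide 12 can be tried without a cluster. The following is a minimal base-R sketch of the same counting logic, not the rmr2 job itself: it takes the integer in columns 89-90 of each line as the key (the field the later chart labels as age of mother) and sums a 1 per record for each key. The helper make_record() and the sample ages are hypothetical and only pad a fake fixed-width line; they do not reproduce the real CDC record layout.

```r
# Base-R sketch of the slide's mapper/reducer logic; no Hadoop or rmr2 needed.
# make_record() is a hypothetical helper: it builds a fake fixed-width line
# with a two-digit value in columns 89-90. It is NOT the real CDC layout.
make_record <- function(age) {
  paste0(strrep("0", 88), sprintf("%02d", age), strrep("0", 10))
}

# Mirrors the mapper: the key is the integer found in columns 89-90.
map_keys <- function(lines) {
  as.integer(substr(lines, 89, 90))
}

# Mirrors the reducer: sum the 1s emitted for each distinct key.
reduce_counts <- function(keys) {
  ones <- rep(1, length(keys))
  sums <- tapply(ones, keys, sum, na.rm = TRUE)
  data.frame(key = as.integer(names(sums)),
             val = as.numeric(sums),
             stringsAsFactors = FALSE)
}

# Six made-up records: three with key 25, two with 31, one with 19.
lines <- vapply(c(25, 31, 25, 19, 31, 25), make_record, character(1))
results.df <- reduce_counts(map_keys(lines))
print(results.df)
```

In the real job, rmr2 distributes the same two steps across the cluster and from.dfs() brings the key/value pairs back into the R session.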

Editor's Notes

  1. Hi, I’m Jeff Markham and I wanted to talk today about
  2. Agenda points
  3. Describe the use case and how to choose the tech
  4. Start by installing HDP
  5. Install R and dependencies
  6. Go into more detail on the R packages
  7. Walk through the demo before actually doing the demo
  8. Describe the data set
  9. Start with the very beginning: getting the downloaded data into Hadoop
  10. Start explaining the R script. Kick it off with explanation of RHadoop packages and what they’re doing
  11. Explain the mapper and reducer functions
  12. Explain the job function
  13. Wrap up with showing where the data lands
  14. Show how to create the Shiny app. Start with creating the directory.
  15. This is the entirety of the Shiny UI. Help text in the sidebar is omitted to save space.
  16. Explain the server.R code. Note the imports of the relevant R packages.
  17. Move to one of the functions that describes how Shiny wraps googleVis which wraps Google Chart Tools
  18. Show how to kick off the Shiny app and note the listening port
  19. Go to the browser and view the Shiny app
  20. Cut to the live demo.
  21. Recap what we just saw and suggest possible future steps to further develop the app
  22. Hammer home HDP as the bedrock for the app
  23. Suggest getting started with the Sandbox
  24. Wrap up with Q&A
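
The charting step in the notes (Shiny wrapping googleVis, which wraps Google Chart Tools) depends on the shape of the data frame handed to gvisLineChart. A rough sketch of that shaping step follows, under stated assumptions: the counts are made up, the column names age and births are illustrative, and the googleVis call only runs if the package happens to be installed. The axis titles are the ones shown on the server.R slide.

```r
# Hypothetical counts standing in for the MapReduce output
# (key = age of mother, val = number of births). These are made-up numbers.
results.df <- data.frame(key = c(31L, 19L, 25L), val = c(2, 1, 3))

# Use readable column names and sort by age so the line runs left to right.
chart.df <- data.frame(age = results.df$key, births = results.df$val)
chart.df <- chart.df[order(chart.df$age), ]
rownames(chart.df) <- NULL

# Build the chart only when googleVis is available; otherwise show the data.
if (requireNamespace("googleVis", quietly = TRUE)) {
  chart <- googleVis::gvisLineChart(
    chart.df, xvar = "age", yvar = "births",
    options = list(
      vAxis = "{title:'Number of Births'}",
      hAxis = "{title:'Age of Mother'}",
      legend = "none"
    )
  )
  # The gvis object carries the HTML/JavaScript that renderGvis hands to Shiny.
  cat("Built chart with id", chart$chartid, "\n")
} else {
  print(chart.df)
}
```

Naming xvar and yvar explicitly is a design choice for readability; the slide's version passes the data frame alone and relies on the column order.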