Data Discovery, Visualization, and Apache Hadoop
An InformationWeek Webcast
Sponsored by
Webcast Logistics
Today’s Presenters
Ted J. Wasserman
Product Manager
Tableau Software
John Kreisa
VP Strategic Marketing
Hortonworks
Lenny Liebmann
Contributing Editor
InformationWeek
© Hortonworks Inc. 2012
Agenda
• How Hadoop fits into the Modern Data Architecture
• How it works with your existing data center infrastructure
• Typical Hadoop patterns of use
• The importance of data discovery for all business users
• Get started with visual analytics software and Hadoop
• Demo
• Next Steps
Insert Poll 1 HERE
Big Data: Changing The Game for Organizations
Megabytes
Gigabytes
Terabytes
Petabytes
Purchase detail
Purchase record
Payment record
ERP
CRM
WEB
BIG DATA
Offer details
Support Contacts
Customer Touches
Segmentation
Web logs
Offer history
A/B testing
Dynamic Pricing
Affiliate Networks
Search Marketing
Behavioral Targeting
Dynamic Funnels
User Generated Content
Mobile Web
SMS/MMS
Sentiment
External Demographics
HD Video, Audio, Images
Speech to Text
Product/Service Logs
Social Interactions & Feeds
Business Data Feeds
User Click Stream
Sensors / RFID / Devices
Spatial & GPS Coordinates
Increasing Data Variety and Complexity
Transactions + Interactions + Observations = BIG DATA
© Hortonworks Inc. 2013
Existing Data Architecture
TRADITIONAL REPOS: RDBMS, EDW, MPP
OLTP, POS SYSTEMS
MANAGE & MONITOR
Traditional Sources (RDBMS, OLTP, OLAP)
BUILD & TEST
Business Analytics
Custom Applications
Enterprise Applications
Emerging Data Architecture
TRADITIONAL REPOS: RDBMS, EDW, MPP
OLTP, POS SYSTEMS
MANAGE & MONITOR
Traditional Sources (RDBMS, OLTP, OLAP)
New Sources (web logs, email, sensors, social media)
BUILD & TEST
Business Analytics
Custom Applications
Enterprise Applications
ENTERPRISE HADOOP PLATFORM
Interoperating With Your Tools
TRADITIONAL REPOS Viewpoint
Microsoft Applications
HORTONWORKS DATA PLATFORM
Traditional Sources (RDBMS, OLTP, OLAP)
New Sources (web logs, email, sensors, social media)
Big Data
Transactions, Interactions, Observations
Hadoop Common Patterns of Use
Business Cases
HORTONWORKS DATA PLATFORM
Refine Explore Enrich
Batch Interactive Online
“Right-time” Access to Data
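As a rough illustration of the batch-oriented "Refine" pattern, the sketch below counts successful page requests in a handful of web log lines using map, shuffle, and reduce steps. It is plain Python, not the Hadoop API, and it assumes that "Refine" refers to batch processing of raw data such as web logs. The log format, the sample lines, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `refine`) are all hypothetical and were not part of the webcast.

```python
from collections import defaultdict

# Hypothetical sample log lines: date, time, method, page, status.
SAMPLE_LOGS = [
    "2013-05-01 10:00:01 GET /home 200",
    "2013-05-01 10:00:02 GET /products 200",
    "2013-05-01 10:00:03 GET /home 404",
    "2013-05-01 10:00:04 GET /home 200",
]


def map_phase(line):
    """Emit (page, 1) for each successful request; skip malformed lines."""
    parts = line.split()
    if len(parts) == 5 and parts[4] == "200":
        yield parts[3], 1


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts emitted for one page."""
    return key, sum(values)


def refine(lines):
    """Run map, shuffle, and reduce over the raw lines and return page counts."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(refine(SAMPLE_LOGS))
```

On a real cluster the map and reduce steps would run in parallel across nodes over data stored in HDFS; here everything runs in a single process to keep the shape of the computation visible.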
Business Cases of Hadoop
Vertical Refine Explore Enrich
Retail & Web
• Log Analysis/Site Optimization
• Social Network Analysis
• Dynamic Pricing
• Session & Content Optimization
Retail
• Loyalty Program Optimization
• Brand and Sentiment Analysis
• Dynamic Pricing/Targeted Offer
Intelligence
• Threat Identification
• Person of Interest Discovery
• Cross Jurisdiction Queries
Finance
• Risk Modeling & Fraud Identification
• Trade Performance Analytics
• Surveillance and Fraud Detection
• Customer Risk Analysis
• Real-time upsell, cross sales marketing offers
Energy
• Smart Grid: Production Optimization
• Grid Failure Prevention
• Smart Meters
• Individual Power Grid
Manufacturing
• Supply Chain Optimization
• Customer Churn Analysis
• Dynamic Delivery
• Replacement parts
Healthcare & Payer
• Electronic Medical Records (EMPI)
• Clinical Trials Analysis
• Insurance Premium Determination
OS Cloud VM Appliance
HDP: Enterprise Hadoop Distribution
PLATFORM SERVICES
HADOOP CORE
DATA SERVICES
OPERATIONAL SERVICES
Manage & Operate at Scale
Store, Process and Access Data
HORTONWORKS DATA PLATFORM (HDP)
Distributed Storage & Processing
Hortonworks Data Platform (HDP)
Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
Enterprise Readiness
What We Do…
• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
• We engineer, test & certify HDP for enterprise usage
• We employ the core architects, builders and operators of Apache Hadoop
• We drive innovation within Apache Software Foundation projects
• We are uniquely positioned to deliver the highest quality of Hadoop support
• We enable the ecosystem to work better with Hadoop
Develop, Distribute, Support
We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
Endorsed by Strategic Partners
Headquarters: Palo Alto, CA
Employees: 200+ and growing
Investors: Benchmark, Index, Yahoo
Insert Poll 2 HERE
DEMO
Hortonworks Sandbox
Fastest Onramp to Apache Hadoop
• What it is
– A free download of a virtualized single-node implementation of the enterprise-ready Hortonworks Data Platform
– A personal Hadoop environment
– An integrated learning environment with frequently and easily updated hands-on, step-by-step tutorials
• What it does
– Dramatically accelerates the process of learning Apache Hadoop
– Accelerates and validates the use of Hadoop within your unique data architecture
– Lets you use your own data to explore and investigate your use cases
• ZERO to big data in 15 minutes
• Get Started!
Download Hortonworks Sandbox
www.hortonworks.com/sandbox
Sign up for Training for in-depth learning
hortonworks.com/hadoop-training/
Hadoop Summit 2013
• June 26-27, 2013, San Jose Convention Center
• Co-hosted by Hortonworks & Yahoo!
• Theme: Enabling the Next Generation Enterprise Data Platform
• 90+ Sessions and 7 Tracks
• Community Focused Event
– Sessions selected by a Conference Committee
– Community Choice allowed the public to vote for sessions they want to see
• Training classes offered pre-event
– Apache Hadoop Essentials: A Technical Understanding for Business Users
– Understanding Microsoft HDInsight and Apache Hadoop
– Developing Solutions with Apache Hadoop – HDFS and MapReduce
– Applying Data Science using Apache Hadoop
hadoopsummit.org
Next Steps
• Try Tableau on Hortonworks Sandbox!
• Download Sandbox
– Hortonworks.com/sandbox
• Download Tableau trial
– Tableausoftware.com/trial
• Visit Hortonworks blog on connecting Tableau to the Sandbox
– http://hortonworks.com/kb/how-to-connect-tableau-to-hortonworks-sandbox/
Q&A
Ted J. Wasserman
Product Manager
Tableau Software
John Kreisa
VP Strategic Marketing
Hortonworks
Lenny Liebmann
Contributing Editor
InformationWeek
Resources
To View This or Other Events On-Demand Please Visit:
http://www.informationweek.com/events
http://www.netseminar.com
For more information please visit:
http://hortonworks.com/products/hortonworks-sandbox/

Data Discovery, Visualization, and Apache Hadoop

  • 1. Data Discovery, Visualization and Apache Hadoop An InformationWeek Webcast Sponsored by
  • 3. Today’s Presenters: Ted J. Wasserman, Product Manager, Tableau Software; John Kreisa, VP Strategic Marketing, Hortonworks; Lenny Liebmann, Contributing Editor, InformationWeek
  • 4. © Hortonworks Inc. 2012 Agenda • How Hadoop fits into the Modern Data Architecture • How it works with your existing data center infrastructure • Typical Hadoop patterns of use • The importance of data discovery for all business users • Get started with visual analytics software and Hadoop • Demo • Next Steps
  • 5. Insert Poll 1 HERE
  • 6. Big Data: Changing The Game for Organizations Megabytes Gigabytes Terabytes Petabytes Purchase detail Purchase record Payment record ERP CRM WEB BIG DATA Offer details Support Contacts Customer Touches Segmentation Web logs Offer history A/B testing Dynamic Pricing Affiliate Networks Search Marketing Behavioral Targeting Dynamic Funnels User Generated Content Mobile Web SMS/MMS Sentiment External Demographics HD Video, Audio, Images Speech to Text Product/Service Logs Social Interactions & Feeds Business Data Feeds User Click Stream Sensors / RFID / Devices Spatial & GPS Coordinates Increasing Data Variety and Complexity Transactions + Interactions + Observations = BIG DATA
  • 7. © Hortonworks Inc. 2013 Existing Data Architecture TRADITIONAL REPOS RDBMS EDW MPP OLTP, POS SYSTEMS MANAGE & MONITOR Traditional Sources (RDBMS, OLTP, OLAP) BUILD & TEST Business Analytics Custom Applications Enterprise Applications
  • 8. Emerging Data Architecture TRADITIONAL REPOS RDBMS EDW MPP OLTP, POS SYSTEMS MANAGE & MONITOR Traditional Sources (RDBMS, OLTP, OLAP) New Sources (web logs, email, sensors, social media) BUILD & TEST Business Analytics Custom Applications Enterprise Applications ENTERPRISE HADOOP PLATFORM
  • 9. Interoperating With Your Tools TRADITIONAL REPOS Viewpoint Microsoft Applications HORTONWORKS DATA PLATFORM Traditional Sources (RDBMS, OLTP, OLAP) New Sources (web logs, email, sensors, social media)
  • 10. Big Data Transactions, Interactions, Observations Hadoop Common Patterns of Use Business Cases HORTONWORKS DATA PLATFORM Refine Explore Enrich Batch Interactive Online “Right-time” Access to Data
  • 11. Business Cases of Hadoop (by vertical, across Refine, Explore, Enrich) Retail & Web: Log Analysis/Site Optimization; Social Network Analysis; Dynamic Pricing; Session & Content Optimization. Retail: Loyalty Program Optimization; Brand and Sentiment Analysis; Dynamic Pricing/Targeted Offer. Intelligence: Threat Identification; Person of Interest Discovery; Cross Jurisdiction Queries. Finance: Risk Modeling & Fraud Identification; Trade Performance Analytics; Surveillance and Fraud Detection; Customer Risk Analysis; Real-time upsell, cross sales marketing offers. Energy: Smart Grid: Production Optimization; Grid Failure Prevention; Smart Meters; Individual Power Grid. Manufacturing: Supply Chain Optimization; Customer Churn Analysis; Dynamic Delivery; Replacement parts. Healthcare & Payer: Electronic Medical Records (EMPI); Clinical Trials Analysis; Insurance Premium Determination.
  • 12. OS Cloud VM Appliance HDP: Enterprise Hadoop Distribution PLATFORM SERVICES HADOOP CORE DATA SERVICES OPERATIONAL SERVICES Manage & Operate at Scale Store, Process and Access Data HORTONWORKS DATA PLATFORM (HDP) Distributed Storage & Processing Hortonworks Data Platform (HDP) Enterprise Hadoop • The ONLY 100% open source and complete distribution • Enterprise grade, proven and tested at scale • Ecosystem endorsed to ensure interoperability Enterprise Readiness
  • 13. What We Do… • We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform • We engineer, test & certify HDP for enterprise usage • We employ the core architects, builders and operators of Apache Hadoop • We drive innovation within Apache Software Foundation projects • We are uniquely positioned to deliver the highest quality of Hadoop support • We enable the ecosystem to work better with Hadoop Develop Distribute Support We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution Endorsed by Strategic Partners Headquarters: Palo Alto, CA Employees: 200+ and growing Investors: Benchmark, Index, Yahoo
  • 14. Insert Poll 2 HERE
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22. DEMO
  • 23. Hortonworks Sandbox: Fastest Onramp to Apache Hadoop • What is it – A free download of a virtualized single-node implementation of the enterprise-ready Hortonworks Data Platform – A personal Hadoop environment – An integrated learning environment with frequently and easily updatable hands-on step-by-step tutorials • What it does – Dramatically accelerates the process of learning Apache Hadoop – Accelerates and validates the use of Hadoop within your unique data architecture – Lets you use your data to explore and investigate your use cases • ZERO to big data in 15 minutes • Get Started! Download Hortonworks Sandbox: www.hortonworks.com/sandbox Sign up for Training for in-depth learning: hortonworks.com/hadoop-training/
  • 24. Hadoop Summit 2013 • June 26-27, 2013, San Jose Convention Center • Co-hosted by Hortonworks & Yahoo! • Theme: Enabling the Next Generation Enterprise Data Platform • 90+ Sessions and 7 Tracks • Community Focused Event – Sessions selected by a Conference Committee – Community Choice allowed public to vote for sessions they want to see • Training classes offered pre-event – Apache Hadoop Essentials: A Technical Understanding for Business Users – Understanding Microsoft HDInsight and Apache Hadoop – Developing Solutions with Apache Hadoop – HDFS and MapReduce – Applying Data Science using Apache Hadoop hadoopsummit.org
  • 25. Next Steps • Try Tableau on Hortonworks Sandbox! • Download Sandbox – Hortonworks.com/sandbox • Download Tableau trial – Tableausoftware.com/trial • Visit Hortonworks blog on connecting Tableau to the Sandbox – http://hortonworks.com/kb/how-to-connect-tableau-to-hortonworks-sandbox/
  • 26. Q&A: Ted J. Wasserman, Product Manager, Tableau Software; John Kreisa, VP Strategic Marketing, Hortonworks; Lenny Liebmann, Contributing Editor, InformationWeek
  • 27. Resources To View This or Other Events On-Demand Please Visit: http://www.informationweek.com/events http://www.netseminar.com For more information please visit: http://hortonworks.com/products/hortonworks-sandbox/

Editor's Notes

  1. For the visual thinkers out there, let’s expand our mathematical model to show some concrete examples. ERP, SCM, CRM, and transactional Web applications are classic examples of systems processing Transactions. Highly structured data in these systems is typically stored in SQL databases. Interactions are about how people and things interact with each other or with your business. Web Logs, User Click Streams, Social Interactions & Feeds, and User-Generated Content are classic places to find Interaction data. Observational data tends to come from the “Internet of Things”. Sensors for heat, motion, and pressure, along with RFID and GPS chips within such things as mobile devices, ATMs, and even aircraft engines, are just some examples of “things” that output Observation data. Most folks would agree that video is “big” data. The analysis of what’s happening in that video (i.e., what you, I, and others are doing in the video) may not be “big”, but it is valuable and it does fit under our umbrella. Moreover, business data feeds and publicly available data sets are also “big data”, so we should not limit our thinking to just the data that flows through an organization. For example, the mortgage-related data you have could benefit from being blended with external data found in Zillow. The government has the Open Data Initiative, which means that more and more data is being made publicly available. One of the use cases I find interesting is Predictive Policing, where state and local law enforcement apply analytics to crime databases and other publicly available data to help predict where and when pockets of crime might spring up. These proactive analytics efforts have yielded real reductions in crime! Anyhow, this is what Big Data means to me; hopefully it makes sense to you.
It is important to note that we think of big data beyond the traditional concepts of volume, velocity and variety into transactions, interactions and observations. In reality, this IS the big data our customers are dealing with.
  2. While overly simplistic, this graphic represents what we commonly see as a general data architecture: a set of data sources producing data; a set of data systems to capture and store that data, most typically a mix of RDBMS and data warehouses; and a set of applications that leverage the data stored in those data systems. These could be packaged BI applications (Business Objects, Tableau, etc.), Enterprise Applications (e.g. SAP) or Custom Applications (e.g. custom web applications), ranging from ad-hoc reporting tools to mission-critical enterprise operations applications. Your environment is undoubtedly more complicated, but conceptually it is likely similar.
  3. As the volume of data has exploded, we increasingly see organizations acknowledge that not all data belongs in a traditional database. The drivers are both cost (as volumes grow, database licensing costs can become prohibitive) and technology (databases are not optimized for very large datasets). Instead, we increasingly see Hadoop, and HDP in particular, being introduced as a complement to the traditional approaches. It does not replace the database but complements it, and as such it must integrate easily with existing tools and approaches. This means it must interoperate with: existing applications, such as Tableau, SAS, Business Objects, etc.; existing databases and data warehouses, for loading data to and from the data warehouse; development tools used for building custom applications; and operational tools for managing and monitoring.
  4. It is for that reason that we focus on HDP interoperability across all of these categories. Data systems: HDP is endorsed and embedded with SQL Server, Teradata and more. BI tools: HDP is certified for use with the packaged applications you already use, from Microsoft to Tableau, MicroStrategy, Business Objects and more. Development tools: for .NET developers, Visual Studio, used to build more than half the custom applications in the world, certifies with HDP to enable Microsoft app developers to build custom apps with Hadoop; for Java developers, Spring for Apache Hadoop enables Java developers to quickly and easily build Hadoop-based applications with HDP. Operational tools: integration with System Center and with Teradata Viewpoint.
  5. So we’ve covered the overall architecture and how Hadoop fits; let’s discuss the patterns of use that we’re seeing for Hadoop. At a high level, we describe the 3 key patterns of use as Refine, Explore, and Enrich. Refine captures the data into the platform and transforms (or refines) it into the desired formats. Explore is about creating lakes of data that you can interactively surf through to find valuable insights. Enrich is about leveraging analytics and models to influence your online applications, making them more intelligent. So while some categorize Hadoop as just a Batch platform, it is increasingly being used and evolving to serve a wide range of usage patterns that span Batch, Interactive, and Online needs. Let me cover these patterns in a little more detail.
  6. In summary, by addressing these elements, we can provide an Enterprise Hadoop distribution which includes the Core Services, Platform Services, Data Services, and Operational Services required by the Enterprise user. All of this is done in 100% open source, and tested at scale by our team (together with our partner Yahoo) to bring Enterprise process to an open source approach. Finally, this is the distribution that is endorsed by the ecosystem to ensure interoperability in your environment.
  7. At Hortonworks today, our focus is very clear: we Develop, Distribute and Support a 100% open source distribution of Enterprise Apache Hadoop. We employ the core architects, builders and operators of Apache Hadoop and drive the innovation in the open source community. We distribute the only 100% open source Enterprise Hadoop distribution: the Hortonworks Data Platform. Given our operational expertise of running some of the largest Hadoop infrastructure in the world at Yahoo, our team is uniquely positioned to support you. Our approach is also uniquely endorsed by some of the biggest vendors in the IT market. Yahoo is both an investor and a customer, and most importantly, a development partner. We partner to develop Hadoop, and no distribution of HDP is released without first being tested on Yahoo’s infrastructure, using the same regression suite that they have used for years as they grew to have the largest production cluster in the world. Microsoft has partnered with Hortonworks to include HDP in: HDP for Windows, HDInsight Server, and HDInsight Service