SlideShare uma empresa Scribd logo
1 de 24
Spark and Hadoop
Perfect Together
Arun Murthy
Hortonworks Co-Founder
@acmurthy
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Data Operating System
Enable all data and applications
TO BE
accessible and shared
BY
any end-users
Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Data Operating System
YARN: data operating system
Governance Security
Operations
Resource management
Data access: batch, interactive, real-time
Storage
Commodity Appliance Cloud
Built on a centralized architecture of
shared enterprise services:
Scalable tiered storage
Resource and workload management
Trusted data governance & metadata management
Consistent operations
Comprehensive security
Developer APIs and tools
Hadoop/YARN-powered data operating system
100% open source, multi-tenant data platform for
any application, any data set, anywhere.
Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Elegant Developer APIs
DataFrames, Machine Learning, and SQL
Made for Data Science
All apps need to get predictive at scale and fine granularity
Democratize Machine Learning
Spark is doing to ML on Hadoop what Hive did for SQL on
Hadoop
Community
Broad developer, customer and partner interest
Realize Value of Data Operating System
A key tool in the Hadoop toolbox
Why We Love Spark at Hortonworks
YARN
Scala
Java
Python
R
APIs
Spark Core Engine
Spark
SQL
Spark
Streaming
MLlib GraphX
1 ° ° ° ° ° ° ° ° °
° ° ° ° ° ° ° ° ° °
°
N
HDFS
Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Let’s Talk Real-World Use-Cases
Page6 © Hortonworks Inc. 2011 – 2015. All Rights ReservedPage6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
DATA FLOW
MANAGEMENT
Streaming & Machine Learning for Web Analytics
USE CASE:
Cost: Storage and
processing not economical
at scale
Silos: Separate clusters —
Hadoop & Spark
Analytics: Retroactive view
limited predictive
capabilities
CHALLENGE
Spark Streaming for
ingesting events in real-
time
SparkML for predictive
analytics
SOLUTION
Cost: Hardware costs
reduced 25-50%
YARN: Shared cluster for
Spark, MapReduce, Hive
etc.
Analytics: Processes 10
billion events daily at 20
milliseconds per event
IMPACT
HDP
Page7 © Hortonworks Inc. 2011 – 2015. All Rights ReservedPage7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
DATA FLOW
MANAGEMENT
Claims Re-imbursement Processing with Spark
USE CASE: INSURANCE
Overwhelmed by data
ingest rates
Team expertise in R
Lots of key features like
textual features not
incorporated
CHALLENGE
Use Spark to optimize
claims reimbursements
process
Leverage Spark’s machine
learning capabilities to
process and analyze all
claims.
SOLUTION
Insurance companies can
now process all claims and
detect over and under
payment
More accurate over and
under payment detection
IMPACT
HDP
Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
On-going Customer Use Cases with Spark
HealthCare
Entity Resolution
• Different feeds represent entities in different
forms
– need to identify same entities
• No standard packages available to do the
job.
• Additional features need to be derived to
expand context and disambiguate
Requirement
• Extensible entity resolution framework
– mix supervised and unsupervised learning
John Smith J. Smith
A B
E
DC
H
GF
name name
location location
=
Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
On-going Customer Use Cases with Spark
Finance
Efficiently bring HBase Data into
Spark
• HBase is the operational store of record
• End user Analytics via Spark
Requirement:
• Efficient scans via predicate pushdown
via co-processors ° ° ° ° ° ° ° ° °HDFS
YARN
Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Spark + Hadoop - The Road Ahead
Innovate at the Core
Seamless Data Access
Data Science Acceleration
Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Data Sources
Data
Science
Acceleration
Data Science ToolsNotebook
Data Science Libraries
Apache Hadoop, Hive and HBase
InnovateattheCore
Apache Spark
Innovate at the Core
Storage
– RDD Sharing with HDFS Memory Tier
Resource Management
– Spark on YARN improvements
Security
– Enhance SparkSQL Security
– Enhance Wire Encryption
Governance
– Integration with Atlas
Operations
– Ambari to Support Multiple Spark Versions
Zeppelin
Governance
Security
Operations
APIs
Spark Core EngineSpark Core Engine
Spark
SQL
Spark
Streaming
MLlib GraphX
1 ° ° ° ° ° ° ° NHDFS
YARN
SQL
Hive
NoSQL
Hbase
Enterprise Ready
Seamless
DataAccess
Spark + Hadoop - The Road Ahead
ORC FileNiFiHBase Custom
Page12 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Seamless Data Access
Data Access
– Hive ORC
– Hbase Connector
– NiFi Streams
Spark + Hadoop - The Road Ahead
Data Sources
Data
Science
Acceleration
Data Science ToolsNotebook
Data Science Libraries
Apache Hadoop, Hive and HBase
InnovateattheCore
Apache Spark
Zeppelin
Governance
Security
Operations
APIs
Spark Core EngineSpark Core Engine
Spark
SQL
Spark
Streaming
MLlib GraphX
1 ° ° ° ° ° ° ° NHDFS
YARN
SQL
Hive
NoSQL
Hbase
Enterprise Ready
Seamless
DataAccess
ORC FileNiFiHBase Custom
Page13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Data Science Acceleration
Apache Zeppelin
– Spark development and visualization
Data Science Libraries
– Jump start tools leveraging Spark Machine
Learning (ie Magellan, Entity Resolution)
Core Machine Learning
– One vs Rest classifiers, multi-class
classifiers
– Serializing Machine Learning pipelines
– Data set loader for public data sets
Spark + Hadoop - The Road Ahead
Data Sources
Data
Science
Acceleration
Data Science ToolsNotebook
Data Science Libraries
Apache Hadoop, Hive and HBase
InnovateattheCore
Apache Spark
Zeppelin
Governance
Security
Operations
APIs
Spark Core EngineSpark Core Engine
Spark
SQL
Spark
Streaming
MLlib GraphX
1 ° ° ° ° ° ° ° NHDFS
YARN
SQL
Hive
NoSQL
Hbase
Enterprise Ready
Seamless
DataAccess
ORC FileNiFiHBase Custom
Page14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Introducing Apache Zeppelin Open Web-based Notebook
for interactive analytics
Features
Ad-hoc experimentation
Deeply integrated with Spark + Hadoop
Supports multiple language backends
Incubating at Apache
Use Case
Data exploration and discovery
Visualization
Interactive snippet-at-a-time experience
“Modern Data Science Studio”
Page15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Geospatial Insight
Where do people go on weekends?
Does usage pattern change with time?
Predict the drop off point of a user?
Predict the location where next pick up can be expected?
Identify crime hotspots
How do these hotspots evolve with time?
Predict the likelihood of crime occurring at a given neighborhood
Predict climate at fairly granular level
Climate insurance: do I need to buy insurance for my crops?
Climate as a factor in crime: Join climate dataset with Crimes
Page16 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Magellan on Spark Packages
What
– Brings GeoSpatial Analytics to Big Data powered by Spark
– Available at Spark Packages
http://spark-packages.org/package/harsha2010/magellan
Key Features
– Parse geospatial data and metadata into Shapes + Metadata
– Python and Scala support
– Efficient Geometric Queries
- ESRI Hive Library
- simple and intuitive syntax
– Scalable implementations of common algorithms
Learn More
– Magellan Blog: http://tinyurl.com/magellanBlog
Find Sample Magellan Notebook:
http://tinyurl.com/zeppelinNotebooks
Page17 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Spark + Hadoop - The Road Ahead
Innovate at the Core
Seamless Data Access
Data Science Acceleration
Page18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Magellan Walkthrough
Page19 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Maximizing Revenue for UBER drivers
Data Insight Opportunity
– Uber publishes anonymized GeoSpatial trip data
– City of San Francisco has an active Open Data program
- demographics, neighborhoods, traffic
Challenges
– What neighborhood should a driver hangout to maximize revenue?
Solution
– Leverage Spark to transform and aggregate data
– Use Magellan to do GeoSpatial Queries
uber-magellan-nb
uber-magellan-nb
uber-magellan-nb
uber-magellan-nb
Hortonworks Data Platform
Perfect Together
+ Hadoop

Mais conteúdo relacionado

Mais procurados

Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextHortonworks
 
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaExtend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaDataWorks Summit/Hadoop Summit
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...DataWorks Summit/Hadoop Summit
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...DataWorks Summit/Hadoop Summit
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Hortonworks
 
HPE and Hortonworks join forces to Deliver Healthcare Transformation
HPE and Hortonworks join forces to Deliver Healthcare TransformationHPE and Hortonworks join forces to Deliver Healthcare Transformation
HPE and Hortonworks join forces to Deliver Healthcare TransformationHortonworks
 
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...Hortonworks
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success DataWorks Summit/Hadoop Summit
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks
 
ING- CoreIntel- Collect and Process Network Logs Across Data Centers in Real ...
ING- CoreIntel- Collect and Process Network Logs Across Data Centers in Real ...ING- CoreIntel- Collect and Process Network Logs Across Data Centers in Real ...
ING- CoreIntel- Collect and Process Network Logs Across Data Centers in Real ...DataWorks Summit/Hadoop Summit
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
Security, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationSecurity, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationDataWorks Summit
 
Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09Hortonworks
 
Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group Hortonworks
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks
 
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...DataWorks Summit
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 

Mais procurados (20)

Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
 
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaExtend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015
 
HPE and Hortonworks join forces to Deliver Healthcare Transformation
HPE and Hortonworks join forces to Deliver Healthcare TransformationHPE and Hortonworks join forces to Deliver Healthcare Transformation
HPE and Hortonworks join forces to Deliver Healthcare Transformation
 
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
ING- CoreIntel- Collect and Process Network Logs Across Data Centers in Real ...
ING- CoreIntel- Collect and Process Network Logs Across Data Centers in Real ...ING- CoreIntel- Collect and Process Network Logs Across Data Centers in Real ...
ING- CoreIntel- Collect and Process Network Logs Across Data Centers in Real ...
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Security, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationSecurity, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software Integration
 
Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09
 
Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
 
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 

Semelhante a Spark and Hadoop Perfect Togeher by Arun Murthy

Spark + Hadoop Perfect together
Spark + Hadoop Perfect togetherSpark + Hadoop Perfect together
Spark + Hadoop Perfect togetherIsheeta Sanghi
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGskumpf
 
Hadoop and Spark – Perfect Together
Hadoop and Spark – Perfect TogetherHadoop and Spark – Perfect Together
Hadoop and Spark – Perfect TogetherHortonworks
 
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)Spark Summit
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Mac Moore
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Mac Moore
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks
 
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsPredicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsHortonworks
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksData Con LA
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course WorkshopDataWorks Summit
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitDataWorks Summit
 
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3Hortonworks
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataWANdisco Plc
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
 

Semelhante a Spark and Hadoop Perfect Togeher by Arun Murthy (20)

Spark + Hadoop Perfect together
Spark + Hadoop Perfect togetherSpark + Hadoop Perfect together
Spark + Hadoop Perfect together
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
Hadoop and Spark – Perfect Together
Hadoop and Spark – Perfect TogetherHadoop and Spark – Perfect Together
Hadoop and Spark – Perfect Together
 
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2
 
OOP 2014
OOP 2014OOP 2014
OOP 2014
 
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsPredicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
 
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 

Mais de Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

Mais de Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Último

CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 

Último (20)

꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 

Spark and Hadoop Perfect Togeher by Arun Murthy

  • 1. Spark and Hadoop Perfect Together Arun Murthy Hortonworks Co-Founder @acmurthy © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  • 2. Data Operating System Enable all data and applications TO BE accessible and shared BY any end-users
  • 3. Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Data Operating System YARN: data operating system Governance Security Operations Resource management Data access: batch, interactive, real-time Storage Commodity Appliance Cloud Built on a centralized architecture of shared enterprise services: Scalable tiered storage Resource and workload management Trusted data governance & metadata management Consistent operations Comprehensive security Developer APIs and tools Hadoop/YARN-powered data operating system 100% open source, multi-tenant data platform for any application, any data set, anywhere.
  • 4. Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Elegant Developer APIs DataFrames, Machine Learning, and SQL Made for Data Science All apps need to get predictive at scale and fine granularity Democratize Machine Learning Spark is doing to ML on Hadoop what Hive did for SQL on Hadoop Community Broad developer, customer and partner interest Realize Value of Data Operating System A key tool in the Hadoop toolbox Why We Love Spark at Hortonworks YARN Scala Java Python R APIs Spark Core Engine Spark SQL Spark Streaming MLlib GraphX 1 ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° N HDFS
  • 5. Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Let’s Talk Real-World Use-Cases
  • 6. Page6 © Hortonworks Inc. 2011 – 2015. All Rights ReservedPage6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved DATA FLOW MANAGEMENT Streaming & Machine Learning for Web Analytics USE CASE: Cost: Storage and processing not economical at scale Silos: Separate clusters — Hadoop & Spark Analytics: Retroactive view limited predictive capabilities CHALLENGE Spark Streaming for ingesting events in real- time SparkML for predictive analytics SOLUTION Cost: Hardware costs reduced 25-50% YARN: Shared cluster for Spark, MapReduce, Hive etc. Analytics: Processes 10 billion events daily at 20 milliseconds per event IMPACT HDP
  • 7. Page7 © Hortonworks Inc. 2011 – 2015. All Rights ReservedPage7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved DATA FLOW MANAGEMENT Claims Re-imbursement Processing with Spark USE CASE: INSURANCE Overwhelmed by data ingest rates Team expertise in R Lots of key features like textual features not incorporated CHALLENGE Use Spark to optimize claims reimbursements process Leverage Spark’s machine learning capabilities to process and analyze all claims. SOLUTION Insurance companies can now process all claims and detect over and under payment More accurate over and under payment detection IMPACT HDP
  • 8. Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved On-going Customer Use Cases with Spark HealthCare Entity Resolution • Different feeds represent entities in different forms – need to identify same entities • No standard packages available to do the job. • Additional features need to be derived to expand context and disambiguate Requirement • Extensible entity resolution framework – mix supervised and unsupervised learning John Smith J. Smith A B E DC H GF name name location location =
  • 9. Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved On-going Customer Use Cases with Spark Finance Efficiently bring HBase Data into Spark • HBase is the operational store of record • End user Analytics via Spark Requirement: • Efficient scans via predicate pushdown via co-processors ° ° ° ° ° ° ° ° °HDFS YARN
  • 10. Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Spark + Hadoop - The Road Ahead Innovate at the Core Seamless Data Access Data Science Acceleration
  • 11. Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Data Sources Data Science Acceleration Data Science ToolsNotebook Data Science Libraries Apache Hadoop, Hive and HBase InnovateattheCore Apache Spark Innovate at the Core Storage – RDD Sharing with HDFS Memory Tier Resource Management – Spark on YARN improvements Security – Enhance SparkSQL Security – Enhance Wire Encryption Governance – Integration with Atlas Operations – Ambari to Support Multiple Spark Versions Zeppelin Governance Security Operations APIs Spark Core EngineSpark Core Engine Spark SQL Spark Streaming MLlib GraphX 1 ° ° ° ° ° ° ° NHDFS YARN SQL Hive NoSQL Hbase Enterprise Ready Seamless DataAccess Spark + Hadoop - The Road Ahead ORC FileNiFiHBase Custom
  • 12. Page12 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Seamless Data Access Data Access – Hive ORC – Hbase Connector – NiFi Streams Spark + Hadoop - The Road Ahead Data Sources Data Science Acceleration Data Science ToolsNotebook Data Science Libraries Apache Hadoop, Hive and HBase InnovateattheCore Apache Spark Zeppelin Governance Security Operations APIs Spark Core EngineSpark Core Engine Spark SQL Spark Streaming MLlib GraphX 1 ° ° ° ° ° ° ° NHDFS YARN SQL Hive NoSQL Hbase Enterprise Ready Seamless DataAccess ORC FileNiFiHBase Custom
  • 13. Page13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Data Science Acceleration Apache Zeppelin – Spark development and visualization Data Science Libraries – Jump start tools leveraging Spark Machine Learning (ie Magellan, Entity Resolution) Core Machine Learning – One vs Rest classifiers, multi-class classifiers – Serializing Machine Learning pipelines – Data set loader for public data sets Spark + Hadoop - The Road Ahead Data Sources Data Science Acceleration Data Science ToolsNotebook Data Science Libraries Apache Hadoop, Hive and HBase InnovateattheCore Apache Spark Zeppelin Governance Security Operations APIs Spark Core EngineSpark Core Engine Spark SQL Spark Streaming MLlib GraphX 1 ° ° ° ° ° ° ° NHDFS YARN SQL Hive NoSQL Hbase Enterprise Ready Seamless DataAccess ORC FileNiFiHBase Custom
  • 14. Page14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Introducing Apache Zeppelin Open Web-based Notebook for interactive analytics Features Ad-hoc experimentation Deeply integrated with Spark + Hadoop Supports multiple language backends Incubating at Apache Use Case Data exploration and discovery Visualization Interactive snippet-at-a-time experience “Modern Data Science Studio”
  • 15. Page15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Geospatial Insight Where do people go on weekends? Does usage pattern change with time? Predict the drop off point of a user? Predict the location where next pick up can be expected? Identify crime hotspots How do these hotspots evolve with time? Predict the likelihood of crime occurring at a given neighborhood Predict climate at fairly granular level Climate insurance: do I need to buy insurance for my crops? Climate as a factor in crime: Join climate dataset with Crimes
  • 16. Page16 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Magellan on Spark Packages What – Brings GeoSpatial Analytics to Big Data powered by Spark – Available at Spark Packages http://spark-packages.org/package/harsha2010/magellan Key Features – Parse geospatial data and metadata into Shapes + Metadata – Python and Scala support – Efficient Geometric Queries - ESRI Hive Library - simple and intuitive syntax – Scalable implementations of common algorithms Learn More – Magellan Blog: http://tinyurl.com/magellanBlog Find Sample Magellan Notebook: http://tinyurl.com/zeppelinNotebooks
  • 17. Page17 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Spark + Hadoop - The Road Ahead Innovate at the Core Seamless Data Access Data Science Acceleration
  • 18. Page18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Magellan Walkthrough
  • 19. Page19 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Maximizing Revenue for UBER drivers Data Insight Opportunity – Uber publishes anonymized GeoSpatial trip data – City of San Francisco has an active Open Data program - demographics, neighborhoods, traffic Challenges – What neighborhood should a driver hangout to maximize revenue? Solution – Leverage Spark to transform and aggregate data – Use Magellan to do GeoSpatial Queries

Notas do Editor

  1. NEED SPEAKER NOTES
  2. Ensuring Spark is well integrated with YARN, Ambari, and Ranger enables enterprise to deploy Spark apps with confidence, and since HDP is available across Windows, Linux, on-premises and cloud deployment environments, it just makes it that much easier for enterprises to adopt it.
  3. TALK TRACK MACHINE LEARNING WITH SPARK WEBTRENDS PROVIDES DIGITAL MARKETING ANALYTICS STATS + PROCESSES 10 BILLION EVENTS PER DAY + AT AN AVERAGE SPEED OF 20 MILLISECONDS PER EVENT [Back ground] Listen to Peter Crossley Hadoop Summit keynote(min 22:22): http://brightcove.fora.tv/services/player/bcpid4287593805001?bckey=AQ~~,AAACbMgRlRk~,KnD13XNmCDZZPWkNmxmPMFTH2h0USbHh&bclid=4287661488001&bctid=4285724101001 Read recent article on Web Trends by Peter Crossley: http://insidebigdata.com/2015/07/14/strategic-big-data-pivot-leads-webtrends-to-success/
  4. Overwhelmed by data ingest rates, reduce sample data to fit into edge node Random Forest models in R Team expertise in R and not Scala/ Java Lots of key features like textual features not incorporated (cannot handle feature blowup in R)
  5. HBase has efficient scans, however can Spark leverage it? Push predicates and prune columns
  6. HBase has efficient scans, however can Spark leverage it? Push predicates and prune columns
  7. Allow for RDD sharing with the HDFS Memory Tier. Improve dynamic resource allocation via YARN. Mature SparkSQL and Spark Streaming to GA quality. HDFS Memory Tier There are many use cases where Spark today feels less than ideal. For example, using Spark in a shared environment, with a middle tier fielding request from multiple tier is a common problem. SparkContext is a heavyweight object and is tied to a specific user session. Using Spark in shared environment requires features such as RDD sharing with HDFS memory tier and SparkContext sharing. There are other areas of improvement in Spark’s YARN integration. For example, today YARN applications logs are published when the application finishes running. This model does not work for Spark Streaming, which is a long running application and doesn’t get a chance to publish its logs. Spark’s YARN ATS integration also needs work to help it scale and not become a bottleneck. SparkSQL is another critical area where we want to add more value by bringing Hive level features such as security (SPARK-11265), ACID and vectorization features to SparkSQL and make it GA in our platform over the coming months.
  8. Seamless Data Access One Hive Seamless use of capabilities across Spark and Hive via SQL including common file formats. Deliver connectors for HBase (HFile). Spark is a great data processing engine, and it provides more value when it can process more data. The value of data lake is that it brings more data under one roof and opens new opportunity for insights and to drive efficiency. The data lake is delivered by YARN and Hadoop to provide massive scale and run all types of workload to take advantage of that data. With the DataSource API, Spark provides a first class way to bring data from external sources while leveraging these systems for filtering, predicate pushdown etc. We used DataSource API to bring ORC data into Spark. And now we are working to bring HBase data efficiently into processing with Spark. This will be different from existing Spark + HBase connector in that it will leverage HBase for predicate pushdown, column filtering and will be more efficient. Look for a tech preview of this feature in the coming months.
  9. Data science notebooks and automation for the most common analysis scenarios. Zeppelin Include support for GeoSpatial and Entity Resolution. Magellan
  10. TALK TRACK Ad-hoc experimentation Spark, Hive, Shell, Flink, Tajo, Ignite, Lens, etc Deeply integrated with Spark + Hadoop Can be managed via Ambari Stacks Supports multiple language backends Pluggable “Interpreters” Incubating at Apache 100% open source and open community [NEXT SLIDE]
  11. Hive ESRI Hive (thin wrapper on ESRI Java) Magellan available on Github (https://github.com/harsha2010/magellan) Can parse and understand most widely used formats GeoJSON, ESRIShapefile All geometries supported 1.0.3 released (http://spark-packages.org/package/harsha2010/magellan) Broadcast join available for common scenarios Work in progress (targeted 1.0.4) Geohash Join optimization Map Matching algorithm using Markov Models Python and Scala support Please give it a try and give us feedback!
  12. One of our Insurance customers is using Spark to optimize the claims reimbursements and are using Spark’s machine learning capabilities to process and analyze all claims.
  13. We have seen rapid adoption of Spark in our customer base and we want to thank our customers for choosing Spark on HDP. We also want to thank our partners Microsoft, Databricks, HP, NFLabs and the community on sharing this journey with us.