Intro to Spark & Zeppelin - Crash Course - HS16SJ
DataWorks Summit/Hadoop Summit
1.
Robert Hryniewicz, Data Evangelist, @RobHryniewicz
Hands-on Intro to Spark & Zeppelin Crash Course
2.
© Hortonworks Inc. 2011-2016. All Rights Reserved
The "Big Data" Problem
Problem
- A single machine cannot process or even store all the data!
Solution
- Distribute data over large clusters
Difficulty
- How to split work across machines?
- Moving data over the network is expensive
- Must consider data & network locality
- How to deal with failures?
- How to deal with slow nodes?
3.
Spark Background
4.
Access Rates
At least an order of magnitude difference between memory and hard drive / network speed
(Figure: memory access is labeled FAST; hard drive and network access are labeled slow.)
5.
What is Spark?
- Apache Open Source Project: originally developed at AMPLab (University of California, Berkeley)
- Data Processing Engine: focused on in-memory distributed computing use cases
- API: Scala, Python, Java and R
6.
Spark Ecosystem
Spark Core, Spark SQL, Spark Streaming, MLlib, GraphX
7.
Why Spark?
- Elegant Developer APIs
  - Single environment for data munging and Machine Learning (ML)
- In-memory computation model
  - Fast!
  - Effective for iterative computations and ML
- Machine Learning
  - Implementation of distributed ML algorithms
  - Pipeline API (Spark ML)
8.
History of Hadoop & Spark
9.
Apache Spark Basics
10.
Spark Context
What is it?
- Main entry point for Spark functionality
- Represents a connection to a Spark cluster
- Represented as sc in your code
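In Zeppelin and spark-shell the context is already created for you as sc. Outside those environments you create it yourself; the following is a minimal sketch, assuming Spark is on the classpath and using an illustrative app name and a local master that are not part of the original slides.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumed settings: the app name and the "local[2]" master are illustrative only.
val conf = new SparkConf().setAppName("crash-course-sketch").setMaster("local[2]")

// getOrCreate reuses a context that is already running (as in Zeppelin or
// spark-shell) instead of trying to start a second one.
val sc = SparkContext.getOrCreate(conf)

// A trivial job to confirm the connection works: add up the numbers 1 to 100.
val total = sc.parallelize(1 to 100).reduce(_ + _)
println(s"master=${sc.master}, total=$total")
```

Using getOrCreate rather than new SparkContext(conf) is a deliberate choice here, so the same snippet can also be pasted into a notebook paragraph where a context already exists.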
11.
RDD - Resilient Distributed Dataset
- Primary abstraction in Spark
  - An immutable collection of objects (or records, or elements) that can be operated on in parallel
- Distributed
  - Collection of elements partitioned across nodes in a cluster
  - Each RDD is composed of one or more partitions
  - User can control the number of partitions
  - More partitions => more parallelism
- Resilient
  - Recover from node failures
  - An RDD keeps its lineage information -> it can be recreated from parent RDDs
- Created by starting with a file in Hadoop Distributed File System (HDFS) or an existing collection in the driver program
- May be persisted in memory for efficient reuse across parallel operations (caching)
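The properties listed above can be observed directly on a small RDD. This is a sketch only, assuming a local Spark installation; the variable names and the 1 to 1000 sample range are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumed local setup; in Zeppelin or spark-shell this returns the existing sc.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("rdd-sketch").setMaster("local[2]"))

// Create an RDD from a collection in the driver program, asking for 4 partitions.
val numbers = sc.parallelize(1 to 1000, 4)

// Transformations are lazy and return new RDDs; the parent RDD is never modified.
val evens = numbers.filter(_ % 2 == 0)
val squares = evens.map(n => n.toLong * n)

// Keep the result in memory so repeated actions do not recompute it.
squares.cache()

val partitionCount = numbers.partitions.length // number of partitions requested above
val evenCount = squares.count()                // action: triggers the computation
val lineage = squares.toDebugString            // textual description of the parent RDDs

println(s"partitions=$partitionCount, count=$evenCount")
println(lineage)
```

The lineage string is what lets Spark rebuild a lost partition from its parents rather than replicating the data itself.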
12.
RDD - Resilient Distributed Dataset
(Diagram: RDD 1 with Partitions 1 to 4 and RDD 2 with Partitions 1 to 3, spread across cluster nodes.)
13.
Spark SQL
14.
Spark SQL Overview
- Spark module for structured data processing (e.g. DB tables, JSON files)
- Three ways to manipulate data:
  - DataFrames API
  - SQL queries
  - Datasets API
- Same execution engine for all three
- Spark SQL interfaces provide more information about both the structure of the data and the computation being performed than the basic Spark RDD API
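To illustrate that the DataFrames API and SQL queries are two front ends to the same engine, the sketch below runs one query both ways and compares the results. It assumes the Spark 1.x API used elsewhere in this deck (SQLContext, registerTempTable); the Flight case class, the sample rows and the flights_tmp table name are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("spark-sql-sketch").setMaster("local[2]"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Hypothetical sample records, used only for illustration.
case class Flight(Origin: String, Dest: String, DepDelay: Int)
val flights = Seq(
  Flight("IAD", "TPA", 8),
  Flight("IAD", "TPA", 19),
  Flight("IND", "BWI", 34)
).toDF()

// 1) DataFrames API
val viaApi = flights.filter($"DepDelay" > 15).select("Origin", "DepDelay")

// 2) SQL query against a temporary table (the table name is an assumption)
flights.registerTempTable("flights_tmp")
val viaSql = sqlContext.sql(
  "SELECT Origin, DepDelay FROM flights_tmp WHERE DepDelay > 15")

// Both forms are planned and executed by the same Spark SQL engine.
viaApi.show()
viaSql.show()
```

Calling explain() on either result is one way to inspect the plan Spark produced for it.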
15.
DataFrames
• Conceptually equivalent to a table in a relational DB or a data frame in R/Python
• API available in Scala, Java, Python, and R
• Richer optimizations (often significantly faster than equivalent RDD code)
• Distributed collection of data organized into named columns
• Underneath is an RDD
16.
DataFrames
Created from Various Sources
[Diagram: CSV, Avro, Hive, Spark SQL and text sources feeding a DataFrame (with RDD underneath) made of rows and columns Col1, Col2, ..., ColN]
• DataFrames from Hive:
  - Reading and writing Hive tables, including ORC
• DataFrames from files:
  - Built-in: JSON, JDBC, ORC, Parquet, HDFS
  - External plug-in: CSV, HBase, Avro
• DataFrames from existing RDDs
  - with the toDF() function
Data is described as a DataFrame with rows, columns and a schema
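As an illustration of the toDF() route, here is a sketch that turns an RDD of case-class objects into a DataFrame. It assumes Spark 1.6-era spark-core and spark-sql on the classpath; the Flight case class and the object name are hypothetical, with column names borrowed from the flights examples in this deck.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type; Spark derives the schema from its fields.
case class Flight(Origin: String, Dest: String, DepDelay: Int)

object ToDfSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("todf-sketch").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // brings toDF() into scope for RDDs

    try {
      // Two rows copied from the sample output shown in the DataFrame examples.
      val flightsRdd = sc.parallelize(Seq(
        Flight("IAD", "TPA", 8),
        Flight("IND", "BWI", 34)))

      val flightsDf = flightsRdd.toDF()
      flightsDf.printSchema() // column names and types inferred from the case class
      flightsDf.show()
    } finally sc.stop()
  }
}
```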
17.
SQL Context and Hive Context
SQLContext
• Entry point into all functionality in Spark SQL
• All you need is a SparkContext
    import org.apache.spark.sql.SQLContext
    val sqlContext = new SQLContext(sc)
HiveContext
• Superset of functionality provided by basic SQLContext
  - Read data from Hive tables
  - Access to Hive functions -> UDFs
    import org.apache.spark.sql.hive.HiveContext
    val hc = new HiveContext(sc)
• Use when your data resides in Hive
18.
Spark SQL Examples
19.
DataFrame Example
Reading Data From Table
    val df = sqlContext.table("flightsTbl")
    df.select("Origin", "Dest", "DepDelay").show(5)

    +------+----+--------+
    |Origin|Dest|DepDelay|
    +------+----+--------+
    |   IAD| TPA|       8|
    |   IAD| TPA|      19|
    |   IND| BWI|       8|
    |   IND| BWI|      -4|
    |   IND| BWI|      34|
    +------+----+--------+
20.
DataFrame Example
Using DataFrame API to Filter Data (show delays more than 15 min)
    // The $"..." column syntax needs: import sqlContext.implicits._
    df.select("Origin", "Dest", "DepDelay").filter($"DepDelay" > 15).show(5)

    +------+----+--------+
    |Origin|Dest|DepDelay|
    +------+----+--------+
    |   IAD| TPA|      19|
    |   IND| BWI|      34|
    |   IND| JAX|      25|
    |   IND| LAS|      67|
    |   IND| MCO|      94|
    +------+----+--------+
21.
SQL Example
Using SQL to Query and Filter Data (again, show delays more than 15 min)
    // Register Temporary Table
    df.registerTempTable("flights")
    // Use SQL to Query Dataset
    sqlContext.sql("SELECT Origin, Dest, DepDelay FROM flights WHERE DepDelay > 15 LIMIT 5").show

    +------+----+--------+
    |Origin|Dest|DepDelay|
    +------+----+--------+
    |   IAD| TPA|      19|
    |   IND| BWI|      34|
    |   IND| JAX|      25|
    |   IND| LAS|      67|
    |   IND| MCO|      94|
    +------+----+--------+
22.
RDD vs. DataFrame
23.
RDDs vs. DataFrames
RDD
• Lower-level API (more control)
• Lots of existing code & users
• Compile-time type-safety
DataFrame
• Higher-level API (faster development)
• Faster sorting, hashing, and serialization
• More opportunities for automatic optimization
• Lower memory pressure
24.
Data Frames are Intuitive
Find average age by department?
    dept  name      age
    Bio   H Smith   48
    CS    A Turing  54
    Bio   B Jones   43
    Phys  E Witten  61
[Code panels: RDD Example, Equivalent Data Frame Example]
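The code from the two panels of this slide is not in the transcript. As a stand-in, the following sketch shows one way the same question could be answered with each API. It is not the presenter's code; it assumes Spark 1.6-era spark-core and spark-sql on the classpath, and the object name is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical object name, used only for this sketch.
object AverageAgeSketch {
  // Rows copied from the slide's table: (dept, name, age).
  val people = Seq(
    ("Bio", "H Smith", 48),
    ("CS", "A Turing", 54),
    ("Bio", "B Jones", 43),
    ("Phys", "E Witten", 61))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("average-age-sketch").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // enables toDF on RDDs of tuples

    try {
      val rows = sc.parallelize(people)

      // RDD version: carry a (sum, count) pair per department by hand, then divide.
      val avgByDeptRdd = rows
        .map { case (dept, _, age) => (dept, (age.toDouble, 1)) }
        .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
        .mapValues { case (sum, count) => sum / count }
      avgByDeptRdd.collect().foreach(println)

      // DataFrame version: name the columns and state the intent directly.
      val df = rows.toDF("dept", "name", "age")
      df.groupBy("dept").avg("age").show()
    } finally sc.stop()
  }
}
```

The RDD version has to spell out how an average is computed, while the DataFrame version only says what is wanted and leaves the how to Spark SQL.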
25.
Spark SQL Optimizations
• Spark SQL uses an underlying optimization engine (Catalyst)
  - Catalyst can perform intelligent optimization since it understands the schema
• Spark SQL does not materialize all the columns (as with RDD), only what's needed
26.
Catalyst: Spark SQL optimizer
• Query or data frame operations modeled as a tree
• Logical plan created and optimized
• Various physical plans created; best plan chosen
• Code generation and execution
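The plans Catalyst produces can be inspected with explain(true), which prints the logical plans and the chosen physical plan for a query. The following is a sketch under the same assumptions as the other examples here (Spark 1.6-era spark-core and spark-sql, local master); the object name and the sample rows are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical object name, used only for this sketch.
object ExplainSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("explain-sketch").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // enables toDF and the $"col" syntax

    try {
      // A few rows in the shape of the flights examples.
      val flights = sc.parallelize(Seq(
        ("IAD", "TPA", 8),
        ("IAD", "TPA", 19),
        ("IND", "BWI", 34))).toDF("Origin", "Dest", "DepDelay")

      val delayed = flights.filter($"DepDelay" > 15).select("Origin", "Dest")

      // Nothing has executed yet; this only prints the plans for the query.
      delayed.explain(true)
    } finally sc.stop()
  }
}
```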
27.
Spark Streaming
28.
Spark Streaming
Overview
• Extension of Spark Core API
• Stream processing of live data streams
  - Scalable
  - High-throughput
  - Fault-tolerant
29.
Spark Streaming
30.
Spark Streaming
Window Operations
• Apply transformations over a sliding window of data, e.g. rolling average
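To make the idea of a rolling average over a sliding window concrete, here is a small model in plain Scala collections. It is not Spark Streaming code: in Spark Streaming the corresponding operation is a DStream's window(windowDuration, slideDuration), with durations instead of element counts. The object name, function name and sample readings are made up for illustration.

```scala
// Hypothetical object name; models a sliding window over an in-memory sequence.
object RollingAverageSketch {
  // Average of every `windowSize` consecutive readings, advancing `slide` readings each time.
  // Trailing windows with fewer than `windowSize` readings are dropped.
  def rollingAverage(readings: Seq[Double], windowSize: Int, slide: Int): Seq[Double] =
    readings
      .sliding(windowSize, slide)
      .filter(_.size == windowSize)
      .map(window => window.sum / window.size)
      .toList

  def main(args: Array[String]): Unit = {
    val readings = Seq(10.0, 20.0, 30.0, 40.0, 50.0)
    println(rollingAverage(readings, windowSize = 3, slide = 1)) // List(20.0, 30.0, 40.0)
    println(rollingAverage(readings, windowSize = 3, slide = 2)) // List(20.0, 40.0)
  }
}
```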
31.
Apache Zeppelin & HDP Sandbox
32.
Apache Zeppelin - A Modern Web-based Data Science Studio
• Data exploration and discovery
• Visualization
• Deeply integrated with Spark and Hadoop
• Pluggable interpreters
• Multiple languages in one notebook: R, Python, Scala
36.
What's not included with Spark?
[Diagram: Resource Management, Storage, Applications; Spark Core Engine with Scala, Java and Python libraries: MLlib (Machine learning), Spark SQL*, Spark Streaming*]
37.
HDP Sandbox
What's included in the Sandbox?
• Zeppelin
• Latest Hortonworks Data Platform (HDP)
  - Spark
  - YARN -> Resource Management
  - HDFS -> Distributed Storage Layer
  - And many more components...
[Diagram: Scala, Java, Python and R APIs over the Spark Core Engine with Spark SQL, Spark Streaming, MLlib and GraphX, running on YARN and HDFS across nodes 1 ... N]
38.
Access patterns enabled by YARN
[Diagram: Batch, Interactive and Real-Time applications running on YARN (Data Operating System) over HDFS (Hadoop Distributed File System), nodes 1 ... N]
• Batch: needs to happen, but with no timeframe limitations
• Interactive: needs to happen at human time
• Real-Time: needs to happen at machine execution time
39.
Why Spark on YARN?
• Utilize existing HDP cluster infrastructure
• Resource management
  - Share Spark workloads with other workloads like Pig, Hive, etc.
• Scheduling and queues
[Diagram: a client running the Spark Driver, a Spark Application Master in a YARN container, and three Spark Executors, each in its own YARN container running two tasks]
40.
Why HDFS?
Fault Tolerant Distributed Storage
• Divide files into big blocks and distribute 3 copies randomly across the cluster
• Processing Data Locality
• Not just storage but computation
[Diagram: a logical file split into blocks 1-4, with three copies of each block spread across the cluster]
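The arithmetic behind "big blocks" and "3 copies" can be sketched as follows. The slide does not state a block size; 128 MB is assumed here because it is the usual Hadoop 2 default, and real clusters set both values in configuration. The object and function names are hypothetical.

```scala
// Hypothetical object name; a back-of-the-envelope model, not an HDFS client.
object HdfsBlockSketch {
  val BlockSizeBytes: Long = 128L * 1024 * 1024 // assumed block size (128 MB)
  val Replication: Int = 3                      // "3 copies", as stated on the slide

  // Number of blocks a file is divided into (ceiling division).
  def blockCount(fileSizeBytes: Long): Long =
    (fileSizeBytes + BlockSizeBytes - 1) / BlockSizeBytes

  // Total block copies stored across the cluster.
  def replicaCount(fileSizeBytes: Long): Long =
    blockCount(fileSizeBytes) * Replication

  def main(args: Array[String]): Unit = {
    val fileSize = 500L * 1024 * 1024 // a 500 MB file, chosen for illustration
    println(s"Blocks: ${blockCount(fileSize)}")           // Blocks: 4
    println(s"Stored copies: ${replicaCount(fileSize)}")  // Stored copies: 12
  }
}
```

With these assumptions a 500 MB file becomes 4 blocks and 12 stored copies, which matches the shape of the slide's diagram: four blocks, each appearing three times in the cluster.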
41.
There's more to HDP
[Diagram: Hortonworks Data Platform 2.4.x]
• Governance & Integration
  - Data Lifecycle & Governance: Falcon, Atlas
  - Data Workflow: Sqoop, Flume, Kafka, NFS, WebHDFS
• Data Access
  - Batch: MapReduce
  - Script: Pig
  - Search: Solr
  - SQL: Hive
  - NoSQL: HBase, Accumulo, Phoenix
  - Stream: Storm
  - In-memory
  - Others: ISV Engines
  - Tez, Slider
• Data Management
  - YARN: Data Operating System
  - HDFS: Hadoop Distributed File System
• Security
  - Administration, Authentication, Authorization, Auditing, Data Protection
  - Ranger, Knox, Atlas, HDFS Encryption
• Operations
  - Provisioning, Managing, & Monitoring: Ambari, Cloudbreak, Zookeeper
  - Scheduling: Oozie
• Deployment Choice: Linux, Windows, On-Premise, Cloud
42.
HDP 2.5 TP
45.
View User Sessions
46.
Hortonworks Community Connection
47.
Hortonworks Community Connection
Read access for everyone, join to participate and be recognized
• Full Q&A Platform (like StackOverflow)
• Knowledge Base Articles
• Code Samples and Repositories
48.
Community Engagement
Participate now at: community.hortonworks.com
7,500+ Registered Users
15,000+ Answers
20,000+ Technical Assets
One Website!
49.
Lab Preview
50.
Link to Tutorial with Lab Instructions
http://tinyurl.com/hwx-intro-to-spark
51.
Robert Hryniewicz @RobHryniewicz Thanks!