Data Science Crash Course Hadoop Summit SJ
Daniel Madrigal
Robert Hryniewicz
1.
Robert Hryniewicz
Data Evangelist
@RobHryniewicz
Hands-on Intro to Data Science with Apache Spark
Crash Course
2.
© Hortonworks Inc. 2011–2016. All Rights Reserved
Plan for Today
• Data Science & ML
• ML Examples
• Overview of ML methods
• K-means, Decision Trees & Random Forests
• Spark MLlib & ML
• Lab Overview
3.
Data Science Examples
5.
Predictive Analytics Pre-requisites
Sales Play 4: Predictive Analytics
6.
Predictive Analytics Process and Tools
7.
Machine Learning
“… science of how computers learn without being explicitly programmed” – Andrew Ng
8.
Machine Learning Methods
9.
Supervised vs Unsupervised Learning
Supervised: examples labeled.
Unsupervised: examples not labeled.
10.
Unsupervised Learning
Supervised Learning
11.
CLASSIFICATION
Identifying which category an object belongs to.
Applications: spam detection, image recognition, ...
Algorithms: k-NN, decision trees, random forest, ...
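As a rough illustration of classification, here is a minimal k-nearest-neighbors sketch in plain Python. The toy points, the "ham"/"spam" labels, and the function name `knn_predict` are assumptions made for this example; they do not come from the slides.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (feature_tuple, label) pairs (hypothetical toy data).
    """
    # Sort training examples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote among the labels of the k nearest neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy labeled examples: two well-separated groups.
train = [
    ((0.0, 0.0), "ham"), ((0.1, 0.2), "ham"), ((0.2, 0.1), "ham"),
    ((3.0, 3.0), "spam"), ((3.1, 2.9), "spam"), ((2.9, 3.2), "spam"),
]

print(knn_predict(train, (0.15, 0.15)))
print(knn_predict(train, (2.8, 3.0)))
```

The labels are what make this supervised: the prediction is only as good as the labeled examples it votes over.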
12.
REGRESSION
Predicting a continuous-valued attribute associated with an object.
Applications: drug response, stock prices, …
Algorithms: linear regression, …
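A small sketch of what regression does, using ordinary least squares for a single input variable. The data points and the helper name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data generated from y = 2x + 1 (an assumption for this example).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # continuous-valued prediction for x = 5
print(slope, intercept, predicted)
```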
13.
CLUSTERING
Automatic grouping of similar objects into sets.
Applications: customer segmentation, topic modeling, …
Algorithms: k-means, LDA, …
14.
COLLABORATIVE FILTERING
Fill in the missing entries of a user-item association matrix.
Applications: product recommendation, …
Algorithms: Alternating Least Squares (ALS)
15.
DIMENSIONALITY REDUCTION
Reducing the number of random variables to consider.
Applications: visualization, increased efficiency, …
Algorithms: PCA, t-SNE, …
16.
PREPROCESSING
Feature extraction and normalization.
Applications: transforming input data, such as text, into input for ML algorithms
Algorithms: TF-IDF, word2vec, one-hot encoding, …
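To make the preprocessing idea concrete, the following sketch shows one-hot encoding in plain Python. The color values and the function name `one_hot` are illustrative assumptions.

```python
def one_hot(values):
    """Encode categorical values as 0/1 vectors, one position per category."""
    categories = sorted(set(values))            # fixed, reproducible column order
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1                       # mark the column for this category
        vectors.append(row)
    return categories, vectors

categories, vectors = one_hot(["red", "green", "red", "blue"])
print(categories)
for row in vectors:
    print(row)
```

Each category gets its own column, so an algorithm that expects numeric input does not read an artificial ordering into the categories.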
17.
MODEL SELECTION
Comparing, validating and choosing parameters and models.
Applications: improved accuracy via parameter tuning
Algorithms: grid search, metrics, …
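Grid search can be sketched as trying every parameter combination and keeping the best-scoring one. In this example `toy_score` is a hypothetical stand-in for a real validation metric, and the parameter names `depth` and `trees` are assumptions.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)              # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(depth, trees):
    """Hypothetical score that peaks at depth=5, trees=100."""
    return -((depth - 5) ** 2) - ((trees - 100) ** 2) / 1000.0

grid = {"depth": [3, 5, 7], "trees": [50, 100, 200]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

In practice the score would come from training on one part of the data and measuring a metric on held-out data, so that the chosen parameters are not tuned to the training set.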
18.
Spark MLlib
19.
Spark Machine Learning Library
• Clustering
– k-means clustering
– latent Dirichlet allocation (LDA)
• Dimensionality reduction
– singular value decomposition (SVD)
– principal component analysis (PCA)
• Feature Extractors & Transformers
– word2vec
• Basic statistics
– summary statistics
– hypothesis testing
– random number generation
• Classification and regression
– linear models (SVMs, logistic & linear regression)
– decision trees
– ensembles of trees (Random Forests & GBTs)
• Collaborative filtering
– alternating least squares (ALS)
20.
K-Means Clustering (Unsupervised Learning)
21.
Why K-Means
• Simple & fast algorithm to find clusters
• Common technique for anomaly detection
• Drawbacks
– Doesn't work well with non-circular cluster shapes
– Number of clusters and initial seed values need to be specified beforehand
– Strong sensitivity to outliers and noise
– Limited ability to escape a local optimum
22.
Initialize Cluster Centers
Randomly pick 3 cluster centers.
23.
Assign Each Point
Assign each point to the nearest cluster center.
24.
Recompute Cluster Centers
Move each cluster center to the mean of the points assigned to it.
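The three steps above can be sketched in plain Python as one assign-and-recompute pass. The 2-D points are made up, and the starting centers are hand-picked rather than random so the result is reproducible; both are assumptions for this example.

```python
import math

def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]

def recompute(points, labels, centers):
    """Move each center to the mean of the points assigned to it."""
    new_centers = []
    for i, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:
            new_centers.append(old)  # leave an empty cluster's center where it is
            continue
        new_centers.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(len(old)))
        )
    return new_centers

points = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (6.0, 5.0), (9.0, 1.0), (9.0, 2.0)]
centers = [(0.0, 0.0), (5.0, 4.0), (10.0, 0.0)]  # step 1: 3 initial centers

labels = assign(points, centers)              # step 2: nearest center per point
centers = recompute(points, labels, centers)  # step 3: move centers to the means
print(labels)
print(centers)
```

K-means repeats steps 2 and 3 until the assignments stop changing.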
25.
K-means Clustering
26.
San Francisco
27.
Outline Each Neighborhood
28.
Folium: choropleth map
29.
SF Neighborhood Centers Calculated with K-Means
30.
Sample Dataset – K-Means
0.0, 0.0, 0.0
0.1, 0.1, 0.1
0.2, 0.2, 0.2
3.0, 3.0, 3.0
3.1, 3.1, 3.1
3.2, 3.2, 3.2
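A rough sketch of how this sample dataset clusters, run to convergence in plain Python rather than in Spark. The choice of k = 2 and of the first and fourth rows as starting centers are assumptions; the slide does not specify either.

```python
import math

def kmeans(points, centers, max_iter=20):
    """Alternate assignment and mean updates until the centers stop moving."""
    clusters = []
    for _ in range(max_iter):
        # Assignment step: group points by nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center becomes the mean of its cluster.
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else old
            for c, old in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

points = [
    (0.0, 0.0, 0.0), (0.1, 0.1, 0.1), (0.2, 0.2, 0.2),
    (3.0, 3.0, 3.0), (3.1, 3.1, 3.1), (3.2, 3.2, 3.2),
]
# Assumed setup: k = 2, seeded with one point from each visible group.
centers, clusters = kmeans(points, centers=[points[0], points[3]])
print(centers)
print([len(c) for c in clusters])
```

With these seeds the centers settle near (0.1, 0.1, 0.1) and (3.1, 3.1, 3.1), with three points in each cluster.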
31.
Decision Trees & Random Forests (Supervised Learning)
32.
Why Decision Trees?
• Simple to understand and interpret. (And explain to executives.)
• Requires little data preparation. (Other techniques often require data normalisation, creation of dummy variables, and removal of blank values.)
• Performs well with large datasets.
33.
Visual Intro to Decision Trees
• http://www.r2d3.us/visual-intro-to-machine-learning-part-1
34.
Random Forest (Ensemble Model)
• Main idea: build an ensemble of simple decision trees
• Each tree is simple and less likely to overfit
• Classify/predict by voting between all trees
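The voting idea can be sketched as follows. The three one-rule "trees", their thresholds, and the sample vector are hypothetical stand-ins; a real random forest would train each tree on a random sample of the rows and features.

```python
from collections import Counter

def forest_predict(trees, x):
    """Return the class that receives the most votes across all trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for trained trees: each looks at one feature only.
trees = [
    lambda x: 1 if x[0] > 0.5 else -1,
    lambda x: 1 if x[1] > 0.3 else -1,
    lambda x: 1 if x[2] > 0.0 else -1,
]

sample = [0.7, 0.1, 0.2]
individual_votes = [tree(sample) for tree in trees]
prediction = forest_predict(trees, sample)
print(individual_votes, prediction)
```

Here one tree disagrees with the other two, and the majority vote overrides it, which is how an ensemble can smooth out the mistakes of individual trees.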
35.
Decision Tree vs Random Forest
36.
Why Ensembles Work?
Overcome limitations of a single hypothesis
Decision Tree
Model Averaging
37.
Diabetes Dataset – Decision Trees / Random Forest
Labeled set with 8 Features
-1 1:-0.294118 2:0.487437 3:0.180328 4:-0.292929 5:-1 6:0.00149028 7:-0.53117 8:-0.0333333
+1 1:-0.882353 2:-0.145729 3:0.0819672 4:-0.414141 5:-1 6:-0.207153 7:-0.766866 8:-0.666667
-1 1:-0.0588235 2:0.839196 3:0.0491803 4:-1 5:-1 6:-0.305514 7:-0.492741 8:-0.633333
+1 1:-0.882353 2:-0.105528 3:0.0819672 4:-0.535354 5:-0.777778 6:-0.162444 7:-0.923997 8:-1
-1 1:-1 2:0.376884 3:-0.344262 4:-0.292929 5:-0.602837 6:0.28465 7:0.887276 8:-0.6
+1 1:-0.411765 2:0.165829 3:0.213115 4:-1 5:-1 6:-0.23696 7:-0.894962 8:-0.7
-1 1:-0.647059 2:-0.21608 3:-0.180328 4:-0.353535 5:-0.791962 6:-0.0760059 7:-0.854825 8:-0.833333
...
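Each row above appears to follow the LIBSVM text layout: a label followed by 1-based `index:value` pairs. A minimal parsing sketch under that assumption (the helper name `parse_libsvm_line` is hypothetical):

```python
def parse_libsvm_line(line, num_features):
    """Parse 'label idx:value idx:value ...' into (label, dense feature list)."""
    parts = line.split()
    label = float(parts[0])
    features = [0.0] * num_features      # features missing from the line stay 0.0
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index) - 1] = float(value)  # indices in the file are 1-based
    return label, features

line = ("-1 1:-0.294118 2:0.487437 3:0.180328 4:-0.292929 "
        "5:-1 6:0.00149028 7:-0.53117 8:-0.0333333")
label, features = parse_libsvm_line(line, num_features=8)
print(label)
print(features)
```

The label (-1 or +1) is the class to predict, and the 8 values are the inputs the decision tree or random forest splits on.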
38.
Machine Learning in Spark
39.
Spark Ecosystem
Spark Core, Spark SQL, Spark Streaming, MLlib, GraphX
40.
Machine Learning with Spark (MLlib & ML)
MLlib
• Original “lower-level” API
• Built on top of RDDs
• Maintenance mode starting with Spark 2.0
ML
• Newer “higher-level” API for constructing workflows
• Built on top of DataFrames
Both: algorithms implemented to take advantage of data parallelism
41.
Supervised Learning: End-to-End Flow
Training (batch): Data items + Labels → Feature Extraction → Feature Matrix (training set) → Train the Model → Model
Predicting (real time or batch): Data item → Feature Extraction → Feature Vector → Model → Predict → Label
42.
Spark ML: Spark API for building ML pipelines
Pipeline: Feature transform 1 → Feature transform 2 → Combine features → Random Forest
Input DataFrame (TRAIN) → Pipeline → Pipeline Model
Input DataFrame (TEST) → Pipeline Model → Output DataFrame (PREDICTIONS)
43.
Spark ML Pipeline
• A pipeline is used through its fit() and transform() methods
– fit() is for training: called on the Pipeline, it returns a Pipeline Model
– transform() is for prediction: called on the Pipeline Model
Input DataFrame (TRAIN) → Pipeline fit() → Pipeline Model
Input DataFrame (TEST) → Pipeline Model transform() → Output DataFrame (PREDICTIONS)
model = pipe.fit(trainData) # Train model
results = model.transform(testData) # Test model
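The fit()/transform() split can be illustrated without Spark. The classes below (`Centerer`, `SignLabeler`, `ToyPipeline`, and their models) are hypothetical toys written for this sketch and are not Spark's API; they only mimic the pattern in which fit() learns from training data and returns a model whose transform() is applied to new data.

```python
class Centerer:
    """Toy estimator: learns the mean of the training data."""
    def fit(self, data):
        return CentererModel(sum(data) / len(data))

class CentererModel:
    def __init__(self, mean):
        self.mean = mean
    def transform(self, data):
        return [x - self.mean for x in data]

class SignLabeler:
    """Toy stateless stage: fit() has nothing to learn and returns itself."""
    def fit(self, data):
        return self
    def transform(self, data):
        return [1 if x > 0 else -1 for x in data]

class ToyPipeline:
    def __init__(self, stages):
        self.stages = stages
    def fit(self, data):
        models = []
        for stage in self.stages:
            model = stage.fit(data)
            data = model.transform(data)  # feed each stage's output to the next
            models.append(model)
        return ToyPipelineModel(models)

class ToyPipelineModel:
    def __init__(self, models):
        self.models = models
    def transform(self, data):
        for model in self.models:
            data = model.transform(data)
        return data

pipe = ToyPipeline([Centerer(), SignLabeler()])
model = pipe.fit([1.0, 2.0, 3.0, 6.0])   # training data, mean is 3.0
results = model.transform([2.0, 5.0])    # test data reuses the learned mean
print(results)
```

The point of the pattern is that anything learned during fit(), here the mean, is frozen in the model and reused unchanged at prediction time.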
44.
Spark ML – Simple Random Forest Example
from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import HashingTF, StringIndexer, Tokenizer, VectorAssembler

indexer = StringIndexer(inputCol="district", outputCol="dis-inx")
parser = Tokenizer(inputCol="text-field", outputCol="words")
hashingTF = HashingTF(numFeatures=50, inputCol="words", outputCol="hash-inx")
vecAssembler = VectorAssembler(inputCols=["dis-inx", "hash-inx"], outputCol="features")
rf = RandomForestClassifier(numTrees=100, labelCol="label", seed=42)
pipe = Pipeline(stages=[indexer, parser, hashingTF, vecAssembler, rf])
model = pipe.fit(trainData) # Train model
results = model.transform(testData) # Test model
45.
Apache Zeppelin – A Modern Web-based Data Science Studio
• Data exploration and discovery
• Visualization
• Deeply integrated with Spark and Hadoop
• Pluggable interpreters
• Multiple languages in one notebook: R, Python, Scala
47.
Exporting ML Models - PMML
• Predictive Model Markup Language (PMML)
• Supported models
– K-Means
– Linear Regression
– Ridge Regression
– Lasso
– SVM
– Binary
48.
Additional Resources
• Machine Learning
• Natural Language Processing (NLP)
• Scalable Machine Learning
• Introduction to Statistics
49.
Lab Overview
tinyurl.com/hwx-intro-to-ml-with-spark
50.
Hortonworks Community Connection
Read access for everyone, join to participate and be recognized
• Full Q&A Platform (like StackOverflow)
• Knowledge Base Articles
• Code Samples and Repositories
51.
Community Engagement
community.hortonworks.com
• 7,500+ Registered Users
• 15,000+ Answers
• 20,000+ Technical Assets
One Website!
52.
Robert Hryniewicz
@RobHryniewicz
Thanks!