SlideShare uma empresa Scribd logo
1 de 25
Baixar para ler offline
Spark + AI Summit Europe, London, 2018-10-03
Jan-Philipp Simen
Data Scientist
Carl Zeiss AG
Structured Streaming on Azure Databricks for
Predictive Maintenance of Coordinate Measuring Machines
#SAISEnt1
Agenda
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 2
1 About ZEISS
2 Motivation for Predictive Maintenance
3 Processing Live Data Streams
4 Processing Historical Data Batches
5 Bringing It Together
6 Summary
ZEISS Camera Lenses
Three Technical Oscars
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 3
ZEISS Research & Quality Technology
More Than 20 Nobel Prizes Enabled by ZEISS Microscopes
https://www.zeiss.com/microscopy/int/about-us/nobel-prize-winners.html
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 4
ZEISS Semiconductor Manufacturing Technology
With Lithography at 13.5 Nanometers Wavelength,
ZEISS Is Enabling the Digital World
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 5
More on Extreme Ultra Violet Lithography: https://www.youtube.com/watch?v=Hfsp2jIjDpI
ZEISS Industrial Metrology
Precision up to 0.3 µm
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 6
Quality Assurance
Metrology
Coordinate Measuring Machines
Usually comparing CAD model to
produced workpiece
tactile
x-ray
optical
laser
Agenda
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 7
1 About ZEISS
2 Motivation for Predictive Maintenance
3 Processing Live Data Streams
4 Processing Historical Data Batches
5 Bringing It Together
6 Summary
How Will Predictive Maintenance Benefit Our Customers?
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 8
Main goals:
1. avoid downtime
2. ensure reliable
measurements
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 9
Delivering Customer Value from the Start
Start small, learn from data and
develop first value-adding services
One-click support request2
Predictive maintenance notifications in app4
Predictive models for internal use at ZEISS3
Condition monitoring app1
Agenda
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 10
1 About ZEISS
2 Motivation for Predictive Maintenance
3 Processing Live Data Streams
4 Processing Historical Data Batches
5 Bringing It Together
6 Summary
Streaming Architecture on Azure
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 11
Spark
Structured
Streaming
Azure
IoT Hub
…
Azure
Event HubsAzure Event Hubs
Connector for
Apache Spark
Azure
CosmosDB
Azure Cosmos
DB Connector
for Apache
Spark
Data Lake
Parquet
Files
{JSON}
In a Streaming Pipeline the Drawbacks of Untyped SQL
Operations Are Even More Apparent
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 12
def readSensor9Untyped(streamingDF: DataFrame): DataFrame =
streamingDF.select(
$"machineID",
$"dateTime",
udfParseDouble(”sensor9")($"jsonString"))
def udfParseDouble(key: String) = ???
What is in
there,
again?
Did I spell
this right?
Don’t forget to
document the
schema!
Our solution: Spark Datasets and Scala Case Classes
Spark Datasets + Scala Case Classes =
Typesafe Data Processing
def jsonToRecord(sensorID: Int)(event: Event): Record[Option[Double]] =
Record[Option[Double]](event.machineID,
event.dateTime,
parseSensorValue(sensorID, event.jsonString))
def parseSensorValue = ???
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 13
case class Record[T](machineID: String,
timestamp: Timestamp,
value: T)
case class Event(machineID: String,
dateTime: Timestamp
jsonString: String)
def readSensor9(streamingDS: Dataset[Event]): Dataset[Record[Option[Double]]] =
streamingDS.map(jsonToRecord(9))
Spark features wish list
• Typesafe joins for Datasets
• Import / export case classes from / to JSON Schema would allow centralized schema repository
Leveraging the full
potential of a good IDE:
• Auto completion
• Compile checks
• Quick fixes
• Refactoring
Implementing Alert Logic with Stateful Streaming
/** Checks a stream of JSON messages for threshold violations.
*
* @param jsonStream Dataset containing JSON messages as received from IoT Hub
* @param cmms Dataset with thresholds and other specifications for each machine
* @return Dataset with alert events as expected by the Eventhub
*/
def createAlerts(jsonStream: Dataset[JsonMessage],
cmms: Dataset[CmmSpecs]): Dataset[AlertEvent] = {
val joined: Dataset[JsonPlusCmmSpecs] = joinStaticOnStream(jsonStream, cmms)
// Group by serial number
val grouped = joined.groupByKey(_.machineID)
// Evaluate incoming messages in each group (that is for each individual machine)
grouped.flatMapGroupsWithState[AlertState, AlertEvent](...)(evaluateTemperatureState)
}
def evaluateTemperatureState = ???
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 14
Automated Build and Deployment
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 15
Code Repository
1
Build Agent
2
Azure Databricks Platform
3
Scala SBT project
shared code base
Scala notebooks
job specific logic
Shell scripts
job configurations
Build jar artifact
using SBT
assembly
DBFSUpload
Work-
space
Upload
DatabricksRESTAPI
Execute “create job”
requests
Job 1
Job 2
Job 3
Agenda
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 16
1 About ZEISS
2 Motivation for Predictive Maintenance
3 Processing Live Data Streams
4 Processing Historical Data Batches
5 Bringing It Together
6 Summary
Batch ETL Architecture
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 17
Azure
BlobManual upload
by service
technicians 50’000 zip files
30 to 200 MB per file
Spark
RDDs and
Datasets
Data Lake
Parquet
files
Winzip
files
Unzipping Lessons Learned
In Theory
val allFilesOnBlob: RDD[(String, PortableDataStream)] = sc.binaryFiles(path)
def unzip(zipFile: PortableDataStream): Map[String, String] = easyUnzipFunction(zipFile)
val unzippedFiles: RDD[(String, Map[String, String])] = allFilesOnBlob.mapValues(unzip)
In Practice
• 250 lines of code only for the unzipping (Winzip format)!
• A lot of exception handling during unzipping.
• Many debugging iterations.
• Size of zip files ranging from 30 MB to 200 MB (compressed) ➡ Had to reserve a lot of memory overhead
for big zip files ➡ Split RDD into three buckets, depending on file size
Next time:
Use gzip compression on file level instead of putting varying amounts of files into a Winzip archive.
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 18
Agenda
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 19
1 About ZEISS
2 Motivation for Predictive Maintenance
3 Processing Live Data Streams
4 Processing Historical Data Batches
5 Bringing It Together
6 Summary
20
Historic
log files
Azure
Blob
Data Lake
RDDs and
Datasets
Combining Batch and Streaming Data Sources and
Adding Scheduled Machine Learning Notebooks
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG
MLlib
Training
historic logs,
streaming
events and
service data
trained models
MLlib
Predictiontrained models +
recent streaming
events
Prediction
Dashboard
Structured
Streaming
Azure
IoT Hub
…
Azure
Event Hubs
Azure
CosmosDB
{JSON} ∞
ESB
CRM and
service data
Agenda
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 21
1 About ZEISS
2 Motivation for Predictive Maintenance
3 Processing Live Data Streams
4 Processing Historical Data Batches
5 Bringing It Together
6 Summary
Things We Like
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 22
Things we like about Spark
q Datasets: Typesafe data processing
q Stateful Streaming: Powerful transformations of streaming datasets
q MLlib: State-of-the-art distributed algorithms
q All the other great things: Scalability, performance, …
Things we like about Azure Databricks
q Cluster management: Convenient setup, auto-scaling, auto-shutdown
q Job management: Convenient creation and scheduling of Spark jobs
q REST API: Enables automated deployment
Wish List
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 23
Features we would love to see in the future
q Datasets: Typesafe joins
q Central schema repository with export to different formats (Scala case class, JSON schema)
q Better monitoring for multiple, long-running jobs: Integration with external log aggregators or
customizable log metrics in Databricks
Contact
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 24
Dr. Jan-Philipp Simen
Data Scientist
Digital Innovation Partners
Carl Zeiss AG
Kistlerhofstraße 70
81379 Munich
jan-philipp.simen@zeiss.com
https://www.linkedin.com/in/jpsimen
#SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 25

Mais conteúdo relacionado

Mais procurados

Data cleansing and data prep with synapse data flows
Data cleansing and data prep with synapse data flowsData cleansing and data prep with synapse data flows
Data cleansing and data prep with synapse data flowsMark Kromer
 
Dataminds - ML in Production
Dataminds - ML in ProductionDataminds - ML in Production
Dataminds - ML in ProductionNathan Bijnens
 
Earth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data PlatformsEarth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data PlatformsAmazon Web Services
 
New York Elastic{ON} Tour Opening Keynote
New York Elastic{ON} Tour Opening KeynoteNew York Elastic{ON} Tour Opening Keynote
New York Elastic{ON} Tour Opening KeynoteElasticsearch
 
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Databricks
 
Logging, indicateurs et APM : le trio gagnant pour des opérations réussies
Logging, indicateurs et APM : le trio gagnant pour des opérations réussiesLogging, indicateurs et APM : le trio gagnant pour des opérations réussies
Logging, indicateurs et APM : le trio gagnant pour des opérations réussiesElasticsearch
 
Activeeon use cases for cloud, digital transformation, IoT and big data autom...
Activeeon use cases for cloud, digital transformation, IoT and big data autom...Activeeon use cases for cloud, digital transformation, IoT and big data autom...
Activeeon use cases for cloud, digital transformation, IoT and big data autom...Activeeon
 
RUCK 2017 R에 날개 달기 - Microsoft R과 클라우드 머신러닝 소개
RUCK 2017 R에 날개 달기 - Microsoft R과 클라우드 머신러닝 소개RUCK 2017 R에 날개 달기 - Microsoft R과 클라우드 머신러닝 소개
RUCK 2017 R에 날개 달기 - Microsoft R과 클라우드 머신러닝 소개r-kor
 
Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018Nathan Bijnens
 
20181027 deep learningcommunity_aws
20181027 deep learningcommunity_aws20181027 deep learningcommunity_aws
20181027 deep learningcommunity_awsHirokuni Uchida
 
Real-Time Health Score Application using Apache Spark on Kubernetes
Real-Time Health Score Application using Apache Spark on KubernetesReal-Time Health Score Application using Apache Spark on Kubernetes
Real-Time Health Score Application using Apache Spark on KubernetesDatabricks
 
Discurso de apertura
Discurso de aperturaDiscurso de apertura
Discurso de aperturaElasticsearch
 
Introducing the Hub for Data Orchestration
Introducing the Hub for Data OrchestrationIntroducing the Hub for Data Orchestration
Introducing the Hub for Data OrchestrationAlluxio, Inc.
 
Earth Observation in the Cloud using ENVI
Earth Observation in the Cloud using ENVIEarth Observation in the Cloud using ENVI
Earth Observation in the Cloud using ENVIAmazon Web Services
 
Resistance is futile, resilience is crucial
Resistance is futile, resilience is crucialResistance is futile, resilience is crucial
Resistance is futile, resilience is crucialHristo Iliev
 
AWS Summit 2011: Big Data Analytics in the AWS cloud
AWS Summit 2011: Big Data Analytics in the AWS cloudAWS Summit 2011: Big Data Analytics in the AWS cloud
AWS Summit 2011: Big Data Analytics in the AWS cloudAmazon Web Services
 
Collecting metrics with Graphite and StatsD
Collecting metrics with Graphite and StatsDCollecting metrics with Graphite and StatsD
Collecting metrics with Graphite and StatsDitnig
 
Enancing Threat Detection with Big Data and AI
Enancing Threat Detection with Big Data and AIEnancing Threat Detection with Big Data and AI
Enancing Threat Detection with Big Data and AIDatabricks
 
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaSpark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaDatabricks
 

Mais procurados (20)

Data cleansing and data prep with synapse data flows
Data cleansing and data prep with synapse data flowsData cleansing and data prep with synapse data flows
Data cleansing and data prep with synapse data flows
 
Dataminds - ML in Production
Dataminds - ML in ProductionDataminds - ML in Production
Dataminds - ML in Production
 
Earth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data PlatformsEarth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data Platforms
 
New York Elastic{ON} Tour Opening Keynote
New York Elastic{ON} Tour Opening KeynoteNew York Elastic{ON} Tour Opening Keynote
New York Elastic{ON} Tour Opening Keynote
 
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
 
Logging, indicateurs et APM : le trio gagnant pour des opérations réussies
Logging, indicateurs et APM : le trio gagnant pour des opérations réussiesLogging, indicateurs et APM : le trio gagnant pour des opérations réussies
Logging, indicateurs et APM : le trio gagnant pour des opérations réussies
 
Activeeon use cases for cloud, digital transformation, IoT and big data autom...
Activeeon use cases for cloud, digital transformation, IoT and big data autom...Activeeon use cases for cloud, digital transformation, IoT and big data autom...
Activeeon use cases for cloud, digital transformation, IoT and big data autom...
 
RUCK 2017 R에 날개 달기 - Microsoft R과 클라우드 머신러닝 소개
RUCK 2017 R에 날개 달기 - Microsoft R과 클라우드 머신러닝 소개RUCK 2017 R에 날개 달기 - Microsoft R과 클라우드 머신러닝 소개
RUCK 2017 R에 날개 달기 - Microsoft R과 클라우드 머신러닝 소개
 
Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018
 
20181027 deep learningcommunity_aws
20181027 deep learningcommunity_aws20181027 deep learningcommunity_aws
20181027 deep learningcommunity_aws
 
Real-Time Health Score Application using Apache Spark on Kubernetes
Real-Time Health Score Application using Apache Spark on KubernetesReal-Time Health Score Application using Apache Spark on Kubernetes
Real-Time Health Score Application using Apache Spark on Kubernetes
 
Opening Keynote
Opening KeynoteOpening Keynote
Opening Keynote
 
Discurso de apertura
Discurso de aperturaDiscurso de apertura
Discurso de apertura
 
Introducing the Hub for Data Orchestration
Introducing the Hub for Data OrchestrationIntroducing the Hub for Data Orchestration
Introducing the Hub for Data Orchestration
 
Earth Observation in the Cloud using ENVI
Earth Observation in the Cloud using ENVIEarth Observation in the Cloud using ENVI
Earth Observation in the Cloud using ENVI
 
Resistance is futile, resilience is crucial
Resistance is futile, resilience is crucialResistance is futile, resilience is crucial
Resistance is futile, resilience is crucial
 
AWS Summit 2011: Big Data Analytics in the AWS cloud
AWS Summit 2011: Big Data Analytics in the AWS cloudAWS Summit 2011: Big Data Analytics in the AWS cloud
AWS Summit 2011: Big Data Analytics in the AWS cloud
 
Collecting metrics with Graphite and StatsD
Collecting metrics with Graphite and StatsDCollecting metrics with Graphite and StatsD
Collecting metrics with Graphite and StatsD
 
Enancing Threat Detection with Big Data and AI
Enancing Threat Detection with Big Data and AIEnancing Threat Detection with Big Data and AI
Enancing Threat Detection with Big Data and AI
 
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaSpark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
 

Semelhante a Using Apache Spark Structured Streaming on Azure Databricks for Predictive Maintenance of Coordinate Measuring Machineswith Jan Phillipp Simen

Data relay introduction to big data clusters
Data relay introduction to big data clustersData relay introduction to big data clusters
Data relay introduction to big data clustersChris Adkin
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2inovex GmbH
 
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg SchadSmack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg SchadSpark Summit
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호Amazon Web Services Korea
 
Big Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI MobileBig Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI MobileRoy Kim
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AITorsten Steinbach
 
The Scout24 Data Platform (A Technical Deep Dive)
The Scout24 Data Platform (A Technical Deep Dive)The Scout24 Data Platform (A Technical Deep Dive)
The Scout24 Data Platform (A Technical Deep Dive)RaffaelDzikowski
 
V like Velocity, Predicting in Real-Time with Azure ML
V like Velocity, Predicting in Real-Time with Azure MLV like Velocity, Predicting in Real-Time with Azure ML
V like Velocity, Predicting in Real-Time with Azure MLBarbara Fusinska
 
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...Sungmin Kim
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Carole Gunst
 
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & AlluxioAlluxio, Inc.
 
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...HostedbyConfluent
 
Spring Boot & Spring Cloud on Pivotal Application Service - Alexandre Roman
Spring Boot & Spring Cloud on Pivotal Application Service - Alexandre RomanSpring Boot & Spring Cloud on Pivotal Application Service - Alexandre Roman
Spring Boot & Spring Cloud on Pivotal Application Service - Alexandre RomanVMware Tanzu
 
Heterogeneous Data Mining with Spark
Heterogeneous Data Mining with SparkHeterogeneous Data Mining with Spark
Heterogeneous Data Mining with SparkKNIMESlides
 
Launching Your First Big Data Project on AWS
Launching Your First Big Data Project on AWSLaunching Your First Big Data Project on AWS
Launching Your First Big Data Project on AWSAmazon Web Services
 
Azure AI Conference Report
Azure AI Conference ReportAzure AI Conference Report
Azure AI Conference ReportOsamu Masutani
 
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...Cisco DevNet
 
What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0ScyllaDB
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Databricks
 
Modernizing upstream workflows with aws storage - john mallory
Modernizing upstream workflows with aws storage -  john malloryModernizing upstream workflows with aws storage -  john mallory
Modernizing upstream workflows with aws storage - john malloryAmazon Web Services
 

Semelhante a Using Apache Spark Structured Streaming on Azure Databricks for Predictive Maintenance of Coordinate Measuring Machineswith Jan Phillipp Simen (20)

Data relay introduction to big data clusters
Data relay introduction to big data clustersData relay introduction to big data clusters
Data relay introduction to big data clusters
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
 
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg SchadSmack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
 
Big Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI MobileBig Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI Mobile
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AI
 
The Scout24 Data Platform (A Technical Deep Dive)
The Scout24 Data Platform (A Technical Deep Dive)The Scout24 Data Platform (A Technical Deep Dive)
The Scout24 Data Platform (A Technical Deep Dive)
 
V like Velocity, Predicting in Real-Time with Azure ML
V like Velocity, Predicting in Real-Time with Azure MLV like Velocity, Predicting in Real-Time with Azure ML
V like Velocity, Predicting in Real-Time with Azure ML
 
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2
 
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio
 
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
 
Spring Boot & Spring Cloud on Pivotal Application Service - Alexandre Roman
Spring Boot & Spring Cloud on Pivotal Application Service - Alexandre RomanSpring Boot & Spring Cloud on Pivotal Application Service - Alexandre Roman
Spring Boot & Spring Cloud on Pivotal Application Service - Alexandre Roman
 
Heterogeneous Data Mining with Spark
Heterogeneous Data Mining with SparkHeterogeneous Data Mining with Spark
Heterogeneous Data Mining with Spark
 
Launching Your First Big Data Project on AWS
Launching Your First Big Data Project on AWSLaunching Your First Big Data Project on AWS
Launching Your First Big Data Project on AWS
 
Azure AI Conference Report
Azure AI Conference ReportAzure AI Conference Report
Azure AI Conference Report
 
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
 
What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
 
Modernizing upstream workflows with aws storage - john mallory
Modernizing upstream workflows with aws storage -  john malloryModernizing upstream workflows with aws storage -  john mallory
Modernizing upstream workflows with aws storage - john mallory
 

Mais de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Mais de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Último

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 

Último (20)

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 

Using Apache Spark Structured Streaming on Azure Databricks for Predictive Maintenance of Coordinate Measuring Machineswith Jan Phillipp Simen

  • 1. Spark + AI Summit Europe, London, 2018-10-03 Jan-Philipp Simen Data Scientist Carl Zeiss AG Structured Streaming on Azure Databricks for Predictive Maintenance of Coordinate Measuring Machines #SAISEnt1
  • 2. Agenda #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 2 1 About ZEISS 2 Motivation for Predictive Maintenance 3 Processing Live Data Streams 4 Processing Historical Data Batches 5 Bringing It Together 6 Summary
  • 3. ZEISS Camera Lenses Three Technical Oscars #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 3
  • 4. ZEISS Research & Quality Technology More Than 20 Nobel Prizes Enabled by ZEISS Microscopes https://www.zeiss.com/microscopy/int/about-us/nobel-prize-winners.html #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 4
  • 5. ZEISS Semiconductor Manufacturing Technology With Lithography at 13.5 Nanometers Wavelength, ZEISS Is Enabling the Digital World #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 5 More on Extreme Ultra Violet Lithography: https://www.youtube.com/watch?v=Hfsp2jIjDpI
  • 6. ZEISS Industrial Metrology Precision up to 0.3 µm #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 6 Quality Assurance Metrology Coordinate Measuring Machines Usually comparing CAD model to produced workpiece tactile x-ray optical laser
  • 7. Agenda #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 7 1 About ZEISS 2 Motivation for Predictive Maintenance 3 Processing Live Data Streams 4 Processing Historical Data Batches 5 Bringing It Together 6 Summary
  • 8. How Will Predictive Maintenance Benefit Our Customers? #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 8 Main goals: 1. avoid downtime 2. ensure reliable measurements
  • 9. #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 9 Delivering Customer Value from the Start Start small, learn from data and develop first value-adding services One-click support request2 Predictive maintenance notifications in app4 Predictive models for internal use at ZEISS3 Condition monitoring app1
  • 10. Agenda #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 10 1 About ZEISS 2 Motivation for Predictive Maintenance 3 Processing Live Data Streams 4 Processing Historical Data Batches 5 Bringing It Together 6 Summary
  • 11. Streaming Architecture on Azure #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 11 Spark Structured Streaming Azure IoT Hub … Azure Event HubsAzure Event Hubs Connector for Apache Spark Azure CosmosDB Azure Cosmos DB Connector for Apache Spark Data Lake Parquet Files {JSON}
  • 12. In a Streaming Pipeline the Drawbacks of Untyped SQL Operations Are Even More Apparent #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 12 def readSensor9Untyped(streamingDF: DataFrame): DataFrame = streamingDF.select( $"machineID", $"dateTime", udfParseDouble(”sensor9")($"jsonString")) def udfParseDouble(key: String) = ??? What is in there, again? Did I spell this right? Don’t forget to document the schema! Our solution: Spark Datasets and Scala Case Classes
  • 13. Spark Datasets + Scala Case Classes = Typesafe Data Processing def jsonToRecord(sensorID: Int)(event: Event): Record[Option[Double]] = Record[Option[Double]](event.machineID, event.dateTime, parseSensorValue(sensorID, event.jsonString)) def parseSensorValue = ??? #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 13 case class Record[T](machineID: String, timestamp: Timestamp, value: T) case class Event(machineID: String, dateTime: Timestamp jsonString: String) def readSensor9(streamingDS: Dataset[Event]): Dataset[Record[Option[Double]]] = streamingDS.map(jsonToRecord(9)) Spark features wish list • Typesafe joins for Datasets • Import / export case classes from / to JSON Schema would allow centralized schema repository Leveraging the full potential of a good IDE: • Auto completion • Compile checks • Quick fixes • Refactoring
  • 14. Implementing Alert Logic with Stateful Streaming /** Checks a stream of JSON messages for threshold violations. * * @param jsonStream Dataset containing JSON messages as received from IoT Hub * @param cmms Dataset with thresholds and other specifications for each machine * @return Dataset with alert events as expected by the Eventhub */ def createAlerts(jsonStream: Dataset[JsonMessage], cmms: Dataset[CmmSpecs]): Dataset[AlertEvent] = { val joined: Dataset[JsonPlusCmmSpecs] = joinStaticOnStream(jsonStream, cmms) // Group by serial number val grouped = joined.groupByKey(_.machineID) // Evaluate incoming messages in each group (that is for each individual machine) grouped.flatMapGroupsWithState[AlertState, AlertEvent](...)(evaluateTemperatureState) } def evaluateTemperatureState = ??? #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 14
  • 15. Automated Build and Deployment #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 15 Code Repository 1 Build Agent 2 Azure Databricks Platform 3 Scala SBT project shared code base Scala notebooks job specific logic Shell scripts job configurations Build jar artifact using SBT assembly DBFSUpload Work- space Upload DatabricksRESTAPI Execute “create job” requests Job 1 Job 2 Job 3
  • 16. Agenda #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 16 1 About ZEISS 2 Motivation for Predictive Maintenance 3 Processing Live Data Streams 4 Processing Historical Data Batches 5 Bringing It Together 6 Summary
  • 17. Batch ETL Architecture #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 17 Azure BlobManual upload by service technicians 50’000 zip files 30 to 200 MB per file Spark RDDs and Datasets Data Lake Parquet files Winzip files
  • 18. Unzipping Lessons Learned In Theory val allFilesOnBlob: RDD[(String, PortableDataStream)] = sc.binaryFiles(path) def unzip(zipFile: PortableDataStream): Map[String, String] = easyUnzipFunction(zipFile) val unzippedFiles: RDD[(String, Map[String, String])] = allFilesOnBlob.mapValues(unzip) In Practice • 250 lines of code only for the unzipping (Winzip format)! • A lot of exception handling during unzipping. • Many debugging iterations. • Size of zip files ranging from 30 MB to 200 MB (compressed) ➡ Had to reserve a lot of memory overhead for big zip files ➡ Split RDD into three buckets, depending on file size Next time: Use gzip compression on file level instead of putting varying amounts of files into a Winzip archive. #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 18
  • 19. Agenda #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 19 1 About ZEISS 2 Motivation for Predictive Maintenance 3 Processing Live Data Streams 4 Processing Historical Data Batches 5 Bringing It Together 6 Summary
  • 20. 20 Historic log files Azure Blob Data Lake RDDs and Datasets Combining Batch and Streaming Data Sources and Adding Scheduled Machine Learning Notebooks #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG MLlib Training historic logs, streaming events and service data trained models MLlib Predictiontrained models + recent streaming events Prediction Dashboard Structured Streaming Azure IoT Hub … Azure Event Hubs Azure CosmosDB {JSON} ∞ ESB CRM and service data
  • 21. Agenda #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 21 1 About ZEISS 2 Motivation for Predictive Maintenance 3 Processing Live Data Streams 4 Processing Historical Data Batches 5 Bringing It Together 6 Summary
  • 22. Things We Like #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 22 Things we like about Spark q Datasets: Typesafe data processing q Stateful Streaming: Powerful transformations of streaming datasets q MLlib: State-of-the-art distributed algorithms q All the other great things: Scalability, performance, … Things we like about Azure Databricks q Cluster management: Convenient setup, auto-scaling, auto-shutdown q Job management: Convenient creation and scheduling of Spark jobs q REST API: Enables automated deployment
  • 23. Wish List #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 23 Features we would love to see in the future q Datasets: Typesafe joins q Central schema repository with export to different formats (Scala case class, JSON schema) q Better monitoring for multiple, long-running jobs: Integration with external log aggregators or customizable log metrics in Databricks
  • 24. Contact #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 24 Dr. Jan-Philipp Simen Data Scientist Digital Innovation Partners Carl Zeiss AG Kistlerhofstraße 70 81379 Munich jan-philipp.simen@zeiss.com https://www.linkedin.com/in/jpsimen
  • 25. #SAISEnt1 Jan-Philipp Simen, Carl Zeiss AG 25