We use Apache Spark Structured Streaming on the Databricks Unified Analytics Platform to process live data, and Spark MLlib to train models that predict machine failure. Combined in the ZEISS Measuring Capability App, Structured Streaming and MLlib allow users to stay on top of all relevant machine information and to know at a glance whether a machine can perform reliably. We will demonstrate how Azure Databricks lets us easily schedule and monitor a growing number of Spark jobs as we continuously add new features to our app.
1. Structured Streaming on Azure Databricks for Predictive Maintenance of Coordinate Measuring Machines
Jan-Philipp Simen, Data Scientist, Carl Zeiss AG
Spark + AI Summit Europe, London, 2018-10-03
#SAISEnt1
2. Agenda
1 About ZEISS
2 Motivation for Predictive Maintenance
3 Processing Live Data Streams
4 Processing Historical Data Batches
5 Bringing It Together
6 Summary
4. ZEISS Research & Quality Technology
More Than 20 Nobel Prizes Enabled by ZEISS Microscopes
https://www.zeiss.com/microscopy/int/about-us/nobel-prize-winners.html
5. ZEISS Semiconductor Manufacturing Technology
With Lithography at 13.5 Nanometers Wavelength,
ZEISS Is Enabling the Digital World
More on extreme ultraviolet (EUV) lithography: https://www.youtube.com/watch?v=Hfsp2jIjDpI
6. ZEISS Industrial Metrology
Precision up to 0.3 µm
Quality assurance through metrology with coordinate measuring machines, usually comparing the CAD model to the produced workpiece. Sensor types: tactile, optical, laser, and x-ray.
7. Agenda
1 About ZEISS
2 Motivation for Predictive Maintenance
3 Processing Live Data Streams
4 Processing Historical Data Batches
5 Bringing It Together
6 Summary
8. How Will Predictive Maintenance Benefit Our Customers?
Main goals:
1. Avoid downtime
2. Ensure reliable measurements
9. Delivering Customer Value from the Start
Start small, learn from data, and develop the first value-adding services:
1. Condition monitoring app
2. One-click support request
3. Predictive models for internal use at ZEISS
4. Predictive maintenance notifications in app
10. Agenda
1 About ZEISS
2 Motivation for Predictive Maintenance
3 Processing Live Data Streams
4 Processing Historical Data Batches
5 Bringing It Together
6 Summary
11. Streaming Architecture on Azure
Machines send {JSON} telemetry to Azure IoT Hub, which routes it to Azure Event Hubs. The Azure Event Hubs Connector for Apache Spark feeds the events into Spark Structured Streaming, which writes results to Azure Cosmos DB (via the Azure Cosmos DB Connector for Apache Spark) and to the Data Lake as Parquet files.
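As a rough illustration (not the production code), reading the Event Hubs stream into Structured Streaming and archiving it to the Data Lake might look like this; the connection string, paths, and column selection are assumptions:

import org.apache.spark.eventhubs.EventHubsConf
import org.apache.spark.sql.functions.col

// Sketch only: read telemetry from Event Hubs and archive it as Parquet.
// In a Databricks notebook, `spark` (SparkSession) is predefined.
val ehConf = EventHubsConf("<event-hubs-connection-string>")
val raw = spark.readStream.format("eventhubs").options(ehConf.toMap).load()

val telemetry = raw.select(
  col("body").cast("string").as("jsonString"),   // the {JSON} payload
  col("enqueuedTime").as("dateTime"))

telemetry.writeStream
  .format("parquet")
  .option("path", "/mnt/datalake/telemetry")                       // assumed mount point
  .option("checkpointLocation", "/mnt/datalake/checkpoints/telemetry")
  .start()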
12. In a Streaming Pipeline the Drawbacks of Untyped SQL Operations Are Even More Apparent

def readSensor9Untyped(streamingDF: DataFrame): DataFrame =  // Don't forget to document the schema!
  streamingDF.select(
    $"machineID",
    $"dateTime",
    udfParseDouble("sensor9")($"jsonString"))  // Did I spell this right? What is in there, again?

def udfParseDouble(key: String) = ???
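For illustration, one way udfParseDouble could be implemented; this is a naive sketch assuming the sensor value is a flat numeric JSON field, not the actual parser:

import org.apache.spark.sql.functions.udf
import scala.util.Try

def udfParseDouble(key: String) = udf { json: String =>
  // Naively extract a numeric field from the JSON string; None if missing or malformed.
  val pattern = ("\"" + key + "\"\\s*:\\s*(-?[\\d.eE+-]+)").r
  pattern.findFirstMatchIn(json).flatMap(m => Try(m.group(1).toDouble).toOption)
}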
Our solution: Spark Datasets and Scala Case Classes
13. Spark Datasets + Scala Case Classes = Typesafe Data Processing

case class Event(machineID: String,
                 dateTime: Timestamp,
                 jsonString: String)

case class Record[T](machineID: String,
                     timestamp: Timestamp,
                     value: T)

def jsonToRecord(sensorID: Int)(event: Event): Record[Option[Double]] =
  Record[Option[Double]](event.machineID,
                         event.dateTime,
                         parseSensorValue(sensorID, event.jsonString))

def parseSensorValue = ???

def readSensor9(streamingDS: Dataset[Event]): Dataset[Record[Option[Double]]] =
  streamingDS.map(jsonToRecord(9))
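As a usage sketch, the typed stream can be obtained by selecting the payload columns and converting with .as[Event]; `rawTelemetry` and its column names are assumptions:

import org.apache.spark.sql.Dataset
import spark.implicits._  // `spark` is the notebook's SparkSession

// Sketch: convert the raw streaming DataFrame into a typed Dataset[Event].
val events: Dataset[Event] = rawTelemetry
  .selectExpr("machineID", "dateTime", "jsonString")
  .as[Event]
val sensor9 = readSensor9(events)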
Leveraging the full potential of a good IDE:
• Auto completion
• Compile checks
• Quick fixes
• Refactoring

Spark features wish list:
• Typesafe joins for Datasets
• Import / export of case classes from / to JSON Schema would allow a centralized schema repository
14. Implementing Alert Logic with Stateful Streaming
/** Checks a stream of JSON messages for threshold violations.
*
* @param jsonStream Dataset containing JSON messages as received from IoT Hub
* @param cmms Dataset with thresholds and other specifications for each machine
* @return Dataset with alert events as expected by the Eventhub
*/
def createAlerts(jsonStream: Dataset[JsonMessage],
                 cmms: Dataset[CmmSpecs]): Dataset[AlertEvent] = {
  val joined: Dataset[JsonPlusCmmSpecs] = joinStaticOnStream(jsonStream, cmms)
  // Group by serial number
  val grouped = joined.groupByKey(_.machineID)
  // Evaluate incoming messages in each group (that is, for each individual machine)
  grouped.flatMapGroupsWithState[AlertState, AlertEvent](...)(evaluateTemperatureState)
}

def evaluateTemperatureState = ???
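The deck elides the state-evaluation function; a minimal sketch, assuming hypothetical field names on JsonPlusCmmSpecs and AlertEvent as well as an assumed AlertState shape, could look like this:

import org.apache.spark.sql.streaming.GroupState

case class AlertState(alertActive: Boolean)  // assumed shape of the per-machine state

// Sketch: raise one alert when the temperature first exceeds the
// machine-specific threshold, and re-arm once it drops below again.
def evaluateTemperatureState(machineID: String,
                             events: Iterator[JsonPlusCmmSpecs],
                             state: GroupState[AlertState]): Iterator[AlertEvent] = {
  var active = state.getOption.exists(_.alertActive)
  val alerts = scala.collection.mutable.ArrayBuffer.empty[AlertEvent]
  for (e <- events) {
    if (!active && e.temperature > e.temperatureThreshold) {  // field names are assumptions
      alerts += AlertEvent(machineID, e.timestamp, e.temperature)
      active = true
    } else if (active && e.temperature <= e.temperatureThreshold) {
      active = false  // violation cleared; allow a future alert
    }
  }
  state.update(AlertState(active))
  alerts.iterator
}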
15. Automated Build and Deployment
1. Code Repository: a Scala SBT project (shared code base), Scala notebooks (job-specific logic), and shell scripts (job configurations).
2. Build Agent: builds the jar artifact using sbt-assembly.
3. Azure Databricks Platform: the jar is uploaded to DBFS, the notebooks are uploaded to the Workspace, and "create job" requests are executed against the Databricks REST API to set up Job 1, Job 2, Job 3, …
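A minimal sketch of such a "create job" request from one of the shell scripts; the job name, cluster settings, paths, and environment variables are illustrative assumptions:

# Illustrative sketch: create a scheduled Spark job via the Databricks REST API.
curl -X POST "https://$DATABRICKS_HOST/api/2.0/jobs/create" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{
    "name": "streaming-alerts",
    "new_cluster": {
      "spark_version": "4.3.x-scala2.11",
      "node_type_id": "Standard_DS3_v2",
      "num_workers": 2
    },
    "libraries": [{"jar": "dbfs:/jars/shared-code-assembly.jar"}],
    "notebook_task": {"notebook_path": "/Jobs/streaming-alerts"}
  }'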
16. Agenda
1 About ZEISS
2 Motivation for Predictive Maintenance
3 Processing Live Data Streams
4 Processing Historical Data Batches
5 Bringing It Together
6 Summary
17. Batch ETL Architecture
Service technicians manually upload Winzip files to Azure Blob storage: about 50,000 zip files at 30 to 200 MB per file. Spark processes them with RDDs and Datasets and writes Parquet files to the Data Lake.
18. Unzipping Lessons Learned
In Theory
val allFilesOnBlob: RDD[(String, PortableDataStream)] = sc.binaryFiles(path)
def unzip(zipFile: PortableDataStream): Map[String, String] = easyUnzipFunction(zipFile)
val unzippedFiles: RDD[(String, Map[String, String])] = allFilesOnBlob.mapValues(unzip)
In Practice
• 250 lines of code just for the unzipping (Winzip format)!
• A lot of exception handling during unzipping.
• Many debugging iterations.
• Zip files ranging from 30 MB to 200 MB (compressed) meant reserving a lot of memory overhead for big archives, so we split the RDD into three buckets by file size (see the sketch below).
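A sketch of the bucketing idea: list files with their sizes via the Hadoop FileSystem API and read each size bucket as a separate RDD, so executor memory can be sized per bucket. The path and thresholds are assumptions:

import org.apache.hadoop.fs.{FileSystem, Path}

// Partition file listings by compressed size, then read each bucket separately.
val fs = FileSystem.get(sc.hadoopConfiguration)
val files = fs.listStatus(new Path("/mnt/blob/zips")).filter(_.isFile)
val (small, rest) = files.partition(_.getLen < 50L * 1024 * 1024)
val (medium, big) = rest.partition(_.getLen < 120L * 1024 * 1024)
val buckets = Seq(small, medium, big).map { bucket =>
  sc.binaryFiles(bucket.map(_.getPath.toString).mkString(","))  // comma-separated path list
}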
Next time: use gzip compression at the file level instead of putting varying numbers of files into a Winzip archive.
19. Agenda
1 About ZEISS
2 Motivation for Predictive Maintenance
3 Processing Live Data Streams
4 Processing Historical Data Batches
5 Bringing It Together
6 Summary
20. Combining Batch and Streaming Data Sources and Adding Scheduled Machine Learning Notebooks
Batch side: historic log files land in Azure Blob storage and are processed with RDDs and Datasets into the Data Lake. Streaming side: {JSON} events flow from Azure IoT Hub through Azure Event Hubs into Structured Streaming and on into Azure Cosmos DB. An ESB contributes CRM and service data.
MLlib Training consumes historic logs, streaming events, and service data and produces trained models. MLlib Prediction combines the trained models with recent streaming events and feeds the Prediction Dashboard.
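For illustration, a scheduled training notebook built with MLlib could look like the sketch below; the feature columns, label, paths, and `trainingData` are assumptions, not ZEISS's actual model:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.VectorAssembler

// Assemble assumed features and fit a failure classifier on `trainingData`,
// a hypothetical DataFrame joined from logs, streaming events, and service data.
val assembler = new VectorAssembler()
  .setInputCols(Array("temperatureMean", "vibrationMax", "serviceEventsLast90d"))
  .setOutputCol("features")
val rf = new RandomForestClassifier()
  .setLabelCol("failedWithin30d")
  .setFeaturesCol("features")
val model = new Pipeline().setStages(Array(assembler, rf)).fit(trainingData)
model.write.overwrite().save("/mnt/datalake/models/failure-rf")

// A separate scheduled prediction notebook would reload the model and score
// recent streaming events, e.g.:
//   val model = PipelineModel.load("/mnt/datalake/models/failure-rf")
//   model.transform(recentEvents).select("machineID", "prediction", "probability")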
21. Agenda
1 About ZEISS
2 Motivation for Predictive Maintenance
3 Processing Live Data Streams
4 Processing Historical Data Batches
5 Bringing It Together
6 Summary
22. Things We Like
Things we like about Spark
• Datasets: Typesafe data processing
• Stateful Streaming: Powerful transformations of streaming datasets
• MLlib: State-of-the-art distributed algorithms
• All the other great things: Scalability, performance, …

Things we like about Azure Databricks
• Cluster management: Convenient setup, auto-scaling, auto-shutdown
• Job management: Convenient creation and scheduling of Spark jobs
• REST API: Enables automated deployment
23. Wish List
Features we would love to see in the future
• Datasets: Typesafe joins
• Central schema repository with export to different formats (Scala case class, JSON Schema)
• Better monitoring for multiple long-running jobs: integration with external log aggregators or customizable log metrics in Databricks
24. Contact
Dr. Jan-Philipp Simen
Data Scientist
Digital Innovation Partners
Carl Zeiss AG
Kistlerhofstraße 70
81379 Munich
jan-philipp.simen@zeiss.com
https://www.linkedin.com/in/jpsimen