1 © Hortonworks Inc. 2011–2018. All rights reserved
Achieving a 360 Degree View of Manufacturing via Open Source Data Management
Michael Ger
General Manager, Manufacturing & Automotive, Hortonworks
Ryan Templeton
Solutions Engineer, Hortonworks
June 20, 2018
The Industry 4.0 Imperative
• Total cost of poor quality amounts to a staggering 20% of sales (American Society for Quality)
• Warranty costs to companies amount to approximately 2-3% of revenues (Warranty Week)
• Unplanned downtime costs manufacturers approximately $50 billion per year (Deloitte)
• Poor maintenance strategies can reduce plant capacity by 5 to 20% (Deloitte)
• Through 2016, just 16% of manufacturers had adopted Industrial IoT strategies (McKinsey)
Big Data For Manufacturing – So What’s New?
• Data Variety: OT/IT convergence (Sensors, PLCs, Historians, MES, ERP, Quality) and new data sources (Image, Video, Audio, Social, RFID, Logs, etc.)
• Expanded Analytics Scope: From individual process focus to end-to-end process optimization
• Data Volume: Industrial Internet data to grow 2X faster than any other data source
• Computational Power: New massively parallelized computing supports ML and AI, critical to next-generation manufacturing
Highest value data
Always on, always connected devices generate a
constant stream of data related to the operations of
industrial businesses
These datasets contain:
• What events occurred
• Why an event occurred, or not
• Quantification of an event’s impact
These datasets go by many names:
• “SCADA Data”
• “Control System Data”
• “Historian Data”
• “Machine Data”
• “Measurement Logs”
• “Telemetry”
How are my …
People?
Processes?
Equipment?
Lots of misnomers
Challenges in accessing machine level data
Existing ICS Components
Instrumentation
• Commonly the only output is electrical signals
• Integration with sensors requires specialized hardware
• Serial bus or wireless options are increasingly available
Control Systems (PLC, RTU & DCS)
• Data is transmitted via proprietary, vendor-specific protocols
• Direct integration with control systems requires protocol translation/parsing for each platform family
Process Historians & OPC Servers
• Data is typically available via programmatic access such as OPC, API or SQL
• There is almost always an option to create text files
Open Source Tools
NiFi is a toolbox of connectors
• Ingest text files and interrogate REST APIs using built-in connectors
• Connect to industry-standard protocols like OPC UA with custom processors
• Build your own
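To make the "text file" path concrete, here is a minimal Python sketch of turning a historian text export into typed records before loading it elsewhere. It is an illustration only, not the NiFi flow described on the slide: the column layout (`tag,timestamp,value,quality`), the tag names and the `parse_historian_export` helper are all assumptions, since real export formats vary by vendor.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical export layout: one reading per row (tag, timestamp, value, quality).
SAMPLE_EXPORT = """tag,timestamp,value,quality
PUMP01.FLOW,2018-06-20T08:00:00Z,12.5,GOOD
PUMP01.FLOW,2018-06-20T08:00:01Z,,BAD
PUMP01.TEMP,2018-06-20T08:00:00Z,71.2,GOOD
"""


def parse_historian_export(text):
    """Turn a historian text export into typed records, skipping unusable rows."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["quality"] != "GOOD" or not row["value"]:
            continue  # drop readings flagged bad or missing a value
        timestamp = datetime.strptime(row["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        records.append({
            "tag": row["tag"],
            "timestamp": timestamp.replace(tzinfo=timezone.utc),
            "value": float(row["value"]),
        })
    return records


records = parse_historian_export(SAMPLE_EXPORT)
for record in records:
    print(record["tag"], record["timestamp"].isoformat(), record["value"])
```

Filtering on a quality flag at parse time is one possible design choice; a pipeline could equally keep bad-quality rows and let downstream analytics decide.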
The Data Management Challenge for Manufacturers
Data is Siloed
• Datasets are dispersed and difficult to access
• Insights sourced from the silos are narrow & incomplete
• Historical data (years) is not easily accessible and often takes longer than expected to extract
• Data required to solve real problems comes from several different sources (lab systems, product scheduling) and requires significant manual effort to pull the data together
Considerable effort to leverage
• Lack of connectivity to proper data analytics tools (Python, MATLAB, SAS, R)
• Inability to find data and a reliance on “tribal knowledge” and previous engineers’ spreadsheets to find tags and queries
Data Analytics Missing
Lack of Resources
Inability to Leverage BOTH OT and IT Data Sources
• Time dependence limits how data sets can be processed
• Point solutions are almost always closed source, locking users into closed ecosystems
• Do not scale with exponential data growth
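One reason pulling OT and IT data together takes manual effort is that the timestamps rarely line up: sensor readings arrive continuously, while lab results are recorded occasionally. A common way to combine them is an "as-of" join, sketched below in plain Python. The function name `asof_join`, the tuple layout and the sample numbers are assumptions made for illustration; the deck does not prescribe this technique.

```python
from bisect import bisect_right


def asof_join(lab_results, sensor_readings):
    """Attach to each lab result the latest sensor reading at or before its time.

    Both inputs are lists of (timestamp_seconds, value) tuples, and
    sensor_readings must already be sorted by timestamp.
    """
    sensor_times = [t for t, _ in sensor_readings]
    joined = []
    for lab_time, lab_value in lab_results:
        # Index of the last sensor reading whose timestamp is <= lab_time.
        idx = bisect_right(sensor_times, lab_time) - 1
        sensor_value = sensor_readings[idx][1] if idx >= 0 else None
        joined.append((lab_time, lab_value, sensor_value))
    return joined


# Hypothetical data: a temperature tag sampled every 10 s and three lab samples.
sensor = [(0, 70.1), (10, 70.4), (20, 71.0), (30, 71.8)]
lab = [(5, 0.92), (20, 0.95), (45, 0.90)]
result = asof_join(lab, sensor)
print(result)
```

A lab sample taken before the first sensor reading gets `None` rather than a guessed value, which keeps the gap visible to whoever analyzes the joined data.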
Time series data is emerging as the fastest-growing type of data
The Opportunity for a Connected Data Platform
• Real-time data from Connected Manufacturing and edge analytics drive the need to manage data in motion
– Hortonworks DataFlow (HDF) is a comprehensive solution to manage data in motion
• Big data analytics require the ability to store and process vast amounts of big data at rest
– Hortonworks Data Platform (HDP) is a comprehensive solution to manage data at rest
• Modern Connected Manufacturing applications require a Connected Data Platform, managing both data in motion and data at rest
[Diagram labels: Real-Time Data; Big Data & Analytics; Modern Data Apps; Data in Motion; Data at Rest]
Stepwise approach to these challenges
[Diagram labels] Remote field or manufacturing site (Location 1): RDBMS & EDW; files / other unstructured data; video; IoT gateways; WITSML; SCADA, DCS, PLC, RTU, historians. Data consumers: data marts; analytics, statistics & science; visualization & dashboards.
Data lakes only address data access
[Diagram labels] Field or manufacturing site (Location 1): RDBMS & EDW; files / other unstructured data; video; IoT gateways; WITSML; SCADA, DCS, PLC, RTU, historians. Data consumers: data marts; analytics, statistics & science; visualization & dashboards.
Scalable data collection is required
[Diagram labels] Field or manufacturing site (Location 1): RDBMS & EDW; files / other unstructured data; video; IoT gateways; WITSML; SCADA, DCS, PLC, RTU, historians. Data consumers: data marts; analytics, statistics & science; visualization & dashboards.
Winners Will Embrace the Connected Manufacturing Data Lifecycle
Example: Equipment Predictive Maintenance (Plant Uptime)
Data sources feeding the Manufacturing Data Lake
• Real-time data sources (Connected Manufacturing): Sensors, PLCs, Historians, SCADAs
• Enterprise transaction sources: ERP, MES, QMS, Maintenance, etc.
Lifecycle steps
1 INGEST
2 STORE
3 PROCESS
4 ANALYZE: Data Discovery, Business Intelligence
4 LEARN: Develop Models, Machine Learning
Deploy
5 MONITOR
6 ACT: Real-time maintenance action
Model Inputs
• Historic sensor values
• Historic production
• Historic defects & repair orders
• Historic maintenance records
• Design parameters, etc.
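The learn, monitor and act steps of this lifecycle can be illustrated with a deliberately simple Python sketch. It stands in for a real machine learning model with a mean-and-standard-deviation baseline over historic sensor values; the function names (`learn_baseline`, `monitor`, `act`), the asset ID, the threshold of 3 standard deviations and the vibration values are all assumptions for illustration, not the presenters' implementation.

```python
from statistics import mean, stdev


def learn_baseline(historic_values):
    """LEARN: summarize normal behavior from historic sensor values."""
    return mean(historic_values), stdev(historic_values)


def monitor(reading, baseline, threshold=3.0):
    """MONITOR: True when a reading is more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    return abs(reading - mu) > threshold * sigma


def act(asset_id, reading):
    """ACT: stand-in for raising a real-time maintenance action."""
    return {"asset": asset_id, "action": "inspect", "reading": reading}


# Hypothetical vibration history for one pump, then a few live readings.
historic_vibration = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1]
baseline = learn_baseline(historic_vibration)

live_readings = [2.0, 2.1, 3.5, 1.9]
actions = [act("PUMP01", r) for r in live_readings if monitor(r, baseline)]
print(actions)
```

In practice the baseline would be replaced by a model trained on the inputs listed above (historic sensor values, production, defects, maintenance records, design parameters), and the deploy step would push that model to wherever streaming data is scored.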
Solution Component Detail
[Diagram labels] Lifecycle steps: INGEST; STORE AND PROCESS; ANALYZE; LEARN; Deploy; MONITOR; ACT (real-time maintenance action)
Data sources feeding the Manufacturing Data Lake
• Real-time data sources (Connected Manufacturing): Sensors, PLCs, Historians, SCADAs
• Enterprise transaction sources: ERP, MES, QMS, Maintenance, etc.
Model Inputs
• Historic sensor values
• Historic production
• Historic defects & repair orders
• Historic maintenance records
• Design parameters, etc.
The Road to Big Data Transformation – Maturity Phases
AWARENESS
• Focus: Individual Process, Equipment, Location
• Benefit: Point Optimization
• Example: Operational Dashboards
EXPLORATION
• Focus: Data Consolidation into Data Lakes
• Benefit: Data for Analytics
• Example: Process 360
OPTIMIZATION
• Focus: Machine Learning
• Benefit: End-To-End Optimization
• Example: Predictive Maintenance, Yield Optimization
TRANSFORMATION
• Focus: Streaming Analytics
• Benefit: Real-Time Actions
• Example: Real-Time Maintenance Actions
Manufacturing Operations – Sample Use Cases
Use Case: Benefits (arranged along a COMPLEXITY axis)
• Process 360: Single-point access to critical information for analytics
• Process & Quality Monitoring: Reduce non-conformance, downtime, scrap and late shipment costs
• Equipment Predictive Maintenance: Reduce equipment downtime and maintenance costs
• Quality Event Forensic Analysis: Reduce scope of service campaigns and recalls
• Quality & Yield Optimization: Optimize process variables to improve yields and quality
Optimizing Pharmaceutical Yields using Big Data Insights
Manufacturing: Major Biopharmaceuticals Manufacturer
Core Use Cases
• Manufacturing Yield and Quality Optimization
Problem: Higher than desired discard rates (lower yields) on certain vaccines
• Root cause analysis hampered by huge data volumes and spreadsheet-based analytics
• Researchers could only look at a batch or two at a time
Solution: Manufacturing Data Lake with Yield Optimization Analytics
• Month 1: Data loaded into Hadoop (sensors, historians, ERP)
• Month 2: Analytics to cluster every batch of vaccine ever made
• Month 3: Development of models, testing against all historical data
After 15 billion calculations and 5.5 million batch-to-batch comparisons, the analysis discovered characteristics closely correlated to yield
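The idea of finding batch characteristics that correlate with yield can be shown at toy scale. The Python sketch below ranks each recorded process parameter by the strength of its Pearson correlation with yield. It is not the manufacturer's method (the slide mentions clustering and batch-to-batch comparisons, and gives no algorithmic detail); the parameter names, the five sample batches and the helpers `pearson` and `rank_characteristics` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / (spread_x * spread_y)


def rank_characteristics(batches, target="yield"):
    """Sort every non-target field by absolute correlation with the target."""
    target_values = [b[target] for b in batches]
    scores = {
        name: pearson([b[name] for b in batches], target_values)
        for name in batches[0]
        if name != target
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical batch records: two process parameters and the resulting yield.
batches = [
    {"ferment_temp": 36.0, "ph": 7.0, "yield": 0.90},
    {"ferment_temp": 36.5, "ph": 7.2, "yield": 0.85},
    {"ferment_temp": 37.0, "ph": 6.9, "yield": 0.80},
    {"ferment_temp": 37.5, "ph": 7.1, "yield": 0.75},
    {"ferment_temp": 38.0, "ph": 7.0, "yield": 0.70},
]
ranked = rank_characteristics(batches)
for name, score in ranked:
    print(f"{name}: {score:+.2f}")
```

With every batch in one place, this kind of comparison runs across the whole history at once, which is the contrast with inspecting a batch or two at a time in a spreadsheet.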
Example (Pharmaceutical Manufacturer)
Machine Learning
From Refinery to Enterprise Level Analytics
Oil Refining: Multinational Oil & Gas Company
Core Use Cases
• Blend Monitoring
• Corrosion Prediction
• Analyzer Reliability Analytics
• Heat Exchanger Performance Analytics
• Inferential Models Analytics
Problem: Refinery-level analytics sub-optimizes performance
• Analytics performed at each refinery in Excel spreadsheets
• Missed opportunities for optimization based on larger data sets
Solution: Centralize data in Manufacturing Data Lake for analytics
• Ingest data from each refinery using HDF into a centralized Data Lake
• Initial data set was over 1 million data tags, grew to 6 million
• Data Types: Time series, raw materials, quality results, SAP work order data, etc.
Benefits: Enterprise-level analytics to optimize performance
• ROI Analysis
• $106 million in cost savings per year
• 20X ROI annually
Data Lake for Connected Vehicle and Quality Analytics
Automotive: Major Automotive OEM
Core Use Cases
• Connected Vehicle & Value-Added Services
• Quality Event Detection & Traceability
Problem: Data silos impeded Connected Vehicle, quality management and other cross-functional strategic initiatives
• Lacked infrastructure for storing long-term connected vehicle driving data
• Data silos impeded ability to achieve a 360 degree view of vehicle quality data for analytics
Solution: Implemented Enterprise Data Lake for cross-functional analytics
• Creating an Enterprise Data Lake for a wide range of data including dealership repair orders, warranty claims, manufacturing shop floor and inspection data, electric vehicle usage and performance, vehicle status and usage patterns
• Creating a unified Quality Data Lake including design quality, production quality and final assembly quality data
Benefits: Cross-functional analytics leverage all enterprise data
• Data lake supports cross-functional data analytics for all types of data
• Platform planned to store 500TB of data in 2017
• Wide-ranging data analytics use cases planned for future
Vehicle Lifecycle Quality Analytics
Enterprise Quality Data Lake Provides Foundation for Rapid Time-to-Resolution
1 What Happened? (Problem)
• Voice of Vehicle (Telematics): DTCs, Performance Parameters
• Voice of Customer (Social + Web (Forums, Surveys, etc.), Call Center, Service & Warranty): Likes, Dislikes, Sentiment, Complaints, Claims
2 Why Did It Happen?
• Voice of Supplier (Production and Operations): Man, Method, Material, Machine
• Voice of Factory (Production and Operations): Man, Method, Material, Machine
3 Contain The Problem
• Voice of Field (Installed Base): VINs, Configurations, Locations
• Voice of Design (Product Development): Designs, Parts/BOMs, Features, Specifications
Why Open Source for Industry 4.0 Big Data Initiatives?
• Data Volume
• Innovation
• Lower Risk
• No Vendor “Lock-In”
• Open Source Community
• Economics
Thank You

Achieving a 360 degree view of manufacturing

  • 1. 1 © Hortonworks Inc. 2011–2018. All rights reserved Achieving a 360 Degree View of Manufacturing via Open Source Data Management Michael Ger General Manager, Manufacturing & Automotive, Hortonworks Ryan Templeton Solutions Engineer, Hortonworks June 20, 2018
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Title Goes HereThe Industry 4.0 Imperative Total cost of poor quality amounts to a staggering 20% of sales American Society of Quality Warranty costs to companies amount to approximately 2-3% of revenues Warranty Week Deloitte Unplanned downtime costs manufacturers approximately $50 billion per year Poor maintenance strategies can reduce plant capacity by 5 to 20% Deloitte McKinsey Through 2016, just 16% of manufacturers have adopted Industrial IOT strategies
  • 3. Big Data For Manufacturing – So What’s New? Data Variety: OT/IT convergence (Sensors, PLCs, Historians, MES, ERP, Quality) and new data sources (Image, Video, Audio, Social, RFID, Logs, etc.). Expanded Analytics Scope: from individual process focus to end-to-end process optimization. Data Volume: Industrial Internet data to grow 2X faster than any other data source. Computational Power: new massively parallelized computing supports ML and AI, critical to next-generation manufacturing.
  • 4. Highest value data. Always-on, always-connected devices generate a constant stream of data related to the operations of industrial businesses. These datasets contain: what events occurred; why an event occurred, or not; quantification of an event’s impact. They answer: How are my … People? Processes? Equipment? These datasets go by many names (lots of misnomers): “SCADA Data”, “Control System Data”, “Historian Data”, “Machine Data”, “Measurement Logs”, “Telemetry”.
  • 5. Challenges in accessing machine level data. Existing ICS Components: Control Systems (PLC, RTU & DCS): data is transmitted via proprietary vendor-specific protocols; direct integration with control systems requires protocol translation/parsing for each platform family. Process Historians & OPC Servers: data is typically available via programmatic access such as OPC, API or SQL; there is almost always an option to create text files. Instrumentation: commonly the only output is electrical signals; integration with sensors requires specialized hardware; serial bus or wireless are increasingly available. Open Source Tools: NiFi is a toolbox of connectors: ingest text files and interrogate REST APIs using built-in connectors; connect to industry standard protocols like OPC UA with custom processors; build your own.
  • 6. The Data Management Challenge for Manufacturers. Data is Siloed: datasets are dispersed and difficult to access; insights sourced from the silos are narrow & incomplete; historical data (years) is not easily accessible and often takes longer than expected to extract; data required to solve real problems comes from several different sources (lab systems, product scheduling) and requires significant manual effort to pull together. Considerable effort to leverage: lack of connectivity to proper data analytics tools (Python, MATLAB, SAS, R); inability to find data and a reliance on “tribal knowledge” and previous engineers’ spreadsheets to find tags and queries. Data Analytics Missing; Lack of Resources; Inability to Leverage BOTH OT and IT Data Sources: time dependence limits how data sets can be processed; point solutions are almost always closed source, locking users into closed ecosystems; point solutions do not scale with exponential data growth.
  • 7. Time Series data is emerging as the fastest growing type of data
  • 8. The Opportunity for a Connected Data Platform. Real-time data from connected manufacturing and edge analytics drive the need to manage data in motion: Hortonworks DataFlow (HDF) is a comprehensive solution to manage data in motion. Big data analytics require the ability to store and process vast amounts of big data at rest: Hortonworks Data Platform (HDP) is a comprehensive solution to manage data at rest. Modern connected manufacturing applications require a Connected Data Platform that manages both data in motion and data at rest. [Diagram labels: Real-Time Data (data in motion); Big Data & Analytics (data at rest); Modern Data Apps]
  • 9. Stepwise approach to these challenges. [Diagram: data sources at a remote field or manufacturing site, Location 1 (RDBMS & EDW; files / other unstructured data; video; IoT gateways; WITSML; SCADA, DCS, PLC, RTU, historians) connected to data consumers (data marts; analytics, statistics & science; visualization & dashboards)]
  • 10. Data lakes only address data access. [Diagram: data sources at a field or manufacturing site, Location 1 (RDBMS & EDW; files / other unstructured data; video; IoT gateways; WITSML; SCADA, DCS, PLC, RTU, historians) connected to data consumers (data marts; analytics, statistics & science; visualization & dashboards)]
  • 11. Scalable data collection is required. [Diagram: data sources at a field or manufacturing site, Location 1 (RDBMS & EDW; files / other unstructured data; video; IoT gateways; WITSML; SCADA, DCS, PLC, RTU, historians) connected to data consumers (data marts; analytics, statistics & science; visualization & dashboards)]
  • 12. Winners Will Embrace the Connected Manufacturing Data Lifecycle. Example: Equipment Predictive Maintenance (Plant Uptime). Sources: real-time data sources (connected manufacturing sensors, PLCs, historians, SCADAs) and enterprise transaction sources (ERP, MES, QMS, maintenance, etc.) feeding a manufacturing data lake. Lifecycle stages: 1 Ingest; 2 Store; 3 Process; 4 Analyze (data discovery, business intelligence); 4 Learn (develop models, machine learning); Deploy; 5 Monitor; 6 Act (real-time maintenance action). Model inputs: historic sensor values; historic production; historic defects & repair orders; historic maintenance records; design parameters, etc.
  • 13. Solution Component Detail. Sources: real-time data sources (connected manufacturing sensors, PLCs, historians, SCADAs) and enterprise transaction sources (ERP, MES, QMS, maintenance, etc.) feeding a manufacturing data lake. Lifecycle stages: Ingest; Store and Process; Analyze; Learn; Deploy; Monitor; Act (real-time maintenance action). Model inputs: historic sensor values; historic production; historic defects & repair orders; historic maintenance records; design parameters, etc.
  • 14. The Road to Big Data Transformation – Maturity Phases. Awareness (focus: individual process, equipment, location; benefit: point optimization; example: operational dashboards). Exploration (focus: data consolidation into data lakes; benefit: data for analytics; example: Process 360). Optimization (focus: machine learning; benefit: end-to-end optimization; example: predictive maintenance, yield optimization). Transformation (focus: streaming analytics; benefit: real-time actions; example: real-time maintenance actions).
  • 15. Manufacturing Operations – Sample Use Cases. Use cases and benefits, arranged along a complexity axis: Process 360: single-point access to critical information for analytics. Process & Quality Monitoring: reduce non-conformance, downtime, scrap and late shipment costs. Equipment Predictive Maintenance: reduce equipment downtime and maintenance costs. Quality Event Forensic Analysis: reduce scope of service campaigns and recalls. Quality & Yield Optimization: optimize process variables to improve yields and quality.
  • 16. Optimizing Pharmaceutical Yields using Big Data Insights. Industry: Manufacturing. Customer: major biopharmaceuticals manufacturer. Core use case: manufacturing yield and quality optimization. Problem: higher than desired discard rates (lower yields) on certain vaccines; root cause analysis hampered by huge data volumes and spreadsheet-based analytics; researchers could only look at a batch or two at a time. Solution: Manufacturing Data Lake with Yield Optimization Analytics. Month 1: data loaded into Hadoop (sensors, historians, ERP). Month 2: analytics to cluster every batch of vaccine ever made. Month 3: development of models, testing against all historical data. Result: after 15 billion calculations and 5.5 million batch-to-batch comparisons, discovered characteristics closely correlated to yield.
  • 17. Example (Pharmaceutical Manufacturer): Machine Learning
  • 18. From Refinery to Enterprise Level Analytics. Industry: Oil Refining. Customer: multinational oil & gas company. Core use cases: blend monitoring; corrosion prediction; analyzer reliability analytics; heat exchanger performance analytics; inferential models analytics. Problem: refinery-level analytics sub-optimizes performance; analytics performed at each refinery in Excel spreadsheets; missed opportunities for optimization based on larger data sets. Solution: centralize data in a Manufacturing Data Lake for analytics; ingest data from each refinery using HDF into a centralized data lake; initial data set was over 1 million data tags and grew to 6 million; data types: time series, raw materials, quality results, SAP work order data, etc. Benefits: enterprise-level analytics to optimize performance; ROI analysis: $106 million in cost savings per year, 20X ROI annually.
  • 19. Data Lake for Connected Vehicle and Quality Analytics. Industry: Automotive. Customer: major automotive OEM. Core use cases: connected vehicle & value-added services; quality event detection & traceability. Problem: data silos impeded connected vehicle, quality management and other cross-functional strategic initiatives; lacked infrastructure for storing long-term connected vehicle driving data; data silos impeded the ability to achieve a 360 degree view of vehicle quality data for analytics. Solution: implemented an Enterprise Data Lake for cross-functional analytics; creating an Enterprise Data Lake for a wide range of data including dealership repair orders, warranty claims, manufacturing shop floor and inspection data, electric vehicle usage and performance, vehicle status and usage patterns; creating a unified Quality Data Lake including design quality, production quality and final assembly quality data. Benefits: cross-functional analytics leverage all enterprise data; data lake supports cross-functional data analytics for all types of data; platform planned to store 500TB of data in 2017; wide-ranging data analytics use cases planned for the future.
  • 20. Vehicle Lifecycle Quality Analytics. Enterprise Quality Data Lake provides the foundation for rapid time-to-resolution. Steps: 1 Problem: what happened? 2 Why did it happen? 3 Contain the problem. Data sources: Voice of Vehicle (telematics): DTCs, performance parameters. Voice of Customer (social + web forums, surveys, etc.; call center; service & warranty): likes, dislikes, sentiment, complaints, claims. Voice of Supplier (production and operations): man, method, material, machine. Voice of Factory (production and operations): man, method, material, machine. Voice of Field (installed base): VINs, configurations, locations. Voice of Design (product development): designs, parts/BOMs, features, specifications.
  • 21. Why Open Source for Industry 4.0 Big Data Initiatives? Data Volume; Innovation; Lower Risk; No Vendor “Lock-In”; Open Source Community; Economics
  • 22. Thank You
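
The batch-to-batch comparisons on slide 16 are not specified in the deck; the editor's notes only mention that dynamic time warping was among the techniques used to align the data sets. As a minimal sketch of that general idea, the following Python computes a textbook dynamic time warping (DTW) distance between per-batch sensor traces. The function names, batch IDs, and temperature values are hypothetical and chosen for illustration; this is not the customer's actual pipeline.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def pairwise_distances(traces):
    """Return the DTW distance for every unordered pair of batch IDs."""
    ids = sorted(traces)
    return {
        (p, q): dtw_distance(traces[p], traces[q])
        for k, p in enumerate(ids)
        for q in ids[k + 1:]
    }


# Hypothetical per-batch temperature traces (illustrative values only).
BATCH_TRACES = {
    "batch_A": [20.0, 21.0, 23.0, 26.0, 26.0],
    "batch_B": [20.0, 20.0, 21.0, 23.0, 26.0],  # same profile, one sample late
    "batch_C": [20.0, 25.0, 30.0, 35.0, 40.0],  # different profile
}

if __name__ == "__main__":
    for pair, dist in pairwise_distances(BATCH_TRACES).items():
        print(pair, dist)
```

In this toy data, batch_A and batch_B have a DTW distance of 0 because one is a delayed copy of the other, while batch_C is far from both. A pairwise distance table of this kind is one possible input to the batch clustering step mentioned for Month 2; at the scale of millions of comparisons, a distributed or vectorized implementation would presumably be needed.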

Editor's Notes

  1. The American Society for Quality (http://ASQ.org) estimates that the total cost of poor quality (COPQ) amounts to a staggering 20 percent of sales. According to Deloitte, poor maintenance strategies can reduce a plant’s overall productive capacity by 5 to 20 percent, and unplanned downtime is costing industrial manufacturers an estimated $50 billion each year. On average, companies spend 2-3% of their annual revenue on warranty costs. And perhaps one of the biggest issues: consumers are informed and have choices, and 91% of unhappy customers will never purchase from that company again.
  2. Or lower this down….
  3. Quote manufacture size limits
  4. Market intelligence indicates that time series is a rapidly growing area of interest for IT departments, as shown by this chart from DB-Engines.com. This is confirmed by our own experience in the field working with customers. Capturing and managing time series data sets is always among the top few uses for our customers in many market segments.
  5. From a market perspective, it’s important to understand and appreciate the intersection of the big data & analytics market and the Internet of Things market. Modern customer-centric data applications are fueled by both data-in-motion and data-at-rest. The result is actionable intelligence derived from ALL available data that aligns the business with its customers and drives next generation business models. [NEXT]
  6. Broadly, our role at Hortonworks is putting good data and high quality tools into the hands of the right people. Like so many things, this too is easier said than done, and many organizations have navigated the “organizational tangle” with point-to-point connections of data sources to specific data consumers. Every new consumer or data source is a new end-to-end project. Unfortunately, this is very resource intensive to maintain and doesn’t support “self service” or “one stop shopping” information delivery models.
  7. Step 1 for many organizations was to begin building central repositories, or “Data Lakes”. This approach simplified architectures from an unmanageable many-to-many scenario to a many-to-one scenario. Data democratization occurred because it became simple to manage different consumption preferences from a single place. However, data lakes did not make “getting the data” any easier for the ICS data sources, because there is a large variety of data sources: simple embedded systems such as instrumentation, software installed on commodity hardware, and integrated hardware/software suites. Further, there are no widespread standards in communications media or protocols, data types, or information models. The data sources live at the tattered and ragged edge of most organizations’ networks. It is common to find devices that are remote, old, have limited power, and have even more limited bandwidth. Finally, due to the potential for malicious misuse of ICS devices, they must operate on secured networks to ensure the safety of people and property and the security of sensitive information (process knowledge).
  8. Enter HDF and its robust data collection capabilities, which aid our customers in acquiring data from ICS devices and delivering the acquired data to the access engines best suited for the associated data consumer. NiFi provides access to a large variety of sources and sinks through its large built-in library of processors, and easy assembly of custom processors when custom protocols are encountered. When devices are near that tattered and ragged edge of a network, NiFi components such as MiNiFi can enable store-and-forward, compression, and batch delivery capabilities to help overcome limitations imposed by the devices and networks. A comprehensive suite of security tools, such as Kerberos, SSL, 20+ encryption ciphers for data, and standardized protocols among the HDF components, is available to help meet the demands of ICS security requirements when data must be extracted from secured networks.
  9. Challenges and Opportunities: Experienced higher-than-usual discard rates on certain vaccines. Investigation of causes hampered by huge data volumes and spreadsheet-based analytics. Data sources include process-historian systems on the shop floor that tag and track each batch. Maintenance systems detail plant equipment service dates and calibration settings. Building-management systems capture air pressure, temperature, and other readings in multiple locations at each plant, sampling by the minute. Aligning all this data from disparate systems and spotting abnormalities took months using the spreadsheet-based approach, and storage and memory limits meant researchers could only look at a batch or two at a time. Process: In the first month, the team loaded the data onto a partition of the cloud-based platform, and used MapReduce, Hive, and advanced dynamic time-warping techniques to aggregate and align the data sets around common metadata dimensions such as batch IDs, plant equipment IDs, and time stamps. In the second month, analysts used R-based analytics to chart and cluster every batch of the vaccine ever made on a heat map. Spotting notable patterns, the team then used R to produce investigative histograms and scatter plots, and it drilled down with Hive to explore hypotheses about the factors tied to low-yield production runs. Using an Agile development approach, the team set up daily data-exploration goals, but it could change course by that afternoon if it failed to find solid data backing up a particular hypothesis. In the third month, the team developed models, testing against the trove of historical data to prove and disprove leading theories about yield factors. Benefits/Results: Hadoop enables the pharmaceutical company to crunch huge amounts of data, resulting in the ability to develop and bring vaccines to market faster and at lower cost. The team was able to come up with conclusive answers about production yield variance within just three months. Through 15 billion calculations and more than 5.5 million batch-to-batch comparisons, Merck discovered that certain characteristics in the fermentation phase of vaccine production were closely tied to yield in a final purification step. Merck intends to optimize the production of other vaccines now in development. They're all potentially lifesaving products, according to Merck, and it's clear that the new data analysis approach marks a huge advance in ensuring efficient manufacturing and a more plentiful supply. NOTES: Yield optimization is a high-value Big Data use case relevant to all forms of complex process manufacturing, from semiconductor to storage to biotech, and Merck is a great example. It is common for biotech manufacturers like Merck to monitor more than 200 variables in the complex fermentation process to produce vaccines. Merck was experiencing far higher discard rates on some of their vaccines and needed to determine the root cause of their costly yield variances.
  10. Solution: Manufacturing Data Lake and Yield Optimization Analytics at Merck. Month 1: data loaded into Hadoop, then aggregated and aligned around common metadata dimensions. Month 2: analytics to chart and cluster every batch of the vaccine ever made on a heat map, spotting notable patterns and investigating further. Month 3: team developed models, testing against the trove of historical data to prove or disprove theories regarding yield factors. After 15 billion calculations and 5.5 million batch-to-batch comparisons, discovered characteristics closely correlated to yield.
  11. W