Continuously improving factory operations is of critical importance to manufacturers. Consider the facts: the total cost of poor quality amounts to a staggering 20% of sales (American Society for Quality) and unplanned downtime costs plants approximately $50 billion per year (Deloitte).
The most pressing questions are: which process variables affect quality and yield, and which process variables predict equipment failure? Getting to those answers is giving forward-thinking manufacturers a leg up over competitors.
The speakers address the data management challenges facing today's manufacturers, including proprietary systems and siloed data sources, as well as an inability to make sensor-based data usable.
Integrating enterprise data from ERP, MES, maintenance systems and other sources with real-time operations data from sensors, PLCs, SCADA systems and historians represents a major first step. But how do you get started? What is the value of a data lake? How are AI/ML being applied to enable real-time action?
Join us for this educational session, which includes a rare view from one of our SWAT team experts into our roadmap for an open source industrial IoT data management platform.
Key Takeaways:
• How to choose an initial project from which to quickly demonstrate high value returns
• Understand the value of multivariate data sources, as opposed to a single sensor on a piece of equipment
• Understand advances in big data management and streaming analytics that are paving the way to next-generation factory performance
MICHAEL GER, General Manager, Manufacturing and Automotive, Hortonworks and RYAN TEMPLETON, Senior Solutions Engineer, Hortonworks
The American Society for Quality (http://ASQ.org) estimates that the total cost of poor quality (COPQ) amounts to a staggering 20 percent of sales.
According to Deloitte, poor maintenance strategies can reduce a plant’s overall productive capacity by 5 to 20 percent and unplanned downtime is costing industrial manufacturers an estimated $50 billion each year.
On average, companies spend 2-3% of their annual revenue on warranty costs. And perhaps one of the biggest issues: consumers are informed and have choices, and 91% of unhappy customers will never purchase from that company again.
Market Intelligence indicates that time series is a rapidly growing area of interest for IT departments, as shown by this chart from DB-Engines.com
This is confirmed by our own experience in the field working with customers
Capturing and managing time series data sets is consistently among the top few use cases for our customers in many market segments
From a market perspective, it’s important to understand and appreciate the intersection of the big data & analytics market and the Internet of Things market. Modern customer-centric data applications are fueled by both data-in-motion and data-at-rest.
The result is actionable intelligence derived from ALL available data that aligns the business with its customers and drives next generation business models.
[NEXT]
Broadly, our role at Hortonworks is putting good data and high-quality tools into the hands of the right people
Like so many things, this too is easier said than done, and many organizations have navigated the "organizational tangle" with point-to-point connections of data sources to specific data consumers
Every new consumer or data source is a new end-to-end project
Unfortunately, this is very resource-intensive to maintain and doesn't support "self-service" or "one-stop shopping" information delivery models
Step 1 for many organizations was to begin building central repositories or “Data Lakes”
This approach simplified architectures from the unmanageable many-to-many scenario to a many-to-one scenario
Data democratization occurred because it became simple to manage different consumption preferences from a single place
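For a concrete sense of the simplification, consider a hypothetical plant with 20 data sources and 15 data consumers; the counts below are purely illustrative, but the arithmetic is the point:

```python
# Point-to-point: every source is wired to every consumer it feeds.
sources, consumers = 20, 15                          # illustrative counts, not from any case study
point_to_point_integrations = sources * consumers    # 300 pipelines to build and maintain

# Data lake: each system connects once, to the lake.
data_lake_integrations = sources + consumers         # 35 pipelines

print(point_to_point_integrations, data_lake_integrations)
```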
However, data lakes did not make "getting the data" any easier for ICS data sources
Because:
• There is a large variety of data sources:
  • Simple embedded systems such as instrumentation
  • Software installed on commodity hardware
  • Integrated hardware/software suites
• Further, there are no widespread standards in:
  • Communications media or protocols
  • Data types
  • Information models
The data sources live at the tattered and ragged edge of most organizations' networks
It is common to find devices that are remote, old, have limited power, and even more limited bandwidth
Finally, due to the potential for malicious misuse of ICS devices, they must operate on secured networks to ensure:
• Safety of people and property
• Security of sensitive information (process knowledge)
Enter HDF and its robust data collection capabilities, which aid our customers in acquiring data from ICS devices and delivering it to the access engines best suited to the associated data consumers
NiFi provides:
• Access to a large variety of sources and sinks through its built-in library of processors
• Easy assembly of custom processors when custom protocols are encountered
When devices are near that tattered and ragged edge of a network, NiFi components such as MiNiFi can enable store-and-forward, compression, and batch delivery capabilities to help overcome limitations imposed by the devices and networks
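As a rough illustration of what a MiNiFi-style agent handles at the edge, here is a minimal Python sketch of the store-and-forward pattern: buffer readings locally, compress them, and ship them in batches when the uplink cooperates. The class, thresholds, and `send` callable are hypothetical, not NiFi's actual implementation.

```python
import gzip
import json
from collections import deque

class EdgeBuffer:
    """Store-and-forward sketch for a constrained edge device:
    buffer readings locally, compress them, deliver them in batches."""

    def __init__(self, send, batch_size=500):
        self.send = send                # callable that ships bytes to the central flow
        self.batch_size = batch_size
        self.queue = deque()

    def record(self, reading: dict):
        self.queue.append(reading)
        if len(self.queue) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.queue:
            return
        batch = list(self.queue)
        payload = gzip.compress(json.dumps(batch).encode())  # shrink for limited bandwidth
        try:
            self.send(payload)
        except OSError:
            return                      # uplink down: keep everything and retry later
        for _ in batch:                 # delivered: drop the local copy of what was sent
            self.queue.popleft()
```

A real deployment would get durability, back-pressure, and provenance from MiNiFi/NiFi themselves; the sketch only shows why buffering and compression matter on limited links.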
A comprehensive suite of security tools is available to help meet the demands of ICS security requirements when data must be extracted from secured networks: Kerberos, SSL, 20+ encryption ciphers for data, and standardized protocols among the HDF components
Challenges and Opportunities
Experienced higher-than-usual discard rates on certain vaccines
Investigation of causes hampered by huge data volumes and spreadsheet-based analytics
Data sources include process-historian systems on the shop floor that tag and track each batch. Maintenance systems detail plant equipment service dates and calibration settings. Building-management systems capture air pressure, temperature, and other readings in multiple locations at each plant, sampling by the minute.
Aligning all this data from disparate systems and spotting abnormalities took months using the spreadsheet-based approach, and storage and memory limits meant researchers could only look at a batch or two at a time.
Process
In the first month, the team loaded the data onto a partition of the cloud-based platform, and used MapReduce, Hive, and advanced dynamic time-warping techniques to aggregate and align the data sets around common metadata dimensions such as batch IDs, plant equipment IDs, and time stamps.
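The dynamic time-warping step is what lets sensor traces recorded at different rates or with lags be compared batch to batch. Merck's implementation ran on MapReduce and Hive; the following is only a minimal, single-machine Python sketch of the core DTW idea, with made-up series:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping: cost of the best alignment of series a to series b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])                      # local mismatch
            cost[i, j] = d + min(cost[i - 1, j],              # stretch a
                                 cost[i, j - 1],              # stretch b
                                 cost[i - 1, j - 1])          # advance both
    return cost[n, m]

# Two fermentation temperature traces from two batches, sampled on different grids (synthetic)
batch_a = np.sin(np.linspace(0, 3, 60)) * 2 + 35
batch_b = np.sin(np.linspace(0, 3, 90)) * 2 + 35
print(dtw_distance(batch_a, batch_b))   # small value = the batches track each other closely
```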
In the second month, analysts used R-based analytics to chart and cluster every batch of the vaccine ever made on a heat map. Spotting notable patterns, the team then used R to produce investigative histograms and scatter plots, and it drilled down with Hive to explore hypotheses about the factors tied to low-yield production runs. Using an Agile development approach, the team set up daily data-exploration goals, but it could change course by that afternoon if it failed to find solid data backing up a particular hypothesis.
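The team did this in R; a rough Python analogue of "cluster every batch and look at it as a heat map" might look like the sketch below. The feature matrix is synthetic and the variable names are assumptions, but the workflow (scale, cluster, reorder, plot) is the point.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: rows = production batches, columns = aligned process variables
# (e.g., fermentation temperature, pH, dissolved oxygen summaries) -- illustrative only
rng = np.random.default_rng(0)
batch_features = rng.random((500, 20))

scaled = StandardScaler().fit_transform(batch_features)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scaled)

# Heat map of batches reordered by cluster, so similar batches show up as bands
order = np.argsort(labels)
plt.imshow(scaled[order], aspect="auto", cmap="viridis")
plt.xlabel("process variable")
plt.ylabel("batch (grouped by cluster)")
plt.title("Batch-by-variable heat map")
plt.colorbar(label="standardized value")
plt.show()
```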
In the third month, the team developed models, testing against the trove of historical data to prove and disprove leading theories about yield factors.
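Step three amounts to testing candidate yield factors against the historical record. A minimal sketch of such a test, with synthetic data standing in for a fermentation-phase characteristic and the final purification yield:

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins, one value per historical batch (illustrative, not Merck data)
rng = np.random.default_rng(1)
fermentation_metric = rng.random(1000)                                # a fermentation-phase characteristic
final_yield = 0.6 * fermentation_metric + rng.normal(0, 0.1, 1000)   # purification-step yield

# Theory: the fermentation characteristic is tied to final yield
r, p = stats.pearsonr(fermentation_metric, final_yield)
print(f"Pearson r = {r:.2f}, p = {p:.3g}")

# Compare yield for batches below vs. above the median of the metric
cut = np.median(fermentation_metric)
low, high = final_yield[fermentation_metric < cut], final_yield[fermentation_metric >= cut]
t, p2 = stats.ttest_ind(high, low, equal_var=False)
print(f"High-vs-low yield difference: t = {t:.2f}, p = {p2:.3g}")
```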
Benefits/Results
Hadoop enables the pharmaceutical company to crunch huge amounts of data, resulting in the ability to develop and bring vaccines to market faster and at lower cost.
The team was able to come up with conclusive answers about production yield variance within just three months
Through 15 billion calculations and more than 5.5 million batch-to-batch comparisons, Merck discovered that certain characteristics in the fermentation phase of vaccine production were closely tied to yield in a final purification step.
Merck intends to optimize the production of other vaccines now in development. They're all potentially lifesaving products, according to Merck, and it's clear that the new data analysis approach marks a huge advance in ensuring efficient manufacturing and a more plentiful supply.
NOTES
Yield optimization is a high-value Big Data use case relevant to all forms of complex process manufacturing, from semiconductors to storage to biotech, and Merck is a great example. It is common for biotech manufacturers like Merck to monitor more than 200 variables in the complex fermentation process used to produce vaccines. Merck was experiencing far higher discard rates on some of its vaccines and needed to determine the root cause of its costly yield variances.
Solution: Manufacturing Data Lake and Yield Optimization Analytics at Merck
Month 1: data loaded into Hadoop and aggregated and aligned the data sets around common metadata dimensions
Month 2: analytics to chart and cluster every batch of the vaccine ever made on a heat map, spotting notable patterns and investigating further
Month 3: team developed models, testing against trove of historical data to prove/disprove theories regarding yield factors
After 15 billion calculations and 5.5 million batch-to-batch comparisons, discovered characteristics closely correlated to yield