This presentation introduces the concept of Machine Learning and then discusses how Machine Learning is being used in the Predictive Maintenance domain.
2. Table of Contents
• Basics of Machine Learning
• Classical Programming vs Machine Learning
• Types of Machine Learning
• Types of Supervised Learning
• Application of ML in Predictive Maintenance (PdM)
• Types of Maintenance
• Goals & Use Cases for PdM
• Data Science For PdM
3. What is Machine Learning?
Task : Predict the price of an apartment in Bangalore
4. Classical Programming / Software 1.0
• Take help of a domain expert
• Survey existing apartments in Bangalore
• Identify factors contributing to the price of an apartment
• Area
• Size
• Number of Bedrooms, Bathrooms
• Name of the builder
• etc.
• Write a program which outputs the price based on the attributes
identified
Reference : https://medium.com/@karpathy/software-2-0-a64152b37c35
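A minimal sketch of what such a hand-written program might look like. The rates, the premiums, the builder name and the function name `rule_based_price` are all invented for illustration; they are not real Bangalore market figures.

```python
# Hypothetical hand-coded pricing rules (Software 1.0).
# Every number below is a made-up placeholder, not real market data.

BASE_RATE_PER_SQFT = {"Whitefield": 6000, "Indiranagar": 12000}  # assumed rates
DEFAULT_RATE_PER_SQFT = 5000
PREMIUM_BUILDERS = {"BuilderA"}  # hypothetical builder name


def rule_based_price(area, size_sqft, bedrooms, bathrooms, builder):
    """Return a price estimate from rules written by hand."""
    rate = BASE_RATE_PER_SQFT.get(area, DEFAULT_RATE_PER_SQFT)
    price = rate * size_sqft
    price += 200_000 * bedrooms    # assumed flat premium per bedroom
    price += 100_000 * bathrooms   # assumed flat premium per bathroom
    if builder in PREMIUM_BUILDERS:
        price *= 1.10              # assumed 10% premium for a reputed builder
    return price


print(rule_based_price("Whitefield", 1000, 2, 2, "BuilderA"))
```

Every new factor or market change means editing these rules by hand, which is the maintenance burden the slide refers to.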
6. Machine Learning/Software 2.0
• First Step: Collect data (as much as possible)
Reference : https://www.kaggle.com/amitabhajoy/bengaluru-house-price-data
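As a contrast to hand-written rules, the sketch below learns the relationship between size and price from examples using ordinary least squares. The (size, price) pairs are synthetic placeholders, not rows from the referenced dataset, and a single feature is used only to keep the example short.

```python
# Software 2.0 sketch: learn the pricing rule from examples instead of coding it.
# The data points are synthetic and chosen only for illustration.

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


sizes_sqft = [600, 900, 1200, 1500]       # input feature
prices_lakh = [35.0, 50.0, 65.0, 80.0]    # label (synthetic)

slope, intercept = fit_line(sizes_sqft, prices_lakh)
predicted = slope * 1000 + intercept      # estimate for an unseen 1000 sq ft flat
print(round(predicted, 2))
```

When new data arrives, the same code is simply run again; no rule has to be rewritten.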
11. ML Works Better When…
• Classical programming would require a long list of rules that is
difficult to maintain. ML can simplify the code.
• The data changes over time. ML can adapt to such changes “automatically”;
classical programming needs the rules to be updated manually.
• The problem is complex (image, text, audio, etc.)
• Humans can gain insights from ML models
12. Humans can gain insights from ML models
• Stages of Cancer
• Medical textbooks decide the stage based on the number of “yes” answers to these questions:
1. Has the cancer affected more than one lymph node?
2. Are the cancerous lymph nodes both above & below the bottom of the rib cage?
3. Is the cancer found in organs outside lymphatic system (in patient's bone marrow)?
• A 2018 Research paper (University of Modena & Reggio Emilia)
• Analyzed 15 variables, identifying 5 features
• Due to limited cognitive ability, humans rely on a handful of the most
obvious signifiers/features
• ML/AI decides based on hundreds, if not thousands, of distinct features
• May include traditional as well as less intuitive features
13. Machine Learning : Formal Definition
• A Machine is Learning when it improves at a task based on experience
at that task, but without explicit programming.
Reference : https://cloud.google.com/products/ai/ml-comic-1/
14. AI vs ML
• AI: Quest for developing non-biological
systems that exhibit human-like forms of
intelligence.
Reference: https://sebastianraschka.com/blog/2020/intro-to-dl-ch01.html
15. Examples of Machine Learning
• Recommending a video/song (Recommender System)
• Detecting cancer based on X-Ray Image (Computer Vision)
• Forecasting a company’s revenue based on various factors (Time Series
Forecasting)
• Summarizing a long document into shorter, meaningful text (Language
Processing)
• Writing HTML, SQL, Unix code based on human language (Language
Processing - GPT-3)
16. Types of ML Systems
• Whether or not trained with human supervision
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
• Whether learning is incremental
• Online Learning
• Batch Learning
• Instance based vs Model based learning
17. Supervised Learning
• User provides the algorithm with inputs (features) and desired
outputs (labels)
• The algorithm can create an output for an unseen input
• User (Teacher) is supervising the algorithm to learn
18. Unsupervised Learning
• Only input data is known & passed to algorithm
• Output data is unknown
• Often used in understanding data better before solving a supervised
learning problem
• Usually harder to understand and evaluate
• Applications
• Segmenting readers based on their reading habits
• Identifying topics of news articles
• Anomaly Detection
• Dimensionality Reduction
• Clustering
19. Unsupervised Learning : Clustering
• Each dot on plot represents a
research article on COVID
Reference: https://maksimekin.github.io/COVID19-Literature-Clustering/plots/t-sne_covid-19_interactive.html
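To make clustering concrete, here is a minimal k-means sketch on one-dimensional data, in the spirit of segmenting readers by reading habits. k-means is used only as a familiar example of a clustering algorithm; the reading times and starting centroids are invented.

```python
# Minimal k-means sketch on 1-D data: only inputs are given, no labels.

def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = sum(members) / len(members)
    return centroids, assignments


minutes_read_per_day = [5, 7, 6, 40, 42, 45]   # synthetic reader data
centers, groups = kmeans_1d(minutes_read_per_day, centroids=[0, 50])
print(centers, groups)
```

The algorithm is never told which readers are "light" or "heavy"; the two groups emerge from the data alone.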
20. Reinforcement Learning
• Steps
• Learning system (agent) observes an
environment
• Selects & performs actions
• Gets rewarded or punished for actions
• Learning system must learn by itself the best
strategy (policy) to earn the most reward over time.
• Examples
• Robotics
• AlphaGo Program
• Energy Efficiency
Reference: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
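The observe, act, get-rewarded loop above can be sketched with a tiny two-action problem. The epsilon-greedy strategy, the reward values and the function name `run_bandit` are illustrative choices and do not come from the slides; real systems such as AlphaGo are far more elaborate.

```python
import random

# Sketch of an agent that learns which action earns the most reward.

def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent; returns the learned policy and action counts."""
    rng = random.Random(seed)
    totals = [0.0] * len(rewards)
    counts = [0] * len(rewards)

    def estimate(action):
        # Untried actions get an optimistic estimate so each is tried once.
        if counts[action] == 0:
            return float("inf")
        return totals[action] / counts[action]

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))              # explore
        else:
            action = max(range(len(rewards)), key=estimate)   # exploit
        reward = rewards[action]                              # environment feedback
        totals[action] += reward
        counts[action] += 1

    policy = max(range(len(rewards)), key=estimate)
    return policy, counts


best_action, tries = run_bandit(rewards=[0.2, 1.0])
print(best_action, tries)
```

Nobody tells the agent which action is better; it discovers the policy from the rewards it receives.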
21. Supervised Machine Learning
• Regression: Goal is to predict a continuous number
• Classification: Goal is to predict a class label
Label: Continuous Number
Label: Distinct Values
Reference: https://sebastianraschka.com/blog/2020/intro-to-dl-ch01.html
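The difference between the two is only the type of label. The sketch below uses the same 1-nearest-neighbour lookup for both cases; the apartment sizes, prices and segment names are synthetic, and nearest-neighbour is chosen only because it fits in a few lines.

```python
# Same learner, two kinds of label: a 1-nearest-neighbour sketch.

def nearest_label(train_x, train_y, query):
    """Return the label of the training point closest to `query`."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return train_y[best]


sizes_sqft = [600, 900, 1500, 2000]             # input feature (synthetic)

# Regression: the label is a continuous number.
prices_lakh = [35.0, 50.0, 80.0, 110.0]
print(nearest_label(sizes_sqft, prices_lakh, 1400))

# Classification: the label is one of a few distinct values.
segments = ["budget", "budget", "premium", "premium"]
print(nearest_label(sizes_sqft, segments, 700))
```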
23. Types of Maintenance
• Reactive Maintenance
• Parts of a piece of equipment are replaced only on failure
• Doesn’t waste a part’s life, but results in downtime and unscheduled
maintenance
• Preventive Maintenance
• Replaces a part after pre-determined useful lifespan, before it
fails
• Avoids unscheduled maintenance
• Underutilization of parts
• Predictive Maintenance
• Replaces only the parts close to their failure (Just in time
replacement)
• Extends the part’s lifespan
• Reduces unscheduled maintenance
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/predictive-maintenance-playbook
https://arxiv.org/pdf/1912.07383.pdf
24. Predictive Maintenance (PdM) : Goals
• Predict whether a piece of equipment is going to fail in the near future
• Predict days to failure
• Helps in scheduling maintenance
• Predict most probable root cause of a failure
• Helps in identifying part(s) to repair/replace
25. Sample Use Cases
• Failure of engine parts in an aircraft
• HVAC equipment failure
• Elevator door failure
• Wind turbine failure
• Train wheel failure
26. Data Science For Predictive Maintenance
• Steps
• Convert Business Problem into Data Science problem
• Understand Data
• Prepare Data
• Building Model
• Evaluate Model
• Deploy Model
• Monitor/Maintain Model
Reference: https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining
27. Business problem into Data Science problem
• Binary Classification
• Predict the probability that a piece of equipment fails within a future time period
• Regression
• Predict how long a piece of equipment stays operational before its next failure
• Multi-class classification
• Predict the probability that a piece of equipment fails within the next …, 3X, 2X, X units of time
• Predict the probability that a piece of equipment fails within a future time period due to a
particular root cause
28. Binary Classification
• Goal: Predict the probability of failure within the next X units of time
• Labels (Discrete Number)
• Failure within X time units (1)
• Healthy (0)
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/predictive-maintenance-playbook
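A minimal sketch of how such labels could be derived from a failure log. The function name `binary_labels`, the dates and the 3-day horizon are assumptions made for illustration; whether the failure day itself counts as "healthy" is a design choice that should be agreed with the domain expert.

```python
from datetime import date, timedelta

def binary_labels(record_dates, failure_dates, horizon_days):
    """Label 1 if a failure occurs within `horizon_days` after the record, else 0."""
    horizon = timedelta(days=horizon_days)
    labels = []
    for day in record_dates:
        # Only failures strictly after the record and inside the horizon count.
        fails_soon = any(day < f <= day + horizon for f in failure_dates)
        labels.append(1 if fails_soon else 0)
    return labels


days = [date(2021, 1, d) for d in range(1, 11)]   # daily records, 1 to 10 Jan
failures = [date(2021, 1, 8)]                     # one recorded failure (synthetic)
print(binary_labels(days, failures, horizon_days=3))
```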
29. Regression
• Goal: Predict remaining useful life (RUL) of the equipment
• Label: Time for which an asset is operational before next failure (RUL)
• Continuous Number
• Disadvantage
• Equipment without any failures cannot be used for modeling
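The RUL label and its disadvantage can both be seen in a short sketch. Days are plain integers here for brevity, and `rul_labels` is a hypothetical helper name.

```python
def rul_labels(record_days, failure_days):
    """Days from each record to the next failure; None if no later failure is known."""
    labels = []
    for day in record_days:
        upcoming = [f for f in failure_days if f >= day]
        labels.append(min(upcoming) - day if upcoming else None)
    return labels


# Equipment with one failure on day 4 (synthetic).
print(rul_labels([1, 2, 3, 4, 5, 6], failure_days=[4]))

# Equipment that never failed: no label can be computed at all.
print(rul_labels([1, 2, 3], failure_days=[]))
```

Rows labelled `None` have no known next failure, so they cannot be used to train the regression model.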
30. Multi-class Classification (1)
• Goal: Predict the probability of failure within next …, 3X, 2X, X units of
time
• Labels (Discrete Number)
• Healthy (0)
• Failure within 3X time unit (3Z)
• Failure within 2X time unit (2Z)
• Failure within X time unit (Z)
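One possible way to turn a days-to-failure value into these classes is sketched below. The integer class ids 1, 2, 3 are illustrative stand-ins for the labels listed above, and the cut-off at 3X is an assumption.

```python
import math

def horizon_class(days_to_failure, x_days, max_multiple=3):
    """0 = healthy; k = failure expected within k * x_days (k = 1..max_multiple)."""
    if days_to_failure is None:            # no known upcoming failure
        return 0
    k = max(1, math.ceil(days_to_failure / x_days))
    return k if k <= max_multiple else 0   # beyond the last bucket counts as healthy


for days in (5, 8, 21, 22, None):
    print(days, horizon_class(days, x_days=7))
```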
31. Multi-class Classification (2)
• Goal: Predict the probability of failure within the next X units of time
due to root cause Pi
• Labels
• Failure due to different root causes (P1, P2, P3, ..)
• Healthy (0)
33. Data Requirement
• Relevant Data
• Discuss with domain expert
• Sufficient Data
• Duration (Year, Month, Day..)
• Larger number of failures
• Different types of failures
• Quality of data
• Garbage In, Garbage Out
Reference: Google : Hidden Technical Debt in Machine Learning Systems
34. Data Collection
• Data Source
• Temporal Data
• Equipment’s Health
• Example: Vibration, Voltage, Temperature, Humidity, Pressure etc.
• Collected using IoT sensors
• Temporal features reflecting aging pattern & anomalies
• Represents normal & faulty behaviors over time
• Maintenance history
• Example: Dates of Repair activities, Components replaced etc.
• Captures degradation patterns
• Failure history
• Weather
• Usage (Load) of the equipment
• Static Data
• Equipment Metadata
• Manufacturer, Make, Model
• Manufacture Date, Installation Date, Age
• Geographical Location
35. Data Exploration & Validation
• Goal : Visualize & Validate
• Data is relevant
• Data includes expected patterns
• In case of no obvious patterns, add more features
Reference: https://cloud.google.com/blog/products/data-analytics/a-process-for-implementing-industrial-predictive-maintenance-part-ii
36. Data Pre-Processing
• Structure data from various sources into tabular format
• Each row represents the state of a piece of equipment at a particular point in time,
accompanied by a label
• Up-Sampling/Down-Sampling
• Data collection frequency may not match the prediction frequency
• Data may be collected hourly, but failure may be predicted at the day level
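Down-sampling hourly readings to the day level could look like the sketch below. The mean is used as the aggregate only as an example, and the readings are synthetic.

```python
from collections import defaultdict
from datetime import datetime

def daily_mean(readings):
    """Down-sample (timestamp, value) pairs to one mean value per calendar day."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        buckets[timestamp.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


hourly = [
    (datetime(2021, 1, 1, 0), 70.0),
    (datetime(2021, 1, 1, 12), 74.0),
    (datetime(2021, 1, 2, 0), 80.0),
]
print(daily_mean(hourly))
```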
37. Data Pre-Processing
• Missing Value Handling
• Temporal Data (Examples)
• Forward Filling
• Interpolation
• Domain Specific
• Fill a missing pressure value of a piece of equipment at 1 PM on Tuesday
• with last Tuesday 1 PM’s value
• with Tuesday 1 PM’s value averaged over the last month
• etc.
• Strategy should be validated using cross-validation
• Removal of duplicates
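Forward filling and interpolation, the two generic strategies listed above, can be sketched as follows. Missing readings are represented as `None`, and the pressure values are synthetic.

```python
def forward_fill(values):
    """Replace each None with the most recent known value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)   # stays None if nothing is known yet
    return filled


def interpolate(values):
    """Linearly interpolate runs of None that lie between two known values."""
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


pressure = [10.0, None, None, 16.0, None]
print(forward_fill(pressure))
print(interpolate(pressure))
```

Note that interpolation leaves the trailing gap unfilled because there is no later value to interpolate towards, which is one reason the chosen strategy should be validated.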
39. Feature Engineering (Temporal Data)
• Aggregation
• Data over individual time units (e.g. days) is noisy
• Needs to be smoothed by aggregating over time windows
• Examples
• Temperature: Fluctuates. The average value over a day may rise with degradation
• Vibration: May increase drastically before failure. The max over a day could be a
good feature
https://cloud.google.com/blog/products/data-analytics/a-process-for-implementing-industrial-predictive-maintenance-part-ii
https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/predictive-maintenance-playbook
40. Feature Engineering (Temporal Data) : Rolling Aggregate
• “How far into the future the model has to predict”
influences “how far into the past the model has to
look back” to make predictions
• Lag Features
• The “looking back” period is called the “Lag”
• Rolling Aggregate (Examples)
• Rolling average of temperature over the last 7, 15, 21
days
• Rolling max of vibration over the last 7, 15, 21 days
• Rolling count of alarms over the last 1, 3, 5, 7 days
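A rolling aggregate over a look-back window can be sketched in a few lines. The window of 3 and the temperature values are chosen only to keep the example short; `rolling` and `mean` are hypothetical helper names.

```python
def rolling(values, window, func):
    """Apply `func` to each trailing window; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)   # not enough history yet
        else:
            out.append(func(values[i + 1 - window : i + 1]))
    return out


def mean(xs):
    return sum(xs) / len(xs)


temperature = [70, 72, 71, 75, 80, 90]   # one reading per day (synthetic)
print(rolling(temperature, 3, mean))     # rolling average
print(rolling(temperature, 3, max))      # rolling max
```

Each output row uses only the current and earlier readings, so the feature never looks into the future.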
41. Feature Engineering (Temporal Data)
• Functions For Aggregation
• Count
• Average
• Maximum
• Minimum
• Median
• Standard Deviation
• Variance
• Sum
• Cumulative Sum
• Derivative
• 2nd Derivative
• Count of outliers
42. Feature Engineering
• Date
• Day
• Week
• Weekday/Weekend
• Month
• Quarter
• Year
• etc.
• Maintenance Data
• Days since last failure
• Days since last failure because of specific root cause
• Days since specific part replaced
• Days since last maintenance
• Static Data
• Age of the equipment
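Several of the date and maintenance features above can be derived with the standard library alone. The function names and the example dates below are assumptions for illustration.

```python
from datetime import date

def date_features(day):
    """Calendar features derived from a single date."""
    return {
        "day": day.day,
        "week": day.isocalendar()[1],       # ISO week number
        "is_weekend": day.weekday() >= 5,   # Saturday = 5, Sunday = 6
        "month": day.month,
        "quarter": (day.month - 1) // 3 + 1,
        "year": day.year,
    }


def days_since_last(day, event_dates):
    """Days since the most recent event on or before `day`; None if there is none."""
    past = [e for e in event_dates if e <= day]
    return (day - max(past)).days if past else None


today = date(2021, 5, 15)
failures = [date(2021, 5, 1), date(2021, 5, 20)]   # synthetic failure log
print(date_features(today))
print(days_since_last(today, failures))
```

The same `days_since_last` helper works for days since last maintenance or since a specific part was replaced, given the matching event log.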
44. Cross Validation
• Goal
• Validates a model during & at the end of training
• Helps detect and reduce overfitting
• Checks that the model generalizes well to unseen data
https://scikit-learn.org/stable/modules/cross_validation.html
45. Time Series Cross Validation
• In PdM, data is ordered by time
• Training, validation, and test data must be split in a time-dependent
manner.
• Validation data must lie in the future relative to training data
Reference: https://eng.uber.com/forecasting-introduction/
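One common way to respect this ordering is an expanding-window split, sketched below on row indices that are assumed to be sorted by time. The function name and parameters are hypothetical; other schemes (for example a sliding window) are equally valid.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, validation_indices); validation always follows training."""
    for k in range(n_splits):
        test_end = n_samples - (n_splits - 1 - k) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        yield list(range(test_start)), list(range(test_start, test_end))


for train, val in expanding_window_splits(n_samples=10, n_splits=3, test_size=2):
    print(train, val)
```

In every fold the largest training index is smaller than the smallest validation index, so the model is never validated on data older than what it was trained on.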
46. Split between Training & Test Data
• Split by Time
• Separate Train & Test data by the window size (“Look ahead time in future”)
• Split by Equipment
• Gives a more realistic estimate of performance on new equipment
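Both splitting strategies can be sketched on a list of row dictionaries. The keys `equipment_id` and `day`, the cutoff and the gap are assumptions for illustration. The gap matters because the labels of training rows near the cutoff are computed by looking ahead in time, so without it they would overlap with the test period.

```python
def split_by_time(rows, cutoff_day, look_ahead_days):
    """Train on rows up to the cutoff; test on rows after cutoff + look-ahead gap."""
    train = [r for r in rows if r["day"] <= cutoff_day]
    test = [r for r in rows if r["day"] > cutoff_day + look_ahead_days]
    return train, test


def split_by_equipment(rows, test_equipment_ids):
    """Hold out whole machines so the test set contains only unseen equipment."""
    train = [r for r in rows if r["equipment_id"] not in test_equipment_ids]
    test = [r for r in rows if r["equipment_id"] in test_equipment_ids]
    return train, test


rows = [{"equipment_id": e, "day": d} for e in ("A", "B") for d in range(1, 11)]

train, test = split_by_time(rows, cutoff_day=5, look_ahead_days=2)
print(len(train), len(test))

train, test = split_by_equipment(rows, test_equipment_ids={"B"})
print(len(train), len(test))
```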
47. Model Evaluation (Binary Classification)
• Goal: What metric to optimize for?
• Determining Factors
• Imbalanced Data
• High Cost of False Alarm
• Performance Metrics
• Accuracy: Not suitable for imbalanced data (a model that always predicts “healthy” still scores high)
• Precision: Lower value corresponds to higher rate of false alarms
• Recall: Higher value corresponds to more of the true failures being identified
• F1 Score: Harmonic mean of precision and recall
• ROC (Receiver Operating Characteristic) curve
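The metrics above follow directly from the counts of true/false positives and negatives. The sketch below assumes labels 1 (failure) and 0 (healthy); the sample predictions are synthetic and chosen to show why accuracy misleads on imbalanced data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = failure)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # missed failures
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 1] + [0] * 18             # only 2 failures in 20 records
always_healthy = [0] * 20              # a model that never raises an alarm
print(classification_metrics(y_true, always_healthy))

some_alarms = [1, 0, 1] + [0] * 17     # one hit, one miss, one false alarm
print(classification_metrics(y_true, some_alarms))
```

The first model reaches 90% accuracy while finding none of the failures, which is why precision, recall and F1 are the more useful numbers here.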
48. Model Serving/Prediction
• Goal: Deploy the model in production, so that it starts making
prediction on new, unseen data
• Need
• Data must be pre-processed & engineered in exactly the same way as for model
training
• Suggested Approach : Batch Scoring
• Model’s decision is not needed immediately
• Example : Once a day, predict which equipment is going to fail in the next 7
days
49. Model Monitoring/Maintenance
• Evaluate model’s performance in
production
• Compare predictions vs ground truths
• Did the failures really happen as
predicted by the model?
• Was the equipment healthy when
predicted to be healthy?
• Degradation of model’s performance
may indicate need for retraining
Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-mlconcepts.html
50. References
• Machine Learning
• A visual introduction to machine learning
• Introduction to Machine Learning and Deep Learning by Sebastian Raschka
• Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
• Predictive Maintenance
• Azure AI guide for predictive maintenance solutions
• A process for implementing industrial predictive maintenance
• A Survey of Predictive Maintenance: Systems, Purposes and Approaches
ROC : A curve of true positive rate vs. false positive rate at different classification thresholds.
AUC : The Area Under the ROC Curve is the probability that a classifier will be more confident that a randomly chosen positive example is actually positive than that a randomly chosen negative example is positive.
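The probabilistic reading of AUC can be computed directly by comparing every positive example with every negative one. The sketch assumes labels 1 (positive) and 0 (negative), counts ties as half a win, and uses synthetic scores; it is meant for small examples, not large datasets.

```python
def auc(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
failure_scores = [0.1, 0.4, 0.35, 0.8]   # predicted failure probabilities
print(auc(labels, failure_scores))
```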