Perfect Data Mining & Predictive Analytics Model Methodology
Machine learning, a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence, is the method of building analytical models that automatically find previously unknown patterns in data. These patterns point to associations, anomalies (outliers), sequences, classifications, and clusters or segments, and they can reveal insight into why an event happened.
Businesses and organizations can benefit from several types of machine learning use:
• Segmentation: finding groups of customers with the same or similar purchase patterns for targeted marketing
• Classification: using a set of attributes to make a prediction
• Forecasting: projecting purchases based on time series
• Pattern detection: associating one product with another to reveal cross-sell sequences and opportunities
• Anomaly detection: detecting fraud, for example
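Pattern detection for cross-selling is commonly done with association rules. The sketch below is a minimal illustration, assuming made-up basket data and considering only product pairs (full algorithms such as Apriori handle larger item sets). Support is the share of baskets containing both products; confidence for a rule A -> B is the share of baskets containing A that also contain B. The function name `pair_rules` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) rules for frequent product pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)  # ignore repeated items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets that contain both products
        if support < min_support:
            continue
        # Each frequent pair yields two directional rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Strongest rules (highest confidence) first.
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


# Hypothetical transaction data: one list of products per purchase.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "cereal"],
    ["bread", "butter", "jam"],
]
for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Here "butter -> bread" has confidence 1.0 (every basket with butter also has bread), which is the kind of rule a cross-sell recommendation would act on.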
Predictive analytics model methodology
The Cross Industry Standard Process for Data Mining (CRISP-DM) is the most widely used methodology for developing predictive analytical models. It includes six phases:
1. business understanding
2. data understanding
3. data preparation
4. model development, using supervised or unsupervised learning
5. model evaluation
6. model deployment
Business understanding
The business understanding phase involves understanding and defining the use case or business problem, the business objective and the business questions that need to be answered. It also includes defining success criteria. Next, standard project tasks need to be carried out: identifying resource needs (technology, people and money) and any constraints, gathering requirements, creating a project plan, assessing risks and creating a contingency plan.
Data understanding
The data understanding phase covers data requirements such as internal and external data sources, data origin and data characteristics (features and quality), including the three Vs of volume, variety and velocity, formats, and whether the data resides in a relational database, in flat files or in the Hadoop Distributed File System (HDFS), or arrives as live streaming data. This phase also includes exploring and investigating the data using statistical analysis. In addition, a data quality assessment determines the degree to which data is missing, erroneous, duplicated or inconsistent.
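Part of the data quality assessment can be automated. The sketch below is a minimal example, assuming records arrive as Python dictionaries and that `None` or an empty string marks a missing value; the column names and the `quality_report` function are hypothetical. It reports the missing-value rate per column and the number of exact duplicate records.

```python
from collections import Counter


def quality_report(records, columns):
    """Summarize missing values per column and exact duplicate records."""
    n = len(records)
    # Fraction of records where each column is absent, None or an empty string.
    missing = {
        col: sum(1 for r in records if r.get(col) in (None, "")) / n
        for col in columns
    }
    # Count records that are exact copies of an earlier record.
    seen = Counter(tuple(r.get(col) for col in columns) for r in records)
    duplicates = sum(count - 1 for count in seen.values())
    return {"rows": n, "missing_rate": missing, "duplicates": duplicates}


# Hypothetical customer extract with one duplicate and some gaps.
records = [
    {"customer_id": 1, "age": 34, "country": "US"},
    {"customer_id": 2, "age": None, "country": "UK"},
    {"customer_id": 2, "age": None, "country": "UK"},
    {"customer_id": 3, "age": 51, "country": ""},
]
print(quality_report(records, ["customer_id", "age", "country"]))
```

Checks for inconsistent values (for example, conflicting country codes for the same customer) depend on business rules and would be added per data set.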
Data preparation
The objective of the data preparation phase is to produce a data set that can be fed into machine-learning algorithms. This requires a number of tasks, including filtering and cleaning, data conversion, data transformation, data enrichment and variable identification, also known as feature selection or dimensionality reduction. The objective of variable identification is to create a data set of the most relevant variables to use as model input for optimal results. It also aims to remove variables that are not useful as model input without compromising the model's accuracy, for example the accuracy of the predictions it makes.
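A small preparation pipeline might combine a cleaning step with variable identification. The sketch below assumes hypothetical sensor columns and uses two simple techniques chosen for illustration, not prescribed by the methodology: mean imputation for missing values, and a filter that keeps variables whose absolute Pearson correlation with the target meets a threshold (0.5 here, an arbitrary choice).

```python
from statistics import mean, pstdev


def impute_mean(rows, column):
    """Cleaning step: replace missing values (None) in one column with the column mean."""
    present = [r[column] for r in rows if r[column] is not None]
    fill = mean(present)
    for r in rows:
        if r[column] is None:
            r[column] = fill


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0  # a constant variable carries no information about the target
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)


def select_features(rows, features, target, threshold=0.5):
    """Filter-style feature selection: keep features strongly correlated with the target."""
    ys = [r[target] for r in rows]
    scores = {f: pearson([r[f] for r in rows], ys) for f in features}
    selected = [f for f in features if abs(scores[f]) >= threshold]
    return selected, scores


# Hypothetical equipment readings; "failed" is the target variable.
rows = [
    {"temperature": 70, "vibration": 0.2, "sensor_version": 2, "failed": 0},
    {"temperature": 72, "vibration": None, "sensor_version": 2, "failed": 0},
    {"temperature": 95, "vibration": 0.9, "sensor_version": 2, "failed": 1},
    {"temperature": 90, "vibration": 0.8, "sensor_version": 2, "failed": 1},
]
impute_mean(rows, "vibration")
selected, scores = select_features(rows, ["temperature", "vibration", "sensor_version"], "failed")
print(selected)  # ['temperature', 'vibration']
```

The constant `sensor_version` column is dropped because it cannot help distinguish failures; real projects often use model-based or wrapper selection methods as well.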
Model development
The model development phase is about building a machine-learning model. Models can be built to predict, forecast or analyze data to find patterns such as clusters, groups and associations.
Two types of machine learning can be used in model development:
1. supervised learning
2. unsupervised learning
Predictive models are typically built using supervised learning. For example, to develop a model that predicts equipment failure, we can use data describing equipment that has actually failed. We use that data to train the model to recognize the profile of a piece of equipment that is likely to fail. To do this, we split the data, including the records of failed equipment, into a training data set and a test data set. We then train the model by feeding the training data set into an algorithm (many algorithms can be used for prediction) and evaluate the trained model on the test data set.
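The train-then-test workflow can be sketched end to end. This example is illustrative only: the readings are synthetic, the feature names are hypothetical, and a nearest-centroid classifier stands in for whatever prediction algorithm a real project would choose. The data is shuffled with a fixed seed, split 75/25, fitted on the training set and scored on the held-out test set.

```python
import random


def train_test_split(records, test_fraction=0.25, seed=42):
    """Shuffle a copy of the records and split them into training and test sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_centroids(train):
    """Nearest-centroid classifier: average feature vector per label."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((f - c) ** 2 for f, c in zip(features, centroids[label])),
    )


# Synthetic (temperature, vibration) readings: label 0 = healthy, 1 = failed.
rng = random.Random(0)
records = [((rng.gauss(60, 3), rng.gauss(2, 0.5)), 0) for _ in range(20)]
records += [((rng.gauss(85, 3), rng.gauss(7, 0.5)), 1) for _ in range(20)]

train, test = train_test_split(records)
centroids = fit_centroids(train)
accuracy = sum(predict(centroids, x) == y for x, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

Keeping the test set out of training is what makes the accuracy figure an estimate of performance on unseen equipment.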
Unsupervised learning is a method of analyzing data to find hidden patterns that indicate product associations and groupings, for example customer segmentation. Grouping works by maximizing similarity within each group and minimizing similarity between groups. The k-means clustering algorithm is one of the most widely used algorithms for this approach.
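A plain k-means (Lloyd's algorithm) sketch shows how grouping by similarity works: each point is assigned to its nearest centroid, each centroid moves to the mean of its points, and the loop stops when the centroids no longer change. The two customer features (annual spend in thousands, visits per month) and the data are hypothetical; libraries such as scikit-learn provide production implementations.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points into k groups with Lloyd's k-means algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: centroids, and therefore assignments, are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (annual spend in thousands, visits per month).
customers = [(1.0, 1), (1.5, 2), (1.2, 1), (8.0, 9), (8.5, 10), (7.8, 8)]
centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Minimizing each point's distance to its own centroid is how k-means makes members of a segment similar to one another; the number of segments k must be chosen up front.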
Predictive and descriptive analytical models can be built using advanced data mining tools, analytics clouds, interactive data science notebooks with procedural or declarative programming languages, and automated model development tools.
Model evaluation
After the model is developed, the next phase is to evaluate the accuracy and precision of its predictions. For predictions, this assessment means understanding how many predictions were correct and how many were incorrect. Several techniques can be used for this evaluation. Key measures in model evaluation are the numbers of true positives, true negatives, false positives and false negatives. The bottom line is that we need to make sure the model is accurate; otherwise, it could generate many false positives that lead to incorrect actions and decisions.
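The four key counts and the measures derived from them can be computed directly. In this sketch the actual and predicted labels are made up (1 = failure, 0 = no failure); accuracy, precision and recall are standard measures built from the counts, shown here as one common way to summarize them.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true positives, true negatives, false positives and false negatives."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == positive and a == positive)
    tn = sum(1 for a, p in pairs if p != positive and a != positive)
    fp = sum(1 for a, p in pairs if p == positive and a != positive)
    fn = sum(1 for a, p in pairs if p != positive and a == positive)
    return tp, tn, fp, fn


def summarize(tp, tn, fp, fn):
    """Derive common evaluation measures from the four counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,  # share of all predictions that were correct
        # Of everything flagged positive, how much really was positive?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of all real positives, how many did the model catch?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical test-set labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)  # (3, 4, 2, 1)
print(summarize(*counts))  # {'accuracy': 0.7, 'precision': 0.6, 'recall': 0.75}
```

The two false positives lower precision to 0.6: four in ten alarms raised by this model would be wrong, which is the risk of incorrect actions the evaluation phase is meant to surface.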
Model deployment
Once we are satisfied with the model, the final phase is deploying it to run in a variety of environments. These include spreadsheets, analytics servers, database management systems (DBMSs), applications, analytical relational database management systems, Apache Hadoop, Apache Spark and streaming analytics platforms.
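Deployment usually means moving a trained model out of the development environment into one where it scores new data. As a minimal sketch, assuming a hypothetical logistic failure-risk model with made-up weights, the parameters are exported as plain JSON and rebuilt into a scoring function on the target side. Real deployments often use a standard interchange format such as PMML or ONNX, or the target platform's own format, instead.

```python
import json
import math

# Hypothetical trained parameters of a logistic failure-risk model.
model = {
    "features": ["temperature", "vibration"],
    "weights": [0.08, 0.9],
    "bias": -10.0,
    "threshold": 0.5,
}

# Export: a plain JSON document can be shipped to a server, database or streaming job.
artifact = json.dumps(model)


def load_scorer(artifact_json):
    """Rebuild a scoring function from the exported parameters."""
    params = json.loads(artifact_json)

    def score(record):
        z = params["bias"] + sum(
            w * record[f] for f, w in zip(params["features"], params["weights"])
        )
        probability = 1 / (1 + math.exp(-z))  # logistic function
        return probability, probability >= params["threshold"]

    return score


score = load_scorer(artifact)
print(score({"temperature": 95, "vibration": 7.0}))  # high risk, flagged
print(score({"temperature": 60, "vibration": 2.0}))  # low risk, not flagged
```

Separating the exported parameters from the scoring code lets the same model run wherever a small scoring routine can be embedded.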

