Data Mining
Introduction

Intro
Data mining is a powerful new technology with great potential to help companies focus on the most important information in the data they have collected about the behavior of their customers and potential customers.

Data collections in the real world
- The ten largest transaction-processing databases range from 3 to 18 terabytes.
- The ten largest decision-support databases range from 10 to 29 terabytes.
- Sizes doubled or tripled between 2001 and the end of 2003.

Questions arise
- Is there any new, unexpected, and potentially useful information contained in this data?
- Can we use historical data to predict future outcomes (e.g., customer behavior, fraud detection)?

Some examples of data mining
1. Telecommunications
A huge amount of data is collected daily:
- Transactional data (about each phone call)
- Data on mobile phones, landline phones, the Internet, etc.
- Other customer data (billing, personal information, etc.)
- Additional data (network load, faults, etc.)
Questions arise:
- Which customer groups are highly profitable, and which are not?
- Which special offers should we advertise to which customers?
- What call rates would increase profit without losing good customers?
- How do customer profiles change over time?
- Fraud detection (e.g., stolen mobile phones or phone cards)

2. Health
Data comes from different parts of the health system:
- Personal health records (at GPs, specialists, etc.)
- Hospital data (e.g., admission, midwifery, and surgery data)
- Billing information (Medicare, PBS)
Questions:
- Are doctors following the procedures (e.g., for prescribing medication)?
- Adverse drug reactions (analyzing different data collections to find correlations)
- Are people committing fraud (e.g., doctor shoppers)?
- Are there correlations between social and environmental issues and people's health?

What is data mining?
Data mining is the automated extraction of previously unknown information from large data sources for the purpose of supporting business actions.

Some more definitions
- Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
- Data mining is an information extraction activity whose goal is to discover hidden facts contained in databases.
- Data mining, or knowledge discovery, is the computer-assisted process of digging through and analyzing enormous sets of data and then extracting the meaning of the data.

Data mining process
1. Extract, transform, and load (ETL) transaction data into the data warehouse system.
2. Store and manage the data in a multidimensional database system.
3. Provide data access to business analysts and information technology professionals.
4. Analyze the data with application software.
5. Present the data in a useful format, such as a graph or table.
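
The five steps above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not a warehouse design: an in-memory SQLite table stands in for the data warehouse, a GROUP BY over two dimensions (region and month) stands in for the multidimensional store, and the CSV records, column names, and the `run_pipeline` function are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw transaction export (step 1: extract).
RAW_CSV = """customer,region,date,amount
c1,North,2003-01-15, 20.00
c2,south,2003-01-20,35.50
c1,North,2003-02-03,12.25
c3,South,2003-02-10,
c2,SOUTH,2003-02-18,40.00
"""


def extract(text):
    """Step 1: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Step 1 (cont.): normalize codes and drop rows with no amount."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records rather than guessing a value
        clean.append((
            row["customer"].strip(),
            row["region"].strip().title(),  # "south" / "SOUTH" -> "South"
            row["date"][:7],                 # keep only the month, e.g. "2003-02"
            float(amount),
        ))
    return clean


def run_pipeline(text):
    # Steps 1-2: load into a store (in-memory SQLite as a stand-in warehouse).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (customer TEXT, region TEXT, month TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", transform(extract(text)))
    # Steps 3-4: analysts query the store; here, total sales per region and month.
    cube = db.execute(
        "SELECT region, month, SUM(amount) FROM sales "
        "GROUP BY region, month ORDER BY region, month"
    ).fetchall()
    db.close()
    return cube


if __name__ == "__main__":
    # Step 5: present the result as a simple table.
    print(f"{'region':<8}{'month':<9}{'total':>8}")
    for region, month, total in run_pipeline(RAW_CSV):
        print(f"{region:<8}{month:<9}{total:>8.2f}")
```

Note how the transform step already touches the data-quality issues discussed later: inconsistent region codes are normalized and a record with a missing amount is dropped.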

Data mining (DM) is multidisciplinary

What they do
Detect patterns in data: rules, patterns, classes, associations and functional dependencies, outliers, data distributions, clusters.

How they do it
Search through data and pattern space, non-parametric modelling, filtering, aggregation.

How well they do it
Errors and biases, over-fitting, confounding effects, speed, scalability.
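
Over-fitting, listed under "How well they do it", is usually detected by scoring a model on data it was not trained on. The toy sketch below assumes synthetic one-dimensional data whose true rule is `x > 0.5`, with 20% of labels flipped as noise. A 1-nearest-neighbor "memorizer" scores perfectly on its own training data but clearly worse on held-out data, while a one-cut-point threshold model typically generalizes better. The data generator and all function names are hypothetical.

```python
import random


def make_data(n, noise, rng):
    """Synthetic 1-D points: the true label is x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = x > 0.5
        if rng.random() < noise:
            label = not label
        data.append((x, label))
    return data


def fit_nearest_neighbor(train):
    """1-NN 'memorizer': predict the label of the closest training point."""
    def predict(x):
        return min(train, key=lambda point: abs(point[0] - x))[1]
    return predict


def fit_threshold(train):
    """A deliberately simple model: one cut point chosen to maximize training accuracy."""
    best_t = max((x for x, _ in train),
                 key=lambda t: sum((x > t) == label for x, label in train))
    return lambda x: x > best_t


def accuracy(model, data):
    """Fraction of points whose label the model predicts correctly."""
    return sum(model(x) == label for x, label in data) / len(data)


if __name__ == "__main__":
    rng = random.Random(42)
    train = make_data(200, noise=0.2, rng=rng)
    test = make_data(2000, noise=0.2, rng=rng)  # held-out data, never used for fitting
    for name, model in [("1-NN", fit_nearest_neighbor(train)),
                        ("threshold", fit_threshold(train))]:
        print(f"{name:<10} train={accuracy(model, train):.3f} test={accuracy(model, test):.3f}")
```

The gap between training and held-out accuracy, not the training accuracy alone, is what signals over-fitting.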

Challenges in DM
Data size
- The size of data collections grows faster than linearly, doubling every 18 months.
- Scalable algorithms are needed.
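
Doubling every 18 months is exponential growth. Writing $S_0$ for the size of a collection today and $t$ for the elapsed time in months, the claim above corresponds to:

```latex
S(t) = S_0 \cdot 2^{t/18}
\qquad\Rightarrow\qquad
S(36) = 4\,S_0, \qquad S(60) = 2^{10/3}\,S_0 \approx 10.1\,S_0
```

For comparison, linear growth that adds $S_0$ every 18 months would reach only $3\,S_0$ after 36 months and about $4.3\,S_0$ after 60 months.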

Data complexity
- Different types of data (free text, HTML, XML, multimedia)
- The dimensionality of the data increases (more attributes)

Challenges contd.
- The curse of dimensionality affects many algorithms (for example, finding nearest neighbors in high dimensions).
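
One way to see the nearest-neighbor problem: for random data, the distances from a query point to its nearest and farthest neighbors become almost equal as the number of dimensions grows, so "nearest" carries little information. The sketch below measures this relative contrast, (farthest - nearest) / nearest, for uniformly random points in the unit cube; the number of points, the dimensions tried, and the function name are illustrative assumptions.

```python
import math
import random


def relative_contrast(dim, n_points=500, seed=0):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))  # Euclidean distance
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:>4}  contrast={relative_contrast(dim):.3f}")
```

In two dimensions the farthest point is many times farther away than the nearest one; in a thousand dimensions the ratio shrinks toward zero, which is why distance-based methods such as nearest-neighbor search degrade.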

Data quality
- Real-world data is messy and dirty (missing and out-of-date values, typographical errors, different codings and formats, etc.).
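
Typical cleaning steps for some of the problems listed above are mapping different codings and typographical variants to one canonical value and making missing or implausible values explicit instead of silently guessing them. The sketch below applies these steps to a few hypothetical customer records; the field names, the mapping tables, the plausible-age range, and the `clean_record` function are assumptions for illustration.

```python
# Hypothetical mappings from the different codings and typos seen in raw data
# to one canonical value per category.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}
STATE_CODES = {"nsw": "NSW", "n.s.w.": "NSW", "vic": "VIC", "victoria": "VIC"}


def clean_record(record):
    """Return a cleaned copy of one raw record; unknown or missing values become None."""
    gender = (record.get("gender") or "").strip().lower()
    state = (record.get("state") or "").strip().lower()
    age_text = (record.get("age") or "").strip()
    age = int(age_text) if age_text.isdigit() else None
    if age is not None and not 0 < age < 120:
        age = None  # treat implausible ages as missing rather than trusting them
    return {
        "id": record["id"],
        "gender": GENDER_CODES.get(gender),
        "state": STATE_CODES.get(state),
        "age": age,
    }


raw = [
    {"id": 1, "gender": "Male", "state": "N.S.W.", "age": "34"},
    {"id": 2, "gender": "f", "state": "Victoria", "age": ""},
    {"id": 3, "gender": "", "state": "vic", "age": "340"},
]

if __name__ == "__main__":
    for row in raw:
        print(clean_record(row))
```

Keeping unresolved values as `None` lets later analysis decide how to treat them, instead of hiding the problem inside the data.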

Why mine data?
- Data is being recorded.
- Recorded data is being warehoused.
- Computing power is affordable.
- Competitive pressure is strong.
- Commercial DM products are available.
- It provides support for business decisions.

Value to business
- Market segmentation: identify the common characteristics of customers who buy the same products from your company.
- Customer churn: predict which customers are likely to leave your company and go to a competitor.
- Fraud detection: identify which transactions are most likely to be fraudulent.
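
A very simple form of fraud detection flags a transaction when it lies far outside the customer's own history. The sketch below scores a new amount by how many standard deviations it lies from the mean of past amounts (a z-score). The threshold of 3, the minimum history length, and the sample history are assumptions; real systems typically combine many more signals than the amount alone.

```python
import statistics


def is_suspicious(history, amount, threshold=3.0):
    """Flag `amount` if its z-score against the customer's past amounts exceeds `threshold`."""
    if len(history) < 2:
        return False  # not enough history to judge
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)
    if spread == 0:
        return amount != mean  # any deviation from a perfectly constant history is unusual
    z = abs(amount - mean) / spread
    return z > threshold


past_amounts = [20.0, 25.0, 22.0, 30.0, 18.0, 27.0]  # hypothetical card history

if __name__ == "__main__":
    for new_amount in (24.0, 500.0):
        print(new_amount, "suspicious" if is_suspicious(past_amounts, new_amount) else "ok")
```

The new transaction is scored against past data only; including it in the mean and standard deviation would inflate the spread and mask the very outlier being looked for.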

Value to business
- Interactive marketing: predict what each individual accessing a website is most likely interested in seeing.
- Market basket analysis: understand what products or services are commonly purchased together, e.g., beer and diapers.
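
Market basket analysis is usually expressed through two measures for a rule A → B: support (the fraction of baskets containing both A and B) and confidence (the fraction of baskets containing A that also contain B). The sketch below counts item pairs in a handful of hypothetical baskets and keeps rules above assumed minimum thresholds; dedicated algorithms such as Apriori or FP-growth avoid enumerating every combination on large data.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return sorted (A, B, support, confidence) tuples for rules A -> B between single items."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (x, y), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        for a, b in ((x, y), (y, x)):  # check the rule in both directions
            confidence = both / item_counts[a]
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return sorted(rules)


baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "diapers", "milk"},
    {"bread", "milk"},
]

if __name__ == "__main__":
    for a, b, support, confidence in pair_rules(baskets):
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

With these baskets, "beer -> diapers" has support 0.60 and confidence 1.00, while the reverse rule "diapers -> beer" has the same support but confidence 0.75: support is symmetric, confidence is not.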

Value to business
- Trend analysis: reveal the difference between a typical customer this month and last month.
- Data mining can also be applied to data containing missing, inconsistent, and noisy values, typically after a cleaning step.
- Direct marketing: identify which prospects should be included in a mailing list to obtain the highest response rate.
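
For direct marketing, a predictive model (not shown here) typically assigns each prospect an estimated probability of responding, and the mailing list consists of the top-scoring prospects that fit the budget. The sketch below assumes such scores already exist; the prospect names, scores, and list size are hypothetical. It compares the expected response rate of the selected list with that of mailing everyone.

```python
def build_mailing_list(scores, size):
    """Pick the `size` prospects with the highest estimated response probability."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:size]


def expected_response_rate(scores, prospects):
    """Average estimated response probability over the chosen prospects."""
    return sum(scores[p] for p in prospects) / len(prospects)


# Hypothetical model scores: estimated probability that each prospect responds.
scores = {"ann": 0.30, "bob": 0.05, "cho": 0.22, "dev": 0.02, "eli": 0.11, "fay": 0.06}

if __name__ == "__main__":
    mailing = build_mailing_list(scores, size=2)
    targeted = expected_response_rate(scores, mailing)
    everyone = expected_response_rate(scores, list(scores))
    print("mail to:", mailing)
    print(f"expected rate: targeted={targeted:.2f}, everyone={everyone:.2f}, "
          f"lift={targeted / everyone:.1f}x")
```

The ratio of the two rates (the lift) is only as good as the underlying scores, so in practice the model would be checked on held-out campaign data before the list is used.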
