SlideShare uma empresa Scribd logo
1 de 15
Mining Stream, Time Series, and Sequence Data
Methodologies for Stream Data Processing and Stream Data Systems Random Sampling Sliding Windows Histograms Multi resolution Methods Sketches Synopses
Randomized Algorithms to analyze Data Streams Randomized algorithms, in the form of random sampling and sketching, are often used to deal with massive, high-dimensional data streams.
Data Stream Management Systems and Stream Queries In traditional database systems, data are stored in finite and persistent databases. stream data are infinite and impossible to store fully in a database.  Data Stream Management System (DSMS), there may be multiple data streams. Once an element from a data stream has been processed, it is discarded or archived, and it cannot be easily retrieved unless it is explicitly stored in memory
Critical Layers of stream data cube     Two critical cuboids (or layers) The first layer, called the minimal interest layer, is the minimally interesting layer that ananalyst would like to study The second layer, called the observation layer, is the layer at which an analyst (or anautomated system) would like to continuously study the data.
Hoeffding Tree Algorithm The Hoeffding tree algorithm is a decision tree learning method for stream data classification. It was initially used to track Web click streams and construct models to predict which Web hosts and Web sites a user is likely to access.  It typically runs in sublinear time and produces a nearly identical decision tree to that of traditional batch learners. It uses Hoeffding trees, which exploit the idea that a small sample can often be enough to choose an optimal splitting attribute.
Very Fast Decision Tree (VFDT)  The VFDT (Very Fast Decision Tree) algorithm makes several modifications to the Hoeffding tree algorithm. The modifications include breaking near-ties during attribute selection more aggressively, computing the G function after a number of training examples, deactivating the least promising leaves whenever memory is running low, dropping poor splitting attributes, and improving the initialization method. VFDT works well on stream data and also compares extremely well to traditional classifiers in both speed and accuracy To adapt to concept-drifting data streams.
Concept-adapting Very Fast Decision Tree algorithm (CVFDT). CVFDT also uses a sliding window approach;  however, it does not construct a new model from scratch each time. Rather, it updates statistics at the nodes by incrementing the counts associated with new examples and decrementing the counts associated with old ones.  Therefore, if there is a concept drift, some nodes may no longer pass the Hoeffding bound. When this happens, an alternate subtree will be grown, with the new best splitting attribute at the root.
A Classifier Ensemble Approach to Stream Data Classification The idea is to train an ensemble or group of classifiers (using, say naïve Bayes) from sequential chunks of the data stream. Whenever a new chunk arrives, we build a new classifier from it.  The individual classifiers are weighted based on their expected classification accuracy in a time-changing environment.  Only the top-k classifiers are kept. The decisions are then based on the weighted votes of the classifiers.
Clustering in evolving data streams Compute and store summaries of past data Apply a divide-and-conquer strategy Incremental clustering of incoming data streams Perform micro clustering as well as macro clustering analysis Explore multiple time granularity for the analysis of cluster evolution Divide stream clustering into on-line and off-line processes
Mining Time-Series Data A time-series database consists of sequences of values or events obtained over repeated measurements of time. Trend Analysis Similarity Search in Time-Series Analysis
Markov Chain for sequence analysis A Markov chain is a model that generates sequences in which the probability of a symbol depends only on the previous symbol.
Tasks using hidden Markov models include: Evaluation: Given a sequence, x, determine the probability, P(x), of obtaining x in the model. Decoding: Given a sequence, determine the most probable path through the model that produced the sequence. Learning: Given a model and a set of training sequences, find the model parameters (i.e., the transition and emission probabilities) that explain the training sequences with relatively high probability.
Different algorithms in series analysis Forward Algorithm Viterbi Algorithm Baum-Welch Algorithm
Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net

Mais conteúdo relacionado

Mais procurados

Performance Evaluation for Classifiers tutorial
Performance Evaluation for Classifiers tutorialPerformance Evaluation for Classifiers tutorial
Performance Evaluation for Classifiers tutorial
Bilkent University
 

Mais procurados (20)

Mining Data Streams
Mining Data StreamsMining Data Streams
Mining Data Streams
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
 
Control Strategies in AI
Control Strategies in AIControl Strategies in AI
Control Strategies in AI
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
 
Chapter 08 Data Mining Techniques
Chapter 08 Data Mining Techniques Chapter 08 Data Mining Techniques
Chapter 08 Data Mining Techniques
 
ID3 ALGORITHM
ID3 ALGORITHMID3 ALGORITHM
ID3 ALGORITHM
 
Artificial Neural Networks for Data Mining
Artificial Neural Networks for Data MiningArtificial Neural Networks for Data Mining
Artificial Neural Networks for Data Mining
 
supervised learning
supervised learningsupervised learning
supervised learning
 
Reasoning in AI
Reasoning in AIReasoning in AI
Reasoning in AI
 
Web usage mining
Web usage miningWeb usage mining
Web usage mining
 
Foundations of Machine Learning
Foundations of Machine LearningFoundations of Machine Learning
Foundations of Machine Learning
 
Assosiate rule mining
Assosiate rule miningAssosiate rule mining
Assosiate rule mining
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Performance Evaluation for Classifiers tutorial
Performance Evaluation for Classifiers tutorialPerformance Evaluation for Classifiers tutorial
Performance Evaluation for Classifiers tutorial
 
4.3 multimedia datamining
4.3 multimedia datamining4.3 multimedia datamining
4.3 multimedia datamining
 
Text mining
Text miningText mining
Text mining
 
5.3 mining sequential patterns
5.3 mining sequential patterns5.3 mining sequential patterns
5.3 mining sequential patterns
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptron
 

Semelhante a Data Mining: Mining stream time series and sequence data

Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams   Esteban DonatoEvaluating Classification Algorithms Applied To Data Streams   Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
Esteban Donato
 
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Mumbai Academisc
 

Semelhante a Data Mining: Mining stream time series and sequence data (20)

Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)
 
Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams   Esteban DonatoEvaluating Classification Algorithms Applied To Data Streams   Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
 
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
 
Evaluation of a New Incremental Classification Tree Algorithm for Mining High...
Evaluation of a New Incremental Classification Tree Algorithm for Mining High...Evaluation of a New Incremental Classification Tree Algorithm for Mining High...
Evaluation of a New Incremental Classification Tree Algorithm for Mining High...
 
EVALUATION OF A NEW INCREMENTAL CLASSIFICATION TREE ALGORITHM FOR MINING HIGH...
EVALUATION OF A NEW INCREMENTAL CLASSIFICATION TREE ALGORITHM FOR MINING HIGH...EVALUATION OF A NEW INCREMENTAL CLASSIFICATION TREE ALGORITHM FOR MINING HIGH...
EVALUATION OF A NEW INCREMENTAL CLASSIFICATION TREE ALGORITHM FOR MINING HIGH...
 
Thilaganga mphil cs viva presentation ppt
Thilaganga mphil cs viva presentation pptThilaganga mphil cs viva presentation ppt
Thilaganga mphil cs viva presentation ppt
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming data
 
Mining closed sequential patterns in large sequence databases
Mining closed sequential patterns in large sequence databasesMining closed sequential patterns in large sequence databases
Mining closed sequential patterns in large sequence databases
 
Data mining
Data mining Data mining
Data mining
 
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 
Real time streaming analytics
Real time streaming analyticsReal time streaming analytics
Real time streaming analytics
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
 
Atomreaktor
AtomreaktorAtomreaktor
Atomreaktor
 
Big data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real timeBig data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real time
 
Cognitive automation
Cognitive automationCognitive automation
Cognitive automation
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data mining
 
Seminar Presentation
Seminar PresentationSeminar Presentation
Seminar Presentation
 
Novel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data StreamsNovel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data Streams
 
Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data Scientists
 
Data Mining: Classification and analysis
Data Mining: Classification and analysisData Mining: Classification and analysis
Data Mining: Classification and analysis
 

Mais de DataminingTools Inc

Mais de DataminingTools Inc (20)

Terminology Machine Learning
Terminology Machine LearningTerminology Machine Learning
Terminology Machine Learning
 
Techniques Machine Learning
Techniques Machine LearningTechniques Machine Learning
Techniques Machine Learning
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning Introduction
 
Areas of machine leanring
Areas of machine leanringAreas of machine leanring
Areas of machine leanring
 
AI: Planning and AI
AI: Planning and AIAI: Planning and AI
AI: Planning and AI
 
AI: Logic in AI 2
AI: Logic in AI 2AI: Logic in AI 2
AI: Logic in AI 2
 
AI: Logic in AI
AI: Logic in AIAI: Logic in AI
AI: Logic in AI
 
AI: Learning in AI 2
AI: Learning in AI 2AI: Learning in AI 2
AI: Learning in AI 2
 
AI: Learning in AI
AI: Learning in AI AI: Learning in AI
AI: Learning in AI
 
AI: Introduction to artificial intelligence
AI: Introduction to artificial intelligenceAI: Introduction to artificial intelligence
AI: Introduction to artificial intelligence
 
AI: Belief Networks
AI: Belief NetworksAI: Belief Networks
AI: Belief Networks
 
AI: AI & Searching
AI: AI & SearchingAI: AI & Searching
AI: AI & Searching
 
AI: AI & Problem Solving
AI: AI & Problem SolvingAI: AI & Problem Solving
AI: AI & Problem Solving
 
Data Mining: Text and web mining
Data Mining: Text and web miningData Mining: Text and web mining
Data Mining: Text and web mining
 
Data Mining: Outlier analysis
Data Mining: Outlier analysisData Mining: Outlier analysis
Data Mining: Outlier analysis
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysis
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technology
 
Data Mining: Data processing
Data Mining: Data processingData Mining: Data processing
Data Mining: Data processing
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysis
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Último (20)

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Data Mining: Mining stream time series and sequence data

  • 1. Mining Stream, Time Series, and Sequence Data
  • 2. Methodologies for Stream Data Processing and Stream Data Systems Random Sampling Sliding Windows Histograms Multi resolution Methods Sketches Synopses
  • 3. Randomized Algorithms to analyze Data Streams Randomized algorithms, in the form of random sampling and sketching, are often used to deal with massive, high-dimensional data streams.
  • 4. Data Stream Management Systems and Stream Queries In traditional database systems, data are stored in finite and persistent databases. stream data are infinite and impossible to store fully in a database.  Data Stream Management System (DSMS), there may be multiple data streams. Once an element from a data stream has been processed, it is discarded or archived, and it cannot be easily retrieved unless it is explicitly stored in memory
  • 5. Critical Layers of stream data cube Two critical cuboids (or layers) The first layer, called the minimal interest layer, is the minimally interesting layer that ananalyst would like to study The second layer, called the observation layer, is the layer at which an analyst (or anautomated system) would like to continuously study the data.
  • 6. Hoeffding Tree Algorithm The Hoeffding tree algorithm is a decision tree learning method for stream data classification. It was initially used to track Web click streams and construct models to predict which Web hosts and Web sites a user is likely to access. It typically runs in sublinear time and produces a nearly identical decision tree to that of traditional batch learners. It uses Hoeffding trees, which exploit the idea that a small sample can often be enough to choose an optimal splitting attribute.
  • 7. Very Fast Decision Tree (VFDT)  The VFDT (Very Fast Decision Tree) algorithm makes several modifications to the Hoeffding tree algorithm. The modifications include breaking near-ties during attribute selection more aggressively, computing the G function after a number of training examples, deactivating the least promising leaves whenever memory is running low, dropping poor splitting attributes, and improving the initialization method. VFDT works well on stream data and also compares extremely well to traditional classifiers in both speed and accuracy To adapt to concept-drifting data streams.
  • 8. Concept-adapting Very Fast Decision Tree algorithm (CVFDT). CVFDT also uses a sliding window approach; however, it does not construct a new model from scratch each time. Rather, it updates statistics at the nodes by incrementing the counts associated with new examples and decrementing the counts associated with old ones. Therefore, if there is a concept drift, some nodes may no longer pass the Hoeffding bound. When this happens, an alternate subtree will be grown, with the new best splitting attribute at the root.
  • 9. A Classifier Ensemble Approach to Stream Data Classification The idea is to train an ensemble or group of classifiers (using, say naïve Bayes) from sequential chunks of the data stream. Whenever a new chunk arrives, we build a new classifier from it. The individual classifiers are weighted based on their expected classification accuracy in a time-changing environment. Only the top-k classifiers are kept. The decisions are then based on the weighted votes of the classifiers.
  • 10. Clustering in evolving data streams Compute and store summaries of past data Apply a divide-and-conquer strategy Incremental clustering of incoming data streams Perform micro clustering as well as macro clustering analysis Explore multiple time granularity for the analysis of cluster evolution Divide stream clustering into on-line and off-line processes
  • 11. Mining Time-Series Data A time-series database consists of sequences of values or events obtained over repeated measurements of time. Trend Analysis Similarity Search in Time-Series Analysis
  • 12. Markov Chain for sequence analysis A Markov chain is a model that generates sequences in which the probability of a symbol depends only on the previous symbol.
  • 13. Tasks using hidden Markov models include: Evaluation: Given a sequence, x, determine the probability, P(x), of obtaining x in the model. Decoding: Given a sequence, determine the most probable path through the model that produced the sequence. Learning: Given a model and a set of training sequences, find the model parameters (i.e., the transition and emission probabilities) that explain the training sequences with relatively high probability.
  • 14. Different algorithms in series analysis Forward Algorithm Viterbi Algorithm Baum-Welch Algorithm
  • 15. Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net