SlideShare uma empresa Scribd logo
1 de 23
Microsoft Sequence ClusteringAnd Association Rules
OVERVIEW Introduction DMX Queries Interpreting the sequence clustering model Microsoft Sequence Clustering Algorithm Principles and Parameters Markov chain model Introduction to Microsoft Association Rules Association Algorithm Principles and Parameters
Microsoft Sequence ClusteringAnd Association Rules The Microsoft Sequence Clustering algorithm is a sequence analysis algorithm provided by Microsoft SQL Server Analysis Services. The algorithm finds the most common sequences by grouping, or clustering, sequences that are identical. Ex :  Data that describes the click paths that are created when users navigate or browse a Web site. Data that describes the order in which a customer adds items to a shopping cart at an online retailer.
DMX Queries By querying the data mining schema rowset, you can find various kinds of information about the model such as: Basic metadata,  The date and time that the model was created and last processed,  The name of the mining structure that the model is based on,  The column used as the predictable attribute.
DMX Queries SELECT MINING_PARAMETERS  from  $system.DMSCHEMA_MINING_MODELS WHERE MODEL_NAME = 'Sequence Clustering'     Query to return the parameters that were used to build and train the Sample model.
DMX Queries SELECT FLATTENED NODE_UNIQUE_NAME, (SELECT ATTRIBUTE_VALUE AS [Product 1], [Support] AS [Sequence Support], [Probability] AS [Sequence Probability]     FROM NODE_DISTRIBUTION) AS t FROM [Sequence Clustering].CONTENT WHERE NODE_TYPE = 13 AND [PARENT_UNIQUE_NAME] = 0 Getting a List of Sequences for a State Query to return the complete list of first states in the model, before the sequences are grouped into clusters.  Returning the list of sequences (NODE_TYPE = 13) that have the model root node as parent (PARENT_UNIQUE_NAME = 0).  The FLATTENED keyword makes the results easier to read. Sample  result of this query is shown in the next figure.
DMX Queries you reference the value returned for NODE_UNIQUE_NAME  to get the ID of the node that contains all sequences for the model.  You pass this value to the query as the ID of the parent node, to get only the transitions included in this node, which happens to contain a list of al sequences for the model.
Interpreting the sequence clustering model A sequence clustering model has a single parent node that represents the model and its metadata.  The parent node, which is labeled, has a related sequence node that lists all the transitions that were detected in the training data. The algorithm also creates a number of clusters, based on the transitions that were found in the data and any other input attributes included when creating the model.  Each cluster contains its own sequence node that lists only the transitions that were used in generating that specific cluster.
Interpreting the sequence clustering model
Microsoft Sequence Clustering Algorithm Principles The Microsoft Sequence Clustering algorithm is a hybrid algorithm that combines clustering techniques with Markov chain analysis to identify clusters and their sequences. This data typically represents a series of events or transitions between states in a dataset.  The algorithm examines all transition probabilities and measures the differences, or distances, between all the possible sequences in the dataset to determine which sequences are the best to use as inputs for clustering.  After the algorithm has created the list of candidate sequences, it uses the sequence information as an input for the EM method of clustering.
Markov chain model A Markov chain also contains a matrix of transition probabilities.  The transitions emanating from a given state define a distribution over the possible next states.  The equation P (xi= G|xi-1=A) = 0.15 means that, given the current state A, the probability of the next state being G is 0.15.
Markov chain model Based on the Markov chain, for any given length L sequence x {x1, x2,x3,. . .,xL},  you can calculate the probability of a sequence as follows: P(x) = P(xL . xL-1,. . .,x1)         = P(xL| xL-1,. . .,x1)P (xL-1|xL-2,. . .,x1).. .P(x1) In first-order, the probability of each state xi depends only on the state of xi-1. P(x) = P(xL . xL-1,. . .,x1)        = P(xL|xL-1)P(xL-1|xL-2). . .P(x2|x1)P(x1)
Microsoft Sequence Clustering Parameters ,[object Object],Setting the CLUSTER_COUNT parameter to 0 causes the algorithm to use heuristics to best determine the number of clusters to build. The default is 10. ,[object Object],The default is 100.
Microsoft Sequence Clustering Parameters ,[object Object],The default is 10. ,[object Object],The default is 64.
Introduction to Microsoft Association Rules The Microsoft Association Rules Viewer in Microsoft SQL Server Analysis Services displays mining models that are built with the Microsoft Association algorithm. The Microsoft Association algorithm is an association algorithm provided by Analysis Services that is useful for recommendation engines.  A recommendation engine recommends products to customers based on items they have already bought, or in which they have indicated an interest.  The Microsoft Association algorithm is also useful for market basket analysis.
Structure of an Association Model The top level has a single node (Model Root) that represents the model.  The second level contains nodes that represent qualified item sets and rules.
Association Algorithm Principles The Microsoft Association Rules algorithm belongs to the Apriori association family.  The two steps in the Microsoft Association Rules algorithm are: ,[object Object]
Generate association rules based on frequent item sets. ,[object Object]
Association Algorithm Parameters MINIMUM_PROBABILITY is a threshold parameter.  It defines the minimum probability for an association rule.  Its value is within the range of 0 to 1.  The default value is 0.4. MINIMUM_IMPORTANCE is a threshold parameter for association rules.  Rules with importance less than Minimum_Importance are filtered out.
Association Algorithm Parameters MAXIMUM_ITEMSET_SIZE specifies the maximum size of an itemset.  The default value is 0, which means that there is no size limit on the itemset. MINIMUM_ITEMSET_SIZE specifies the minimum size of the itemset.  The default value is 0. MAXIMUM_ITEMSET_COUNTdefines the maximum number of item sets.
Association Algorithm Parameters OPTIMIZED_PREDICTION_COUNTdefines the number of items to be cached to optimized predictions AUTODETECT_MINIMUM_SUPPORTrepresents the sensitivity of the algorithm used to autodetect minimum support. To automatically detect the smallest appropriate value of minimum support, Set this value to 1.0 . To turns off autodetection, Set this value to 1.0
Summary Introduction to sequence clustering DMX Queries The sequence clustering model Microsoft Sequence Clustering Algorithm Principles and Parameters Markov chain model Introduction to Microsoft Association Rules Association Algorithm Principles and Parameters
Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net

Mais conteúdo relacionado

Mais procurados (6)

XL-MINER:Prediction
XL-MINER:PredictionXL-MINER:Prediction
XL-MINER:Prediction
 
Chapter 04-discriminant analysis
Chapter 04-discriminant analysisChapter 04-discriminant analysis
Chapter 04-discriminant analysis
 
Chapter01 introductory handbook
Chapter01 introductory handbookChapter01 introductory handbook
Chapter01 introductory handbook
 
XL Miner: Classification
XL Miner: ClassificationXL Miner: Classification
XL Miner: Classification
 
WEKA: Output Knowledge Representation
WEKA: Output Knowledge RepresentationWEKA: Output Knowledge Representation
WEKA: Output Knowledge Representation
 
[M2A3] Data Analysis and Interpretation Specialization
[M2A3] Data Analysis and Interpretation Specialization [M2A3] Data Analysis and Interpretation Specialization
[M2A3] Data Analysis and Interpretation Specialization
 

Destaque

Destaque (15)

MS Sql Server: Datamining Introduction
MS Sql Server: Datamining IntroductionMS Sql Server: Datamining Introduction
MS Sql Server: Datamining Introduction
 
MS SQL SERVER: Using the data mining tools
MS SQL SERVER: Using the data mining toolsMS SQL SERVER: Using the data mining tools
MS SQL SERVER: Using the data mining tools
 
MS SQLSERVER:Feeding Data Into Database
MS SQLSERVER:Feeding Data Into DatabaseMS SQLSERVER:Feeding Data Into Database
MS SQLSERVER:Feeding Data Into Database
 
MS SQL SERVER: Neural network and logistic regression
MS SQL SERVER: Neural network and logistic regressionMS SQL SERVER: Neural network and logistic regression
MS SQL SERVER: Neural network and logistic regression
 
MS SQLSERVER:Doing Calculations With Functions
MS SQLSERVER:Doing Calculations With FunctionsMS SQLSERVER:Doing Calculations With Functions
MS SQLSERVER:Doing Calculations With Functions
 
MS Sql Server: Reporting introduction
MS Sql Server: Reporting introductionMS Sql Server: Reporting introduction
MS Sql Server: Reporting introduction
 
MS Sql Server: Reporting basics
MS Sql  Server: Reporting basicsMS Sql  Server: Reporting basics
MS Sql Server: Reporting basics
 
MS SQLSERVER:Manipulating Database
MS SQLSERVER:Manipulating DatabaseMS SQLSERVER:Manipulating Database
MS SQLSERVER:Manipulating Database
 
MS SQL SERVER: Introduction To Database Concepts
MS SQL SERVER: Introduction To Database ConceptsMS SQL SERVER: Introduction To Database Concepts
MS SQL SERVER: Introduction To Database Concepts
 
MS SQLSERVER:Retrieving Data From A Database
MS SQLSERVER:Retrieving Data From A DatabaseMS SQLSERVER:Retrieving Data From A Database
MS SQLSERVER:Retrieving Data From A Database
 
MS Sql Server: Business Intelligence
MS Sql Server: Business IntelligenceMS Sql Server: Business Intelligence
MS Sql Server: Business Intelligence
 
MS SQL SERVER: Creating A Database
MS SQL SERVER: Creating A DatabaseMS SQL SERVER: Creating A Database
MS SQL SERVER: Creating A Database
 
MS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data miningMS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data mining
 
MS SQLSERVER:Joining Databases
MS SQLSERVER:Joining DatabasesMS SQLSERVER:Joining Databases
MS SQLSERVER:Joining Databases
 
MS SQL SERVER: Getting Started With Sql Server 2008
MS SQL SERVER: Getting Started With Sql Server 2008MS SQL SERVER: Getting Started With Sql Server 2008
MS SQL SERVER: Getting Started With Sql Server 2008
 

Semelhante a MS SQL SERVER: Microsoft sequence clustering and association rules

MS SQL SERVER: Microsoft naive bayes algorithm
MS SQL SERVER: Microsoft naive bayes algorithmMS SQL SERVER: Microsoft naive bayes algorithm
MS SQL SERVER: Microsoft naive bayes algorithm
sqlserver content
 
Predictive performance analysis using sql pattern matching
Predictive performance analysis using sql pattern matchingPredictive performance analysis using sql pattern matching
Predictive performance analysis using sql pattern matching
Horia Berca
 
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning ApproachReducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
Erik De Monte
 
Overview of query evaluation
Overview of query evaluationOverview of query evaluation
Overview of query evaluation
avniS
 

Semelhante a MS SQL SERVER: Microsoft sequence clustering and association rules (20)

MS SQL SERVER: Microsoft naive bayes algorithm
MS SQL SERVER: Microsoft naive bayes algorithmMS SQL SERVER: Microsoft naive bayes algorithm
MS SQL SERVER: Microsoft naive bayes algorithm
 
Database programming
Database programmingDatabase programming
Database programming
 
MS SQL Server: Data mining concepts and dmx
MS SQL Server: Data mining concepts and dmxMS SQL Server: Data mining concepts and dmx
MS SQL Server: Data mining concepts and dmx
 
MS SQL SERVER: Data mining concepts and dmx
MS SQL SERVER: Data mining concepts and dmxMS SQL SERVER: Data mining concepts and dmx
MS SQL SERVER: Data mining concepts and dmx
 
Php and MySQL Web Development
Php and MySQL Web DevelopmentPhp and MySQL Web Development
Php and MySQL Web Development
 
mc_simulation documentation
mc_simulation documentationmc_simulation documentation
mc_simulation documentation
 
Interface Python with MySQL connectivity.pptx
Interface Python with MySQL connectivity.pptxInterface Python with MySQL connectivity.pptx
Interface Python with MySQL connectivity.pptx
 
MS SQL SERVER: Microsoft time series algorithm
MS SQL SERVER: Microsoft time series algorithmMS SQL SERVER: Microsoft time series algorithm
MS SQL SERVER: Microsoft time series algorithm
 
MS SQL SERVER: Time series algorithm
MS SQL SERVER: Time series algorithmMS SQL SERVER: Time series algorithm
MS SQL SERVER: Time series algorithm
 
Oracle_Analytical_function.pdf
Oracle_Analytical_function.pdfOracle_Analytical_function.pdf
Oracle_Analytical_function.pdf
 
Spark ml streaming
Spark ml streamingSpark ml streaming
Spark ml streaming
 
Clustering in Machine Learning.pdf
Clustering in Machine Learning.pdfClustering in Machine Learning.pdf
Clustering in Machine Learning.pdf
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Predictive performance analysis using sql pattern matching
Predictive performance analysis using sql pattern matchingPredictive performance analysis using sql pattern matching
Predictive performance analysis using sql pattern matching
 
Minería de Datos en Sql Server 2008
Minería de Datos en Sql Server 2008Minería de Datos en Sql Server 2008
Minería de Datos en Sql Server 2008
 
Machine learning Algorithms
Machine learning AlgorithmsMachine learning Algorithms
Machine learning Algorithms
 
ifip2008albashiri.pdf
ifip2008albashiri.pdfifip2008albashiri.pdf
ifip2008albashiri.pdf
 
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning ApproachReducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
 
A Novel Methodology to Implement Optimization Algorithms in Machine Learning
A Novel Methodology to Implement Optimization Algorithms in Machine LearningA Novel Methodology to Implement Optimization Algorithms in Machine Learning
A Novel Methodology to Implement Optimization Algorithms in Machine Learning
 
Overview of query evaluation
Overview of query evaluationOverview of query evaluation
Overview of query evaluation
 

Mais de sqlserver content

Mais de sqlserver content (15)

MS SQL SERVER: Programming sql server data mining
MS SQL SERVER:  Programming sql server data miningMS SQL SERVER:  Programming sql server data mining
MS SQL SERVER: Programming sql server data mining
 
MS SQL SERVER: Olap cubes and data mining
MS SQL SERVER:  Olap cubes and data miningMS SQL SERVER:  Olap cubes and data mining
MS SQL SERVER: Olap cubes and data mining
 
MS SQL SERVER: Decision trees algorithm
MS SQL SERVER: Decision trees algorithmMS SQL SERVER: Decision trees algorithm
MS SQL SERVER: Decision trees algorithm
 
MS Sql Server: Reporting models
MS Sql Server: Reporting modelsMS Sql Server: Reporting models
MS Sql Server: Reporting models
 
MS Sql Server: Reporting manipulating data
MS Sql Server: Reporting manipulating dataMS Sql Server: Reporting manipulating data
MS Sql Server: Reporting manipulating data
 
MS SQLSERVER:Deleting A Database
MS SQLSERVER:Deleting A DatabaseMS SQLSERVER:Deleting A Database
MS SQLSERVER:Deleting A Database
 
MS SQLSERVER:Customizing Your D Base Design
MS SQLSERVER:Customizing Your D Base DesignMS SQLSERVER:Customizing Your D Base Design
MS SQLSERVER:Customizing Your D Base Design
 
MS SQLSERVER:Creating Views
MS SQLSERVER:Creating ViewsMS SQLSERVER:Creating Views
MS SQLSERVER:Creating Views
 
MS SQLSERVER:Creating A Database
MS SQLSERVER:Creating A DatabaseMS SQLSERVER:Creating A Database
MS SQLSERVER:Creating A Database
 
MS SQLSERVER:Advanced Query Concepts Copy
MS SQLSERVER:Advanced Query Concepts   CopyMS SQLSERVER:Advanced Query Concepts   Copy
MS SQLSERVER:Advanced Query Concepts Copy
 
MS SQLSERVER:Sql Functions And Procedures
MS SQLSERVER:Sql Functions And ProceduresMS SQLSERVER:Sql Functions And Procedures
MS SQLSERVER:Sql Functions And Procedures
 
MS SQL SERVER: Sql Functions And Procedures
MS SQL SERVER: Sql Functions And ProceduresMS SQL SERVER: Sql Functions And Procedures
MS SQL SERVER: Sql Functions And Procedures
 
MS SQL SERVER: Retrieving Data From A Database
MS SQL SERVER: Retrieving Data From A DatabaseMS SQL SERVER: Retrieving Data From A Database
MS SQL SERVER: Retrieving Data From A Database
 
MS SQL SERVER: Manipulating Database
MS SQL SERVER: Manipulating DatabaseMS SQL SERVER: Manipulating Database
MS SQL SERVER: Manipulating Database
 
MS SQL SERVER: Joining Databases
MS SQL SERVER: Joining DatabasesMS SQL SERVER: Joining Databases
MS SQL SERVER: Joining Databases
 

Último

Último (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

MS SQL SERVER: Microsoft sequence clustering and association rules

  • 2. OVERVIEW Introduction DMX Queries Interpreting the sequence clustering model Microsoft Sequence Clustering Algorithm Principles and Parameters Markov chain model Introduction to Microsoft Association Rules Association Algorithm Principles and Parameters
  • 3. Microsoft Sequence ClusteringAnd Association Rules The Microsoft Sequence Clustering algorithm is a sequence analysis algorithm provided by Microsoft SQL Server Analysis Services. The algorithm finds the most common sequences by grouping, or clustering, sequences that are identical. Ex : Data that describes the click paths that are created when users navigate or browse a Web site. Data that describes the order in which a customer adds items to a shopping cart at an online retailer.
  • 4. DMX Queries By querying the data mining schema rowset, you can find various kinds of information about the model such as: Basic metadata, The date and time that the model was created and last processed, The name of the mining structure that the model is based on, The column used as the predictable attribute.
  • 5. DMX Queries SELECT MINING_PARAMETERS from $system.DMSCHEMA_MINING_MODELS WHERE MODEL_NAME = 'Sequence Clustering' Query to return the parameters that were used to build and train the Sample model.
  • 6. DMX Queries SELECT FLATTENED NODE_UNIQUE_NAME, (SELECT ATTRIBUTE_VALUE AS [Product 1], [Support] AS [Sequence Support], [Probability] AS [Sequence Probability] FROM NODE_DISTRIBUTION) AS t FROM [Sequence Clustering].CONTENT WHERE NODE_TYPE = 13 AND [PARENT_UNIQUE_NAME] = 0 Getting a List of Sequences for a State Query to return the complete list of first states in the model, before the sequences are grouped into clusters. Returning the list of sequences (NODE_TYPE = 13) that have the model root node as parent (PARENT_UNIQUE_NAME = 0). The FLATTENED keyword makes the results easier to read. Sample result of this query is shown in the next figure.
  • 7. DMX Queries you reference the value returned for NODE_UNIQUE_NAME to get the ID of the node that contains all sequences for the model. You pass this value to the query as the ID of the parent node, to get only the transitions included in this node, which happens to contain a list of al sequences for the model.
  • 8. Interpreting the sequence clustering model A sequence clustering model has a single parent node that represents the model and its metadata. The parent node, which is labeled, has a related sequence node that lists all the transitions that were detected in the training data. The algorithm also creates a number of clusters, based on the transitions that were found in the data and any other input attributes included when creating the model. Each cluster contains its own sequence node that lists only the transitions that were used in generating that specific cluster.
  • 9. Interpreting the sequence clustering model
  • 10. Microsoft Sequence Clustering Algorithm Principles The Microsoft Sequence Clustering algorithm is a hybrid algorithm that combines clustering techniques with Markov chain analysis to identify clusters and their sequences. This data typically represents a series of events or transitions between states in a dataset. The algorithm examines all transition probabilities and measures the differences, or distances, between all the possible sequences in the dataset to determine which sequences are the best to use as inputs for clustering. After the algorithm has created the list of candidate sequences, it uses the sequence information as an input for the EM method of clustering.
  • 11. Markov chain model A Markov chain also contains a matrix of transition probabilities. The transitions emanating from a given state define a distribution over the possible next states. The equation P (xi= G|xi-1=A) = 0.15 means that, given the current state A, the probability of the next state being G is 0.15.
  • 12. Markov chain model Based on the Markov chain, for any given length L sequence x {x1, x2,x3,. . .,xL}, you can calculate the probability of a sequence as follows: P(x) = P(xL . xL-1,. . .,x1) = P(xL| xL-1,. . .,x1)P (xL-1|xL-2,. . .,x1).. .P(x1) In first-order, the probability of each state xi depends only on the state of xi-1. P(x) = P(xL . xL-1,. . .,x1) = P(xL|xL-1)P(xL-1|xL-2). . .P(x2|x1)P(x1)
  • 13.
  • 14.
  • 15. Introduction to Microsoft Association Rules The Microsoft Association Rules Viewer in Microsoft SQL Server Analysis Services displays mining models that are built with the Microsoft Association algorithm. The Microsoft Association algorithm is an association algorithm provided by Analysis Services that is useful for recommendation engines. A recommendation engine recommends products to customers based on items they have already bought, or in which they have indicated an interest. The Microsoft Association algorithm is also useful for market basket analysis.
  • 16. Structure of an Association Model The top level has a single node (Model Root) that represents the model. The second level contains nodes that represent qualified item sets and rules.
  • 17.
  • 18.
  • 19. Association Algorithm Parameters MINIMUM_PROBABILITY is a threshold parameter. It defines the minimum probability for an association rule. Its value is within the range of 0 to 1. The default value is 0.4. MINIMUM_IMPORTANCE is a threshold parameter for association rules. Rules with importance less than Minimum_Importance are filtered out.
  • 20. Association Algorithm Parameters MAXIMUM_ITEMSET_SIZE specifies the maximum size of an itemset. The default value is 0, which means that there is no size limit on the itemset. MINIMUM_ITEMSET_SIZE specifies the minimum size of the itemset. The default value is 0. MAXIMUM_ITEMSET_COUNTdefines the maximum number of item sets.
  • 21. Association Algorithm Parameters OPTIMIZED_PREDICTION_COUNTdefines the number of items to be cached to optimized predictions AUTODETECT_MINIMUM_SUPPORTrepresents the sensitivity of the algorithm used to autodetect minimum support. To automatically detect the smallest appropriate value of minimum support, Set this value to 1.0 . To turns off autodetection, Set this value to 1.0
  • 22. Summary Introduction to sequence clustering DMX Queries The sequence clustering model Microsoft Sequence Clustering Algorithm Principles and Parameters Markov chain model Introduction to Microsoft Association Rules Association Algorithm Principles and Parameters
  • 23. Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net