Microsoft Naive Bayes Algorithm
Overview: Naive Bayes Algorithm, DMX Queries, Exploring a Naive Bayes Model, Naive Bayes Principles, Naive Bayes Parameters
Naive Bayes Algorithm The Microsoft Naive Bayes algorithm is a classification algorithm provided by Microsoft SQL Server Analysis Services for use in predictive modeling. The name Naive Bayes derives from the fact that the algorithm uses Bayes' theorem but does not take into account dependencies that may exist between attributes, and its assumptions are therefore said to be naive.
How to use the Naive Bayes algorithm in SQL Server? This algorithm is less computationally intensive than other Microsoft algorithms, and it is therefore useful for quickly generating mining models to discover relationships between input columns and predictable columns. The algorithm considers each pair of input attribute values and output attribute values. Exploring a Naive Bayes model will tell you how your attributes are related to each other.
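As a minimal sketch of how such a model might be defined in DMX (the model name and columns here are hypothetical and not taken from the slides; Microsoft Naive Bayes expects non-key columns to be discrete or discretized):

CREATE MINING MODEL [TM_NaiveBayes_Demo]
(
    [Customer Key]  LONG KEY,              -- case key
    [Gender]        TEXT DISCRETE,         -- input attribute
    [Region]        TEXT DISCRETE,         -- input attribute
    [Bike Buyer]    LONG DISCRETE PREDICT  -- predictable attribute
)
USING Microsoft_Naive_Bayes

The model would then be trained with an INSERT INTO statement that binds these columns to a source query.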
DMX When you create a query against a data mining model, you can create either a content query, which provides details about the patterns discovered during analysis, or a prediction query, which uses the patterns in the model to make predictions for new data. You can also retrieve metadata about the model by querying the data mining schema rowset.
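A hedged sketch of a prediction query against the hypothetical model above (the data source [Adventure Works DW] and the ProspectiveBuyer table are assumptions for illustration, not part of the slides):

SELECT
    t.CustomerKey,
    Predict([Bike Buyer])            AS PredictedBuyer,
    PredictProbability([Bike Buyer]) AS Probability
FROM [TM_NaiveBayes_Demo]
PREDICTION JOIN
    OPENQUERY([Adventure Works DW],
        'SELECT CustomerKey, Gender, Region FROM dbo.ProspectiveBuyer') AS t
ON  [TM_NaiveBayes_Demo].[Gender] = t.Gender
AND [TM_NaiveBayes_Demo].[Region] = t.Region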
DMX Queries Getting Model Metadata by Using DMX. You can find metadata for the model by querying the data mining schema rowset. This might include when the model was created, when the model was last processed, the name of the mining structure that the model is based on, and the name of the columns used as the predictable attribute.

SELECT MODEL_CATALOG, MODEL_NAME, DATE_CREATED, LAST_PROCESSED,
       SERVICE_NAME, PREDICTION_ENTITY, FILTER
FROM $system.DMSCHEMA_MINING_MODELS
WHERE MODEL_NAME = 'TM_NaiveBayes_Filtered'
DMX Queries Retrieving a Summary of Training Data. This query retrieves data from the specified node. Because the statistics are stored in a nested table, the FLATTENED keyword is used to make the results easier to view.

SELECT FLATTENED MODEL_NAME,
    (SELECT ATTRIBUTE_NAME, ATTRIBUTE_VALUE, [SUPPORT], [PROBABILITY], VALUETYPE
     FROM NODE_DISTRIBUTION) AS t
FROM TM_NaiveBayes.CONTENT
WHERE NODE_TYPE = 26
DMX Queries Finding More Information about Attributes. This example shows how to return information from the model about a particular attribute (here, 'Region'). The result of this query is shown in the next slide.

SELECT NODE_TYPE, NODE_CAPTION, MSOLAP_NODE_SCORE
FROM TM_NaiveBayes.CONTENT
WHERE ATTRIBUTE_NAME = 'Region'
DMX Queries Sample result showing information from the model about the 'Region' attribute.
DMX Queries This query returns the importance scores of all attributes in the model. The result of this query is shown in the next slide.

SELECT NODE_CAPTION, MSOLAP_NODE_SCORE
FROM TM_NaiveBayes.CONTENT
WHERE NODE_TYPE = 10
ORDER BY MSOLAP_NODE_SCORE DESC
DMX Queries Result of the query returning the importance scores of all attributes in the model.
Exploring a Naive Bayes Model A convenient way to start analyzing a new data set is to create a Naive Bayes model and mark all the non-key columns as both input and predictable. The content of each model is presented as a series of nodes. A node is an object within a mining model that contains metadata and information about a portion of the model. Nodes are arranged in a hierarchy.
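As a hedged illustration (reusing the TM_NaiveBayes model name from the earlier queries), the node hierarchy can be browsed with a content query such as:

SELECT NODE_NAME, NODE_TYPE, NODE_CAPTION,
       PARENT_UNIQUE_NAME, CHILDREN_CARDINALITY
FROM TM_NaiveBayes.CONTENT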
Naive Bayes Model Content
Exploring a Naive Bayes Model The Naive Bayes viewer is accessed through either BI Development Studio or SQL Server Management Studio by right-clicking the model and selecting Browse. SQL Server Data Mining provides four different views of Naive Bayes models. Dependency Network: provides a quick display of how all of the attributes in your model are related. Each node in the graph represents an attribute, and each edge represents a relationship. An outgoing edge means the attribute is predictive of the attribute at the end of the edge; an incoming edge means the attribute is predicted by the other node.
Exploring a Naive Bayes Model Attribute Profiles: provides an exhaustive report of how each input attribute corresponds to each output attribute, one attribute at a time. At the top of the Attribute Profiles view, you select which output you want to look at, and the rest of the view shows how all of the input attributes are correlated to the states of the selected output attribute.
Exploring a Naive Bayes Model Attribute Characteristics: allows you to select an output attribute and value, and shows a description of the cases where that attribute and value occur. Attribute Discrimination: provides the answer to the most interesting question: what is the difference between X and Y? With this viewer, you choose the attribute you are interested in and select the states you want to compare.
Naive Bayes Principles Bayesian methods use a combination of conditional and unconditional probabilities. The naive part of Naive Bayes tells you to treat all of your input attributes as independent of each other with respect to the target variable. This may be a faulty assumption, but it allows you to multiply the individual probabilities to determine the likelihood of each state.
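A minimal sketch of that multiplication, written as a formula (implied by the text rather than shown on the slide): for a hypothesis H and input evidence values E_1 through E_n,

\[ P(H \mid E_1, \ldots, E_n) \;\propto\; P(H) \prod_{i=1}^{n} P(E_i \mid H) \]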
Naive Bayes Principles Bayes' rule states that if you have a hypothesis H and evidence about that hypothesis E, then the probability of H is calculated using the following formula: P(H | E) = P(E | H) × P(H) / P(E). This simply states that the probability of your hypothesis given the evidence is equal to the probability of the evidence given the hypothesis, multiplied by the probability of the hypothesis, and then normalized.
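As a hedged numerical illustration (the figures are invented for this example, not taken from the slides): if P(H) = 0.3, P(E | H) = 0.8, and P(E) = 0.6, then

\[ P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)} = \frac{0.8 \times 0.3}{0.6} = 0.4 \]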
Naive Bayes Parameters MAXIMUM_INPUT_ATTRIBUTES determines the number of attributes that will be considered as inputs for training. If there are more inputs than this number, the algorithm selects the most important inputs and ignores the rest. Setting this parameter to 0 causes the algorithm to consider all attributes. The default value is 255. MAXIMUM_OUTPUT_ATTRIBUTES determines the number of attributes that will be considered as outputs for training. If there are more outputs than this number, the algorithm selects the most important outputs and ignores the rest. Setting this parameter to 0 causes the algorithm to consider all attributes. The default value is 255.
Naive Bayes Parameters MAXIMUM_STATES controls how many states of an attribute are considered. If an attribute has more than this number of states, only the most popular states are used. States that are not selected are treated as missing data. This parameter is useful when an attribute has high cardinality.
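As a hedged sketch (the model and column names are hypothetical, reusing the structure from the earlier example), these parameters can be supplied in the USING clause when the model is created:

CREATE MINING MODEL [TM_NaiveBayes_Tuned]
(
    [Customer Key]  LONG KEY,
    [Gender]        TEXT DISCRETE,
    [Region]        TEXT DISCRETE,
    [Bike Buyer]    LONG DISCRETE PREDICT
)
USING Microsoft_Naive_Bayes (MAXIMUM_INPUT_ATTRIBUTES = 100,
                             MAXIMUM_OUTPUT_ATTRIBUTES = 10,
                             MAXIMUM_STATES = 50)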
Summary: Naive Bayes Algorithm, DMX Queries, Naive Bayes Model Content, Exploring a Naive Bayes Model, Naive Bayes Principles, Naive Bayes Parameters
Visit more self-help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding, and will not involve any additional support. Visit us at www.dataminingtools.net