Document Classification using DMX in Analysis Services
Mark Tabladillo Ph.D.
http://marktab.net
September 18, 2010
SQL Saturday 46 -- Raleigh NC
#sqlsat46 #MarkTabNet




© 2010 Mark Tabladillo Ph.D.
MarkTab & Text Mining




Outline
• Tools for Text Mining
• Demos
Data Mining as a Service




Text Mining Product Comparison from 2008
Feinerer, I., Hornik, K., & Meyer, D. (2008). Text Mining Infrastructure in R. Journal of Statistical Software, 25(5).
SQL Server Data Mining
Activity     How
Preprocess   T-SQL; Integration Services; Data Mining Add-In for Excel; .NET programming
Associate    Microsoft Association Rules (algorithm)
Cluster      Microsoft Clustering (algorithm)
Summarize    Integration Services (Term Extraction, Term Lookup)
Categorize   Integration Services
API          Includes DMX, XMLA, AMO, ADOMD.NET
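
The Summarize row points to term extraction, which turns raw text into a list of terms with scores. As a rough, tool-independent illustration of that idea (it is not how the Integration Services Term Extraction transformation is implemented), the sketch below counts term frequencies in plain Python. The stop-word list, the tokenizer, and the sample sentence are assumptions made for this example only.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list. Real term extraction relies
# on linguistic rules (such as noun-phrase detection) that are not shown here.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "for"}


def extract_terms(text, min_count=1):
    """Return (term, frequency) pairs, most frequent first."""
    # Lowercase the text and keep runs of letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # Keep only terms that occur at least min_count times.
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


# Made-up sample sentence, not taken from the presidential addresses data set.
sample = "The state of the union is strong, and the union endures."
terms = extract_terms(sample, min_count=2)
print(terms)
```

A frequency threshold such as `min_count` is one simple way to keep the term list short before it is used as model input; other scoring schemes would slot into the same place.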
APIs for Data Mining
Acronym     Term                                               Definition
DMX         Data Mining Extensions (OLE DB for Data Mining)    SQL-like queries
XMLA        Extensible Markup Language for Analysis            Client communication protocol
AMO         Analysis Management Objects                        .NET library to manage Analysis Services
ADOMD.NET   ActiveX Data Objects (Multidimensional) for .NET   .NET Framework data provider
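
The table lists XMLA as the client communication protocol and DMX as the query language, so one way to picture how they fit together is a SOAP message that carries a DMX statement. The sketch below builds such a message with the Python standard library, assuming the Execute method layout (Command, Statement, Properties, PropertyList, Catalog) from the XMLA specification. The database name `TextMiningDemo` and the model name `DocumentClassifier` are hypothetical, and the code only constructs the XML; it does not contact a server.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def build_execute_envelope(statement, catalog):
    """Wrap a DMX statement in a SOAP envelope for the XMLA Execute method."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA_NS}}}Execute")
    # The statement text travels inside Command/Statement.
    command = ET.SubElement(execute, f"{{{XMLA_NS}}}Command")
    ET.SubElement(command, f"{{{XMLA_NS}}}Statement").text = statement
    # The target database is passed as the Catalog property.
    properties = ET.SubElement(execute, f"{{{XMLA_NS}}}Properties")
    prop_list = ET.SubElement(properties, f"{{{XMLA_NS}}}PropertyList")
    ET.SubElement(prop_list, f"{{{XMLA_NS}}}Catalog").text = catalog
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical model and database names, for illustration only.
xml_text = build_execute_envelope(
    "SELECT * FROM [DocumentClassifier].CONTENT", "TextMiningDemo"
)
print(xml_text)
```

In practice a client library such as ADOMD.NET produces and sends this kind of message, so hand-built envelopes are mainly useful for understanding or debugging the protocol.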
DMX Tasks
• Data Definition
  • Create, Alter, Drop – Mining Structure
  • Create, Drop – Mining Model
  • Export and Import Models
• Data Manipulation
  • Query Models, Content, Cases, Sample Cases, Dimension Content
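
The data definition and data manipulation tasks above can be sketched as DMX statement text. The Python helpers below only assemble strings; they follow the general shape of the documented CREATE MINING STRUCTURE, ALTER MINING STRUCTURE ... ADD MINING MODEL, SELECT ... CONTENT, and DROP MINING STRUCTURE statements, but they have not been run against a server here. The structure, model, and column names are hypothetical, and the choice of the Microsoft Naive Bayes algorithm is an assumption rather than something the slides state.

```python
def quote_name(name):
    """Wrap an identifier in square brackets, rejecting embedded brackets."""
    if "[" in name or "]" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"[{name}]"


def create_structure(structure):
    # Hypothetical columns: a document key, a category label, and a nested
    # table holding the terms found in each document.
    return (
        f"CREATE MINING STRUCTURE {quote_name(structure)} (\n"
        "  [DocId] LONG KEY,\n"
        "  [Category] TEXT DISCRETE,\n"
        "  [Terms] TABLE ( [Term] TEXT KEY )\n"
        ")"
    )


def add_model(structure, model, algorithm="Microsoft_Naive_Bayes"):
    # The algorithm default is an assumption made for this example.
    return (
        f"ALTER MINING STRUCTURE {quote_name(structure)}\n"
        f"ADD MINING MODEL {quote_name(model)} (\n"
        "  [DocId],\n"
        "  [Category] PREDICT,\n"
        "  [Terms] ( [Term] )\n"
        f") USING {algorithm}"
    )


def query_content(model):
    return f"SELECT * FROM {quote_name(model)}.CONTENT"


def drop_structure(structure):
    return f"DROP MINING STRUCTURE {quote_name(structure)}"


# Hypothetical names; print each statement separated by a blank line.
statements = [
    create_structure("Documents"),
    add_model("Documents", "DocumentClassifier"),
    query_content("DocumentClassifier"),
    drop_structure("Documents"),
]
print("\n\n".join(statements))
```

Statements like these could be pasted into a DMX query window in SSMS or sent through any of the APIs listed earlier; generating them from code is mostly useful when many similar models need to be created.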
SQL Server Data Mining Applications (User Interfaces)
User Interface                                    Activity
Excel (and PowerPivot for Excel)                  DMX
BIDS (Business Intelligence Development Studio)   Analysis Services Project; Integration Services Project (T-SQL; DMX; XMLA)
SSMS (SQL Server Management Studio)               T-SQL; DMX; XMLA
PowerShell version 2.0                            T-SQL; DMX; XMLA; AMO; ADOMD.NET
SharePoint                                        (Requires Setup or Customization)
Your Name Here (Develop Your Own)                 ?
Outline
• Tools for Text Mining
• Demos
Data: Presidential Addresses
http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470277742,descCd-DOWNLOAD.html
Excel
• Use the 32-bit Excel add-in for Data Mining
  • Written for SQL Server 2008, ok for 2008 R2
  • Written for Office 2007, ok for 2010
• (Optional) Add the free PowerPivot add-in
  (http://powerpivot.com)




[Diagram, ©2010 Predixion Software. Labels: PowerPivot Data Sources (SQL Server, Access, Oracle, Teradata, Sybase, Informix, DB2, Data Feeds, Text Files); Datasets & Models; SQL Server Analysis Services; Public Cloud or On-Premise Private Cloud]
BIDS
• The preferred application for production data mining
• Analysis Services Projects
  • Make Mining Structures and Models
  • Data Mining for OLAP Cubes
  • Excellent for Experimentation
• Integration Services Projects
  • Term Extraction and Term Lookup Text Mining
  • Excellent for Production
• Reporting Services Projects
  • Similar to Crystal Reports
SSMS
• Production management and maintenance
• Scripts can become stored procedures
• T-SQL, DMX, MDX, XMLA




PowerShell
• Object-oriented command prompt, now in version 2
• Provides complete access to AMO, ADOMD.NET and DMX




Excel in Production
• Can create and manage permanent data mining models
• Can document data mining models
• Can do some preprocessing (ETL)




BIDS in Production
• Can create a production workflow with Integration Services
  projects
• Can create production data mining models with Analysis
  Services projects




SSMS in Production
• The standard production user interface for SQL Server
• Also the standard production user interface for Analysis
  Services Databases
• Built for
  • Scripting (T-SQL, MDX, DMX, XMLA)
  • Security
  • Assembly Registration (Analysis Services)
  • Stored Procedures (SQL Server)
PowerShell in Production
• Features
  • Object-oriented
  • Command window or ISE (Integrated Scripting Environment)
  • Accesses .NET libraries and WMI (Windows Management
    Instrumentation)
  • Version two adds event and exception handling
Resources
• MarkTab.NET
  Blog, links, video resources and information for
  data mining
• Blog: http://marktab.net/datamining
• Twitter: @MarkTabNet
Regroup and Conclusion
• Main Points from this Presentation




Contact Information
• Mark Tabladillo
  http://marktab.net

• Also on:
  Twitter @marktabnet
  LinkedIn
