Document Classification using DMX in Analysis Services
Mark Tabladillo Ph.D.
http://marktab.net
September 18, 2010
SQL Saturday 46 -- Raleigh NC
#sqlsat46 #MarkTabNet

© 2010 Mark Tabladillo Ph.D.

MarkTab & Text Mining

Outline
• Tools for Text Mining
• Demos

Data Mining as a Service

Text Mining Product Comparison from 2008
Source: Feinerer, I., Hornik, K., & Meyer, D. (2008). Text Mining Infrastructure in R. Journal of Statistical Software, 25(5).

SQL Server Data Mining
Activity     How
Preprocess   T-SQL; Integration Services; Data Mining Add-In for Excel; .NET programming
Associate    Microsoft Association Rules (algorithm)
Cluster      Microsoft Clustering (algorithm)
Summarize    Integration Services (Term Extraction, Term Lookup)
Categorize   Integration Services
API          Includes DMX, XMLA, AMO, ADOMD.NET

APIs for Data Mining
Acronym     Term                                                Definition
DMX         Data Mining Extensions (OLE DB for Data Mining)     SQL-like queries
XMLA        Extensible Markup Language for Analysis             Client communication protocol
AMO         Analysis Management Objects                         .NET library to manage Analysis Services
ADOMD.NET   ActiveX Data Objects (Multidimensional) for .NET    .NET Framework data provider

DMX Tasks
• Data Definition
  • Create, Alter, Drop – Mining Structure
  • Create, Drop – Mining Model
  • Export and Import Models
• Data Manipulation
  • Query Models, Content, Cases, Sample Cases, Dimension Content
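
The data definition and data manipulation tasks listed above can be illustrated with a small sketch that only composes DMX statement text. It is written in Python for convenience and does not connect to Analysis Services. Every object name (Speeches, SpeechID, President, Terms, Term, Speeches_NB), the column types, and the choice of Microsoft_Naive_Bayes are assumptions made for illustration, not taken from the presentation.

```python
"""Sketch: composing DMX statements as plain strings.

All structure, model, and column names used with these helpers are
hypothetical. Nothing here talks to a server.
"""

# Query targets named on the slide: content, cases, sample cases, dimension content.
QUERY_TARGETS = {"CONTENT", "CASES", "SAMPLE_CASES", "DIMENSION_CONTENT"}


def bracket(name):
    """Wrap an identifier in square brackets (no escaping is attempted)."""
    if "[" in name or "]" in name:
        raise ValueError("bracket characters are not handled in this sketch")
    return f"[{name}]"


def create_structure(structure, key_column, label_column, nested_table, nested_key):
    """Data definition: a structure with a case key, a label, and a nested table of terms."""
    return (
        f"CREATE MINING STRUCTURE {bracket(structure)} (\n"
        f"    {bracket(key_column)} LONG KEY,\n"
        f"    {bracket(label_column)} TEXT DISCRETE,\n"
        f"    {bracket(nested_table)} TABLE (\n"
        f"        {bracket(nested_key)} TEXT KEY\n"
        f"    )\n"
        f")"
    )


def add_model(structure, model, key_column, label_column, nested_table, nested_key,
              algorithm="Microsoft_Naive_Bayes"):
    """Data definition: add a model to the structure that predicts the label column."""
    return (
        f"ALTER MINING STRUCTURE {bracket(structure)}\n"
        f"ADD MINING MODEL {bracket(model)} (\n"
        f"    {bracket(key_column)},\n"
        f"    {bracket(label_column)} PREDICT,\n"
        f"    {bracket(nested_table)} ( {bracket(nested_key)} )\n"
        f") USING {algorithm}"
    )


def drop_statement(kind, name):
    """Data definition: drop a mining structure or a mining model."""
    if kind not in ("MINING STRUCTURE", "MINING MODEL"):
        raise ValueError("kind must be 'MINING STRUCTURE' or 'MINING MODEL'")
    return f"DROP {kind} {bracket(name)}"


def select_from_model(model, target="CONTENT"):
    """Data manipulation: query one of the model's rowsets."""
    if target not in QUERY_TARGETS:
        raise ValueError(f"unknown query target: {target}")
    return f"SELECT * FROM {bracket(model)}.{target}"


if __name__ == "__main__":
    print(create_structure("Speeches", "SpeechID", "President", "Terms", "Term"))
    print(add_model("Speeches", "Speeches_NB", "SpeechID", "President", "Terms", "Term"))
    print(select_from_model("Speeches_NB"))
    print(drop_statement("MINING MODEL", "Speeches_NB"))
```

The generated text could then be pasted into a DMX query window or sent through a client library. Whether a given query target is available depends on the algorithm and on model settings such as drillthrough.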

SQL Server Data Mining Applications (User Interfaces)
User Interface                                     Activity
Excel (and PowerPivot for Excel)                   DMX
BIDS (Business Intelligence Development Studio)    Analysis Services Project; Integration Services Project (T-SQL; DMX; XMLA)
SSMS (SQL Server Management Studio)                T-SQL; DMX; XMLA
PowerShell version 2.0                             T-SQL; DMX; XMLA; AMO; ADOMD.NET
SharePoint                                         (Requires Setup or Customization)
Your Name Here (Develop Your Own)                  ?

Outline
• Tools for Text Mining
• Demos

Data: Presidential Addresses
http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470277742,descCd-DOWNLOAD.html

Excel
• Use the 32-bit Excel add-in for Data Mining
  • Written for SQL Server 2008, ok for 2008 R2
  • Written for Office 2007, ok for 2010
• (Optional) Add the free PowerPivot add-in
  (http://powerpivot.com)

[Diagram, ©2010 Predixion Software. Labels: PowerPivot Data Sources (SQL Server, Access, Oracle, Teradata, Sybase, Informix, DB2, Data Feeds, Text Files); Datasets & Models; SQL Server Analysis Services; Public Cloud or On-Premise Private Cloud]

BIDS
• The preferred application for production data mining
• Analysis Services Projects
  • Make Mining Structures and Models
  • Data Mining for OLAP Cubes
  • Excellent for Experimentation
• Integration Services Projects
  • Term Extraction and Term Lookup Text Mining
  • Excellent for Production
• Reporting Services Projects
  • Similar to Crystal Reports
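
To give a feel for the Term Extraction step listed under Integration Services Projects, the following is a simplified Python sketch of a TFIDF-style term score: a term's total frequency multiplied by the log of (number of documents / number of documents containing the term). It is not the Integration Services implementation. The regular-expression tokenizer and the sample sentences are assumptions for illustration, and this sketch scores every word, with no linguistic filtering of any kind.

```python
"""Sketch: TFIDF-style term scoring over a small set of documents."""
import math
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_scores(documents):
    """Return {term: total_frequency * log(n_documents / documents_containing_term)}."""
    n_docs = len(documents)
    total_freq = Counter()  # occurrences of each term across all documents
    doc_freq = Counter()    # number of documents in which each term appears
    for text in documents:
        tokens = tokenize(text)
        total_freq.update(tokens)
        doc_freq.update(set(tokens))
    return {
        term: total_freq[term] * math.log(n_docs / doc_freq[term])
        for term in total_freq
    }


if __name__ == "__main__":
    # Hypothetical sample sentences, not the presidential address data set.
    sample = [
        "the union must be preserved",
        "the economy and the union",
        "the economy is growing",
    ]
    ranked = sorted(tfidf_scores(sample).items(), key=lambda kv: kv[1], reverse=True)
    for term, score in ranked:
        print(f"{term}\t{score:.3f}")
```

A term that appears in every document scores zero under this formula, which is why common words drop to the bottom of the ranking. Scored terms of this kind could then serve as the nested term table that a classification model consumes.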

SSMS
• Production management and maintenance
• Scripts can become stored procedures
• T-SQL, DMX, MDX, XMLA

PowerShell
• Object-oriented command prompt, now in version 2
• Provides complete access to AMO, ADOMD.NET and DMX

Excel in Production
• Can create and manage permanent data mining models
• Can document data mining models
• Can do some preprocessing (ETL)

BIDS in Production
• Can create a production workflow with Integration Services
  projects
• Can create production data mining models with Analysis
  Services projects

SSMS in Production
• The standard production user interface for SQL Server
• Also the standard production user interface for Analysis
  Services Databases
• Built for
  •   Scripting (T-SQL, MDX, DMX, XMLA)
  •   Security
  •   Assembly Registration (Analysis Services)
  •   Stored Procedures (SQL Server)

PowerShell in Production
• Features
  • Object-oriented
  • Command window or ISE (Integrated Scripting Environment)
  • Accesses .NET libraries and WMI (Windows Management
    Instrumentation)
  • Version two adds event and exception handling

Resources
• MarkTab.NET
  Blog, links, video resources and information for
  data mining
• Blog: http://marktab.net/datamining
• Twitter: @MarkTabNet

Regroup and Conclusion
• Main Points from this Presentation

Contact Information
• Mark Tabladillo
  http://marktab.net

• Also on:
  Twitter @marktabnet
  LinkedIn
