Applied Enterprise Semantic
Mining
Mark Tabladillo Ph.D.
Data Mining Architect
SQL Saturday Atlanta April 14, 2012
About MarkTab

 20+ Years in Atlanta
   Consulting since 1998; Incorporated 2003
   Part-Time Faculty at University of Phoenix
 SAS and Microsoft Expert
   Presenter since 1998 at conferences like TechEd
   and SAS Global Forum
 http://marktab.com       @MarkTabNet
Introduction

 SQL Server 2012 has new Programmability
 Enhancements
   Statistical Semantic Search
   File Tables
   Full-Text Search Improvements
 These combined technologies make SQL
 Server 2012 a strong contender in text mining
PROBLEM STATEMENT
Challenges

  Building and Maintaining Applications with
  relational and non-relational data is hard
    Complex integration
    Duplicated functionality
    Compensation for unavailable services
  80% of all data is not stored in databases!
  Most of it is “unstructured”

(2012, Michael Rys, Microsoft)
MICROSOFT AND GOOGLE
History

 July 2008
   Microsoft purchases Powerset for US$100 Million
   Google Dismisses Semantic Search
   http://venturebeat.com/2008/06/26/microsoft-to-buy-semantic-search-engine-powerset-for-100m-plus/
   http://www.forbes.com/2008/07/01/powerset-msft-search-tech-intel-cx_ag_0701powerset.html
History

 March 2009
   Google announces “snippets” as relevant to
   search
   The media picks this story up as “semantic
   search”
   http://googleblog.blogspot.com/2009/03/two-new-improvements-to-google-results.html#!/2009/03/two-new-improvements-to-google-results.html
History

 February 2012
   Google announces Knowledge Graph, an explicit
   application of semantic search
   http://mashable.com/2012/02/13/google-knowledge-graph-change-search/
History

 April 2012
   Microsoft purchases 800+ patents from AOL for
   US$1 Billion
   Among the patents are semantic search and
   metadata querying – older than Google
   http://www.theregister.co.uk/2012/04/09/aol_microsoft_patent_deal/
PURPOSE STATEMENT
(SQL SERVER)
Goals

  Reduce the cost of managing all data
  Simplify the development of applications over
  all data
  Provide management and programming
  services for all data
  Make SQL Server the preferred choice for
  managing Unstructured Data and allow
  building Rich Application Experience on top
(2012, Michael Rys, Microsoft)
http://msdn.microsoft.com/en-us/library/cc645577.aspx

NEW IN SQL SERVER 2012
Statistical Semantic Search

 Identifies statistically relevant key phrases
 Based on these phrases, can identify (by
 score) similar documents
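
 Both capabilities can be sketched in T-SQL with the SEMANTICKEYPHRASETABLE and SEMANTICSIMILARITYTABLE rowset functions that ship with SQL Server 2012. The table dbo.Documents, its columns, and the key value 1 are hypothetical names chosen for illustration; the sketch assumes a full-text index with statistical semantics already exists on the column.

```sql
-- Assumption: dbo.Documents(DocumentID INT PRIMARY KEY, DocumentContent ...)
-- has a full-text index with STATISTICAL_SEMANTICS on DocumentContent.

DECLARE @SourceKey INT = 1;  -- hypothetical key of the document of interest

-- Top key phrases for one document, highest score first.
SELECT TOP (10) kp.keyphrase, kp.score
FROM SEMANTICKEYPHRASETABLE(dbo.Documents, DocumentContent, @SourceKey) AS kp
ORDER BY kp.score DESC;

-- Documents most similar to that document, ranked by similarity score.
SELECT TOP (10) d.DocumentID, s.score
FROM SEMANTICSIMILARITYTABLE(dbo.Documents, DocumentContent, @SourceKey) AS s
JOIN dbo.Documents AS d
    ON d.DocumentID = s.matched_document_key
ORDER BY s.score DESC;
```

 Scores are relative values between 0 and 1, so they are best used for ranking within one result set rather than as absolute thresholds.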
FileTables

 Built on existing SQL Server FILESTREAM
 technology
 Files and documents
   Stored in special tables in SQL Server
   Accessed as if they were stored in the file system
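
 As a rough sketch, a FileTable can be created and queried like this. The name DocumentStore is hypothetical, and the example assumes FILESTREAM is already enabled on the instance and configured for the database.

```sql
-- Assumptions: FILESTREAM is enabled on the instance, and the current database
-- has a FILESTREAM filegroup plus a FILESTREAM directory name with
-- NON_TRANSACTED_ACCESS = FULL. DocumentStore is a hypothetical name.
CREATE TABLE dbo.DocumentStore AS FILETABLE
WITH
(
    FILETABLE_DIRECTORY = 'DocumentStore',
    FILETABLE_COLLATE_FILENAME = database_default
);

-- Files copied into the FileTable share appear as rows in the table.
SELECT name,
       file_type,
       cached_file_size,
       file_stream.GetFileNamespacePath() AS relative_path
FROM dbo.DocumentStore
WHERE is_directory = 0;
```

 The schema of a FileTable is fixed, which is why the CREATE TABLE statement lists no columns.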
Full-Text Search Enhancements

 Property search: search on tagged properties
 (such as author or title)
 Customizable NEAR: find words or phrases
 close to one another
 New Word Breakers and Stemmers (for many
 languages)
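
 The first two enhancements might be used as follows. This is a sketch only: dbo.Documents, DocumentContent, and the search terms are hypothetical, and property search assumes the full-text index is associated with a search property list that includes Author.

```sql
-- Assumption: dbo.Documents has a full-text index on DocumentContent, and that
-- index is associated with a search property list that includes Author.

-- Property search: match only within the Author document property.
SELECT DocumentID
FROM dbo.Documents
WHERE CONTAINS(PROPERTY(DocumentContent, 'Author'), 'Smith');

-- Customizable NEAR: both terms must occur within 5 terms of each other,
-- in the order given (TRUE = the term order must match).
SELECT DocumentID
FROM dbo.Documents
WHERE CONTAINS(DocumentContent, 'NEAR((semantic, search), 5, TRUE)');
```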
HOW DOES SEMANTIC
SEARCH WORK?
From Documents to Output

 [Diagram: Office documents, PDF files, and varchar/nvarchar columns are processed into a rowset output with scores]
“Beyond Relational” vs. “Adoption”

 Start with unstructured (meaning non-
 relational) data
 Use Windows technology
   Reading and Writing Files (Win32 API)
   iFilters for reading proprietary formats
 Develop indexed structure from unstructured
 data
(iFilter Required)

 [Diagram: Documents pass through iFilters into the Full-Text Keyword Index (“FTI”); with the Semantic Database, the Semantic Key Phrase Index – Tag Index (“TI”) and the Semantic Document Similarity Index (“DSI”) are built]
“iFilter”?

  IFilters are components that allow search
  services to index content of specific file types,
  letting you search for content in those files.
  They are intended for use with Microsoft
  Search Services (SharePoint, SQL,
  Exchange, Windows Search).
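
 To see which file types an instance can currently index, and which filter handles each one, the sys.fulltext_document_types catalog view can be queried. The output depends on the filters installed on the machine.

```sql
-- Lists the document types (file extensions) the instance can index
-- and the filter DLL registered for each one.
SELECT document_type, path, version, manufacturer
FROM sys.fulltext_document_types
ORDER BY document_type;
```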
Microsoft Office 2010 Filters Pack

 Legacy Office Filter (97-2003; .doc, .ppt, .xls)
 Metro Office Filter (2007; .docx, .pptx, .xlsx)
 Zip Filter
 OneNote filter
 Visio Filter
 Publisher Filter
 Open Document Format Filter
Adobe PDF iFilter 9 for 64-bit platforms

 Allows PDF search
 Not currently supported for Windows 7
   But I used it anyway ☺
 Add the Bin directory to your path
   Computer (right click), Properties, Advanced
   System Settings, Environment Variables
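
 After installing a filter pack, SQL Server typically has to be told to load filters registered with the operating system. A minimal sketch follows; whether a service restart is needed before the new filter appears is an assumption to verify in your environment.

```sql
-- Allow full-text search to load filters and word breakers registered
-- with the operating system.
EXEC sp_fulltext_service @action = 'load_os_resources', @value = 1;

-- Assumption: a restart of the SQL Server service may be required here.
-- Afterwards, confirm that the PDF filter is visible to the instance.
SELECT document_type, path
FROM sys.fulltext_document_types
WHERE document_type = '.pdf';
```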
“Semantic Language Statistics Database”?

 This database contains the statistical
 language models required by semantic
 search.
 A single semantic language statistics
 database contains the language models for
 all the languages that are supported for
 semantic indexing.
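
 The database is installed separately, then attached and registered. The sketch below assumes the database files have already been extracted and copied to the hypothetical folder C:\SqlData; adjust the paths to your installation.

```sql
-- Assumption: semanticsdb.mdf and semanticsdb_log.ldf were extracted by the
-- Semantic Language Statistics installer and copied to C:\SqlData (hypothetical).
CREATE DATABASE semanticsdb
ON (FILENAME = N'C:\SqlData\semanticsdb.mdf')
LOG ON (FILENAME = N'C:\SqlData\semanticsdb_log.ldf')
FOR ATTACH;

-- Register the attached database with the instance.
EXEC sp_fulltext_semantic_register_language_statistics_db @dbname = N'semanticsdb';

-- Verify the registration.
SELECT * FROM sys.fulltext_semantic_language_statistics_database;
```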
Languages Currently Supported
 Traditional Chinese
 German
 English
 French
 Italian
 Brazilian
 Russian
 Swedish
 Simplified Chinese
 British English
 Portuguese
 Chinese (Hong Kong SAR, PRC)
 Spanish
 Chinese (Singapore)
 Chinese (Macau SAR)
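
 The languages available on a given instance can be checked with the sys.fulltext_semantic_languages catalog view once the Semantic Language Statistics Database is registered. The result depends on the installed database version.

```sql
-- One row per language whose statistics model is registered for semantic indexing.
SELECT lcid, name
FROM sys.fulltext_semantic_languages
ORDER BY name;
```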
PERFORMANCE
Phases of Semantic Indexing

 [Diagram: Full-Text Keyword Index (“FTI”) and Semantic Key Phrase Index – Tag Index (“TI”), followed by Semantic Document Similarity Index (“DSI”)]

 http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
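
 A sketch of how these indexes might be requested and monitored in T-SQL. All object names (dbo.Documents, PK_Documents, DocumentContent, FileExtension, DocumentCatalog) are hypothetical; the STATISTICAL_SEMANTICS keyword is what asks SQL Server to build the semantic indexes alongside the full-text index.

```sql
-- Assumptions: dbo.Documents has a unique key index PK_Documents, a
-- VARBINARY(MAX) column DocumentContent, and a FileExtension column holding
-- values such as '.docx' so that the matching iFilter is chosen.
CREATE FULLTEXT CATALOG DocumentCatalog;

CREATE FULLTEXT INDEX ON dbo.Documents
(
    DocumentContent
        TYPE COLUMN FileExtension
        LANGUAGE 1033              -- English
        STATISTICAL_SEMANTICS      -- also build the TI and DSI for this column
)
KEY INDEX PK_Documents
ON DocumentCatalog
WITH CHANGE_TRACKING AUTO;

-- Progress of the full-text population.
SELECT OBJECT_NAME(table_id) AS table_name,
       population_type_description,
       status_description
FROM sys.dm_fts_index_population
WHERE database_id = DB_ID();

-- Progress of the document similarity population.
SELECT OBJECT_NAME(table_id) AS table_name,
       document_count,
       document_processed_count,
       status_description
FROM sys.dm_fts_semantic_similarity_population
WHERE database_id = DB_ID();
```

 The two dynamic management views return rows only while a population is in progress, so empty results usually mean indexing has finished or not yet started.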
Integrated Full Text Search (iFTS)

 Improved Performance and Scale:
   Scale-up to 350M documents for storage and
   search
   iFTS query performance 7-10 times faster than in
   SQL Server 2008
   Worst-case iFTS query response times less than
   3 sec for corpus
   Similar or better than main database search
   competitors
 (2012, Michael Rys, Microsoft)
Linear Scale of FTI/TI/DSI

 First known linearly scaling end-to-end
 Search and Semantic product in the industry




 Time in Seconds vs. Number of Documents
 (2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
Conclusion

 SQL Server 2012 adds new text processing
 capabilities
 This technology scales linearly
 Microsoft targets enterprise-level applications
 with millions of documents
Network

 MarkTab Consulting
   http://marktab.com
 Blog
   http://marktab.net
 Twitter
   @marktabnet
APPENDIX
References

 Video
   http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search
   http://www.microsoftpdc.com/2009/SVR32
 Semantic Search (Books Online) – explains the
 demo
   http://msdn.microsoft.com/en-us/library/gg492075.aspx
 Paper
   http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
Demo: My Semantic Search Sample

 http://mysemanticsearch.codeplex.com/
 Requires:
   iFilters
   Semantic Language Statistics Database
   IIS7, IIS6, with Windows Authentication
   .NET 4.0
   Silverlight 4.0
   FILESTREAM (complete)
Demo: T-SQL and Documents

 Naveen Garg
 Requires Adventure Works (from Codeplex)
 http://blogs.msdn.com/b/sqlfts/archive/2011/07/21/introducing-fulltext-statistical-semantic-search-in-sql-server-codename-denali-release.aspx
Abstract
 SQL Server 2012 debuts a new Semantic Platform
 (commonly known by its applied task, Semantic
 Search). This text mining technology leverages the
 already established Full-Text Index and builds
 semantic indexes in a two-phase process. This
 presentation provides a scientific description and
 demo of the Enterprise implementation of the Tag Index
 and Document Similarity Index. At present (RTM),
 the indexes work for 15 languages. Included are
 strategy tips on how best to leverage the technology
 along with existing Microsoft text mining and
 data mining.

Mais conteúdo relacionado

Mais procurados

Writing Code To Interact With Enterprise Search
Writing Code To Interact With Enterprise SearchWriting Code To Interact With Enterprise Search
Writing Code To Interact With Enterprise SearchCorey Roth
 
Blue Hill Research Review of Lexis for Microsoft Office
Blue Hill Research Review of Lexis for Microsoft OfficeBlue Hill Research Review of Lexis for Microsoft Office
Blue Hill Research Review of Lexis for Microsoft OfficeLexisNexis
 
Building Solutions With Business Connectivity Services
Building Solutions With Business Connectivity ServicesBuilding Solutions With Business Connectivity Services
Building Solutions With Business Connectivity ServicesChakkaradeep Chandran
 
Introduction To Enterprise Search Tulsa Tech Fest 2009
Introduction To Enterprise Search   Tulsa Tech Fest 2009Introduction To Enterprise Search   Tulsa Tech Fest 2009
Introduction To Enterprise Search Tulsa Tech Fest 2009Corey Roth
 
Enterprise Search Using SharePoint 2010 and FAST
Enterprise Search Using SharePoint 2010 and FASTEnterprise Search Using SharePoint 2010 and FAST
Enterprise Search Using SharePoint 2010 and FASTBert Johnson
 
Connecting to LOB Systems Using BCS, Ayman El-Hattab, MVP
Connecting to LOB Systems Using BCS, Ayman El-Hattab, MVPConnecting to LOB Systems Using BCS, Ayman El-Hattab, MVP
Connecting to LOB Systems Using BCS, Ayman El-Hattab, MVPAyman El-Hattab
 
SharePoint 2010 Managed Metadata vs SQL 2012 Master Data Services
SharePoint 2010 Managed Metadata vs SQL 2012 Master Data ServicesSharePoint 2010 Managed Metadata vs SQL 2012 Master Data Services
SharePoint 2010 Managed Metadata vs SQL 2012 Master Data ServicesHenry Ong
 
Linked Data Driven Data Virtualization for Web-scale Integration
Linked Data Driven Data Virtualization for Web-scale IntegrationLinked Data Driven Data Virtualization for Web-scale Integration
Linked Data Driven Data Virtualization for Web-scale Integrationrumito
 
External Content Types WIS SPUG
External Content Types  WIS SPUGExternal Content Types  WIS SPUG
External Content Types WIS SPUGarenza
 
SharePoint Integration and the BDC - Richard Harbridge and Mark Brahmhall
SharePoint Integration and the BDC - Richard Harbridge and Mark BrahmhallSharePoint Integration and the BDC - Richard Harbridge and Mark Brahmhall
SharePoint Integration and the BDC - Richard Harbridge and Mark BrahmhallBoston Area SharePoint Users Group
 
Instant ECM with SharePoint 2010 - SPTechCon Boston 2011
Instant ECM with SharePoint 2010 - SPTechCon Boston 2011Instant ECM with SharePoint 2010 - SPTechCon Boston 2011
Instant ECM with SharePoint 2010 - SPTechCon Boston 2011Corey Roth
 
Information architecture search_bettertogether
Information architecture search_bettertogetherInformation architecture search_bettertogether
Information architecture search_bettertogetherAgnes Molnar
 
Linked Data Planet Key Note
Linked Data Planet Key NoteLinked Data Planet Key Note
Linked Data Planet Key Noterumito
 
Advanced BCS - Business Data Connectivity Models and Custom Connectors - SPTe...
Advanced BCS - Business Data Connectivity Models and Custom Connectors - SPTe...Advanced BCS - Business Data Connectivity Models and Custom Connectors - SPTe...
Advanced BCS - Business Data Connectivity Models and Custom Connectors - SPTe...Corey Roth
 
Building Self Documenting HTTP APIs with CQRS
Building Self Documenting HTTP APIs with CQRSBuilding Self Documenting HTTP APIs with CQRS
Building Self Documenting HTTP APIs with CQRSDerek Comartin
 
SharePoint Saturday Dayton 2012
SharePoint Saturday Dayton 2012SharePoint Saturday Dayton 2012
SharePoint Saturday Dayton 2012Scott_Brickey
 
FAST Search for SharePoint 2010
FAST Search for SharePoint 2010FAST Search for SharePoint 2010
FAST Search for SharePoint 2010Alexandre Ferreira
 
Social Media Data Collection & Analysis
Social Media Data Collection & AnalysisSocial Media Data Collection & Analysis
Social Media Data Collection & AnalysisScott Sanders
 
Three levels of pain
Three levels of painThree levels of pain
Three levels of painClaudio Reyes
 

Mais procurados (20)

Writing Code To Interact With Enterprise Search
Writing Code To Interact With Enterprise SearchWriting Code To Interact With Enterprise Search
Writing Code To Interact With Enterprise Search
 
Blue Hill Research Review of Lexis for Microsoft Office
Blue Hill Research Review of Lexis for Microsoft OfficeBlue Hill Research Review of Lexis for Microsoft Office
Blue Hill Research Review of Lexis for Microsoft Office
 
Building Solutions With Business Connectivity Services
Building Solutions With Business Connectivity ServicesBuilding Solutions With Business Connectivity Services
Building Solutions With Business Connectivity Services
 
Share point metadata
Share point metadataShare point metadata
Share point metadata
 
Introduction To Enterprise Search Tulsa Tech Fest 2009
Introduction To Enterprise Search   Tulsa Tech Fest 2009Introduction To Enterprise Search   Tulsa Tech Fest 2009
Introduction To Enterprise Search Tulsa Tech Fest 2009
 
Enterprise Search Using SharePoint 2010 and FAST
Enterprise Search Using SharePoint 2010 and FASTEnterprise Search Using SharePoint 2010 and FAST
Enterprise Search Using SharePoint 2010 and FAST
 
Connecting to LOB Systems Using BCS, Ayman El-Hattab, MVP
Connecting to LOB Systems Using BCS, Ayman El-Hattab, MVPConnecting to LOB Systems Using BCS, Ayman El-Hattab, MVP
Connecting to LOB Systems Using BCS, Ayman El-Hattab, MVP
 
SharePoint 2010 Managed Metadata vs SQL 2012 Master Data Services
SharePoint 2010 Managed Metadata vs SQL 2012 Master Data ServicesSharePoint 2010 Managed Metadata vs SQL 2012 Master Data Services
SharePoint 2010 Managed Metadata vs SQL 2012 Master Data Services
 
Linked Data Driven Data Virtualization for Web-scale Integration
Linked Data Driven Data Virtualization for Web-scale IntegrationLinked Data Driven Data Virtualization for Web-scale Integration
Linked Data Driven Data Virtualization for Web-scale Integration
 
External Content Types WIS SPUG
External Content Types  WIS SPUGExternal Content Types  WIS SPUG
External Content Types WIS SPUG
 
SharePoint Integration and the BDC - Richard Harbridge and Mark Brahmhall
SharePoint Integration and the BDC - Richard Harbridge and Mark BrahmhallSharePoint Integration and the BDC - Richard Harbridge and Mark Brahmhall
SharePoint Integration and the BDC - Richard Harbridge and Mark Brahmhall
 
Instant ECM with SharePoint 2010 - SPTechCon Boston 2011
Instant ECM with SharePoint 2010 - SPTechCon Boston 2011Instant ECM with SharePoint 2010 - SPTechCon Boston 2011
Instant ECM with SharePoint 2010 - SPTechCon Boston 2011
 
Information architecture search_bettertogether
Information architecture search_bettertogetherInformation architecture search_bettertogether
Information architecture search_bettertogether
 
Linked Data Planet Key Note
Linked Data Planet Key NoteLinked Data Planet Key Note
Linked Data Planet Key Note
 
Advanced BCS - Business Data Connectivity Models and Custom Connectors - SPTe...
Advanced BCS - Business Data Connectivity Models and Custom Connectors - SPTe...Advanced BCS - Business Data Connectivity Models and Custom Connectors - SPTe...
Advanced BCS - Business Data Connectivity Models and Custom Connectors - SPTe...
 
Building Self Documenting HTTP APIs with CQRS
Building Self Documenting HTTP APIs with CQRSBuilding Self Documenting HTTP APIs with CQRS
Building Self Documenting HTTP APIs with CQRS
 
SharePoint Saturday Dayton 2012
SharePoint Saturday Dayton 2012SharePoint Saturday Dayton 2012
SharePoint Saturday Dayton 2012
 
FAST Search for SharePoint 2010
FAST Search for SharePoint 2010FAST Search for SharePoint 2010
FAST Search for SharePoint 2010
 
Social Media Data Collection & Analysis
Social Media Data Collection & AnalysisSocial Media Data Collection & Analysis
Social Media Data Collection & Analysis
 
Three levels of pain
Three levels of painThree levels of pain
Three levels of pain
 

Destaque

Secrets of Enterprise Data Mining 201310
Secrets of Enterprise Data Mining 201310Secrets of Enterprise Data Mining 201310
Secrets of Enterprise Data Mining 201310Mark Tabladillo
 
FileTable and Semantic Search in SQL Server 2012
FileTable and Semantic Search in SQL Server 2012FileTable and Semantic Search in SQL Server 2012
FileTable and Semantic Search in SQL Server 2012Michael Rys
 
Sql 2012 development and programming
Sql 2012  development and programmingSql 2012  development and programming
Sql 2012 development and programmingLearnNowOnline
 
Applied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL ServerApplied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL ServerMark Tabladillo
 
SQL Server - Full text search
SQL Server - Full text searchSQL Server - Full text search
SQL Server - Full text searchPeter Gfader
 
Effective Usage of SQL Server 2005 Database Mirroring
Effective Usage of SQL Server 2005 Database MirroringEffective Usage of SQL Server 2005 Database Mirroring
Effective Usage of SQL Server 2005 Database Mirroringwebhostingguy
 
SQL Server Performance Tuning Baseline
SQL Server Performance Tuning BaselineSQL Server Performance Tuning Baseline
SQL Server Performance Tuning Baseline► Supreme Mandal ◄
 
Sql Server Performance Tuning
Sql Server Performance TuningSql Server Performance Tuning
Sql Server Performance TuningBala Subra
 
SQL Server - Querying and Managing XML Data
SQL Server - Querying and Managing XML DataSQL Server - Querying and Managing XML Data
SQL Server - Querying and Managing XML DataMarek Maśko
 
Always on in SQL Server 2012
Always on in SQL Server 2012Always on in SQL Server 2012
Always on in SQL Server 2012Fadi Abdulwahab
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016James Serra
 

Destaque (14)

Secrets of Enterprise Data Mining 201310
Secrets of Enterprise Data Mining 201310Secrets of Enterprise Data Mining 201310
Secrets of Enterprise Data Mining 201310
 
FileTable and Semantic Search in SQL Server 2012
FileTable and Semantic Search in SQL Server 2012FileTable and Semantic Search in SQL Server 2012
FileTable and Semantic Search in SQL Server 2012
 
Sql 2012 development and programming
Sql 2012  development and programmingSql 2012  development and programming
Sql 2012 development and programming
 
Understanding indices
Understanding indicesUnderstanding indices
Understanding indices
 
Applied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL ServerApplied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL Server
 
SQL Server - Full text search
SQL Server - Full text searchSQL Server - Full text search
SQL Server - Full text search
 
Effective Usage of SQL Server 2005 Database Mirroring
Effective Usage of SQL Server 2005 Database MirroringEffective Usage of SQL Server 2005 Database Mirroring
Effective Usage of SQL Server 2005 Database Mirroring
 
SQL Server Performance Tuning Baseline
SQL Server Performance Tuning BaselineSQL Server Performance Tuning Baseline
SQL Server Performance Tuning Baseline
 
Sql Server Performance Tuning
Sql Server Performance TuningSql Server Performance Tuning
Sql Server Performance Tuning
 
SQL Server - Querying and Managing XML Data
SQL Server - Querying and Managing XML DataSQL Server - Querying and Managing XML Data
SQL Server - Querying and Managing XML Data
 
Always on in SQL Server 2012
Always on in SQL Server 2012Always on in SQL Server 2012
Always on in SQL Server 2012
 
File Upload
File UploadFile Upload
File Upload
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
 
Implementing Full Text in SQL Server
Implementing Full Text in SQL ServerImplementing Full Text in SQL Server
Implementing Full Text in SQL Server
 

Semelhante a Sql Saturday 111 Atlanta applied enterprise semantic mining

Applied Enterprise Semantic Search 201305
Applied Enterprise Semantic Search 201305Applied Enterprise Semantic Search 201305
Applied Enterprise Semantic Search 201305Mark Tabladillo
 
Applied Semantic Search 201306
Applied Semantic Search 201306Applied Semantic Search 201306
Applied Semantic Search 201306Mark Tabladillo
 
Applied Enterprise Semantic Mining -- Charlotte 201410
Applied Enterprise Semantic Mining -- Charlotte 201410Applied Enterprise Semantic Mining -- Charlotte 201410
Applied Enterprise Semantic Mining -- Charlotte 201410Mark Tabladillo
 
Gilbane 2009 -- How Can Content Management Software Keep Pace?
Gilbane 2009 -- How Can Content Management Software Keep Pace?Gilbane 2009 -- How Can Content Management Software Keep Pace?
Gilbane 2009 -- How Can Content Management Software Keep Pace?weisinger
 
EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...
EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...
EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...EPC Group
 
SharePoint 2010 Integration and Interoperability - SharePoint Saturday Hartford
SharePoint 2010 Integration and Interoperability - SharePoint Saturday HartfordSharePoint 2010 Integration and Interoperability - SharePoint Saturday Hartford
SharePoint 2010 Integration and Interoperability - SharePoint Saturday HartfordRichard Harbridge
 
Productie Sharepoint Presentatie
Productie Sharepoint PresentatieProductie Sharepoint Presentatie
Productie Sharepoint PresentatieJan van der Kolk
 
SharePoint 2010 Integration and Interoperability: What you need to know
SharePoint 2010 Integration and Interoperability: What you need to knowSharePoint 2010 Integration and Interoperability: What you need to know
SharePoint 2010 Integration and Interoperability: What you need to knowRichard Harbridge
 
Best Practices - SharePoint 2010: Integration and Interoperability
Best Practices - SharePoint 2010: Integration and InteroperabilityBest Practices - SharePoint 2010: Integration and Interoperability
Best Practices - SharePoint 2010: Integration and InteroperabilityRichard Harbridge
 
SharePoint Fest Denver - SharePoint 2010 Integration and Interoperability: Wh...
SharePoint Fest Denver - SharePoint 2010 Integration and Interoperability: Wh...SharePoint Fest Denver - SharePoint 2010 Integration and Interoperability: Wh...
SharePoint Fest Denver - SharePoint 2010 Integration and Interoperability: Wh...Richard Harbridge
 
SharePoint Fest Chicago - SharePoint 2010 Integration and Interoperability: W...
SharePoint Fest Chicago - SharePoint 2010 Integration and Interoperability: W...SharePoint Fest Chicago - SharePoint 2010 Integration and Interoperability: W...
SharePoint Fest Chicago - SharePoint 2010 Integration and Interoperability: W...Richard Harbridge
 
SharePoint 2007 and 2010 + Use Cases
SharePoint 2007 and 2010 + Use CasesSharePoint 2007 and 2010 + Use Cases
SharePoint 2007 and 2010 + Use Casesjovojovo
 
Enterprise Search in SharePoint 2010
Enterprise Search in SharePoint 2010Enterprise Search in SharePoint 2010
Enterprise Search in SharePoint 2010bgerman
 
Secrets of Enterprise Data Mining 201305
Secrets of Enterprise Data Mining 201305Secrets of Enterprise Data Mining 201305
Secrets of Enterprise Data Mining 201305Mark Tabladillo
 
SharePoint Server 2007 Overview - TechMentor 2007 with Joel Oleson
SharePoint Server 2007 Overview - TechMentor 2007 with Joel OlesonSharePoint Server 2007 Overview - TechMentor 2007 with Joel Oleson
SharePoint Server 2007 Overview - TechMentor 2007 with Joel OlesonJoel Oleson
 
Siebel 8 Quick Hits: Search
Siebel 8 Quick Hits: SearchSiebel 8 Quick Hits: Search
Siebel 8 Quick Hits: SearchScott Nash
 
Building IoT and Big Data Solutions on Azure
Building IoT and Big Data Solutions on AzureBuilding IoT and Big Data Solutions on Azure
Building IoT and Big Data Solutions on AzureIdo Flatow
 

Semelhante a Sql Saturday 111 Atlanta applied enterprise semantic mining (20)

Applied Enterprise Semantic Search 201305
Applied Enterprise Semantic Search 201305Applied Enterprise Semantic Search 201305
Applied Enterprise Semantic Search 201305
 
Applied Semantic Search 201306
Applied Semantic Search 201306Applied Semantic Search 201306
Applied Semantic Search 201306
 
Applied Enterprise Semantic Mining -- Charlotte 201410
Applied Enterprise Semantic Mining -- Charlotte 201410Applied Enterprise Semantic Mining -- Charlotte 201410
Applied Enterprise Semantic Mining -- Charlotte 201410
 
Enterprise Information Integration
Enterprise Information IntegrationEnterprise Information Integration
Enterprise Information Integration
 
Fundamentals Of Search
Fundamentals Of SearchFundamentals Of Search
Fundamentals Of Search
 
Gilbane 2009 -- How Can Content Management Software Keep Pace?
Gilbane 2009 -- How Can Content Management Software Keep Pace?Gilbane 2009 -- How Can Content Management Software Keep Pace?
Gilbane 2009 -- How Can Content Management Software Keep Pace?
 
Microsoft Enterprise Seach using SharePoint
Microsoft Enterprise Seach using SharePointMicrosoft Enterprise Seach using SharePoint
Microsoft Enterprise Seach using SharePoint
 
EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...
EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...
EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...
 
SharePoint 2010 Integration and Interoperability - SharePoint Saturday Hartford
SharePoint 2010 Integration and Interoperability - SharePoint Saturday HartfordSharePoint 2010 Integration and Interoperability - SharePoint Saturday Hartford
SharePoint 2010 Integration and Interoperability - SharePoint Saturday Hartford
 
Productie Sharepoint Presentatie
Productie Sharepoint PresentatieProductie Sharepoint Presentatie
Productie Sharepoint Presentatie
 
SharePoint 2010 Integration and Interoperability: What you need to know
SharePoint 2010 Integration and Interoperability: What you need to knowSharePoint 2010 Integration and Interoperability: What you need to know
SharePoint 2010 Integration and Interoperability: What you need to know
 
Best Practices - SharePoint 2010: Integration and Interoperability
Best Practices - SharePoint 2010: Integration and InteroperabilityBest Practices - SharePoint 2010: Integration and Interoperability
Best Practices - SharePoint 2010: Integration and Interoperability
 
SharePoint Fest Denver - SharePoint 2010 Integration and Interoperability: Wh...
SharePoint Fest Denver - SharePoint 2010 Integration and Interoperability: Wh...SharePoint Fest Denver - SharePoint 2010 Integration and Interoperability: Wh...
SharePoint Fest Denver - SharePoint 2010 Integration and Interoperability: Wh...
 
SharePoint Fest Chicago - SharePoint 2010 Integration and Interoperability: W...
SharePoint Fest Chicago - SharePoint 2010 Integration and Interoperability: W...SharePoint Fest Chicago - SharePoint 2010 Integration and Interoperability: W...
SharePoint Fest Chicago - SharePoint 2010 Integration and Interoperability: W...
 
SharePoint 2007 and 2010 + Use Cases
SharePoint 2007 and 2010 + Use CasesSharePoint 2007 and 2010 + Use Cases
SharePoint 2007 and 2010 + Use Cases
 
Enterprise Search in SharePoint 2010
Enterprise Search in SharePoint 2010Enterprise Search in SharePoint 2010
Enterprise Search in SharePoint 2010
 
Secrets of Enterprise Data Mining 201305
Secrets of Enterprise Data Mining 201305Secrets of Enterprise Data Mining 201305
Secrets of Enterprise Data Mining 201305
 
SharePoint Server 2007 Overview - TechMentor 2007 with Joel Oleson
SharePoint Server 2007 Overview - TechMentor 2007 with Joel OlesonSharePoint Server 2007 Overview - TechMentor 2007 with Joel Oleson
SharePoint Server 2007 Overview - TechMentor 2007 with Joel Oleson
 
Siebel 8 Quick Hits: Search
Siebel 8 Quick Hits: SearchSiebel 8 Quick Hits: Search
Siebel 8 Quick Hits: Search
 
Building IoT and Big Data Solutions on Azure
Building IoT and Big Data Solutions on AzureBuilding IoT and Big Data Solutions on Azure
Building IoT and Big Data Solutions on Azure
 

Mais de Mark Tabladillo

How to find low-cost or free data science resources 202006
How to find low-cost or free data science resources 202006How to find low-cost or free data science resources 202006
How to find low-cost or free data science resources 202006Mark Tabladillo
 
Microsoft Build 2020: Data Science Recap
Microsoft Build 2020: Data Science RecapMicrosoft Build 2020: Data Science Recap
Microsoft Build 2020: Data Science RecapMark Tabladillo
 
201909 Automated ML for Developers
201909 Automated ML for Developers201909 Automated ML for Developers
201909 Automated ML for DevelopersMark Tabladillo
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated MLMark Tabladillo
 
201906 01 Introduction to ML.NET 1.0
201906 01 Introduction to ML.NET 1.0201906 01 Introduction to ML.NET 1.0
201906 01 Introduction to ML.NET 1.0Mark Tabladillo
 
201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019Mark Tabladillo
 
201906 03 Introduction to NimbusML
201906 03 Introduction to NimbusML201906 03 Introduction to NimbusML
201906 03 Introduction to NimbusMLMark Tabladillo
 
201906 02 Introduction to AutoML with ML.NET 1.0
201906 02 Introduction to AutoML with ML.NET 1.0201906 02 Introduction to AutoML with ML.NET 1.0
201906 02 Introduction to AutoML with ML.NET 1.0Mark Tabladillo
 
201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine LearningMark Tabladillo
 
201905 Azure Certification DP-100: Designing and Implementing a Data Science ...




SQL Saturday 111 Atlanta: Applied Enterprise Semantic Mining

  • 1. Applied Enterprise Semantic Mining Mark Tabladillo Ph.D. Data Mining Architect SQL Saturday Atlanta April 14, 2012
  • 2. About MarkTab 20+ Years in Atlanta Consulting since 1998; Incorporated 2003 Part-Time Faculty at University of Phoenix SAS and Microsoft Expert Presenter since 1998 at conferences like TechEd and SAS Global Forum http://marktab.com @MarkTabNet
  • 3. Introduction SQL Server 2012 has new Programmability Enhancements Statistical Semantic Search File Tables Full-Text Search Improvements These combined technologies make SQL Server 2012 a strong contender in text mining
  • 5. Challenges Building and Maintaining Applications with relational and non-relational data is hard Complex integration Duplicated functionality Compensation for unavailable services 80% of all data is not stored in databases! Most of it is “unstructured” (2012, Michael Rys, Microsoft)
  • 7. History July 2008 Microsoft purchases Powerset for US$100 Million Google Dismisses Semantic Search http://venturebeat.com/2008/06/26/microsoft-to-buy-semantic-search-engine-powerset-for-100m-plus/ http://www.forbes.com/2008/07/01/powerset-msft-search-tech-intel-cx_ag_0701powerset.html
  • 8. History March 2009 Google announces “snippets” as relevant to search The media picks this story up as “semantic search” http://googleblog.blogspot.com/2009/03/two-new-improvements-to-google-results.html#!/2009/03/two-new-improvements-to-google-results.html
  • 9. History February 2012 Google announces Knowledge Graph, an explicit application of semantic search http://mashable.com/2012/02/13/google-knowledge-graph-change-search/
  • 10. History April 2012 Microsoft purchases 800+ patents from AOL for US$1 Billion Among the patents are semantic search and metadata querying – older than Google http://www.theregister.co.uk/2012/04/09/aol_microsoft_patent_deal/
  • 12. Goals Reduce the cost of managing all data Simplify the development of applications over all data Provide management and programming services for all data Make SQL Server the preferred choice for managing Unstructured Data and allow building Rich Application Experience on top (2012, Michael Rys, Microsoft)
  • 14. Statistical Semantic Search Identifies statistically relevant key phrases Based on these phrases, can identify (by score) similar documents
  • 15. FileTables Built on existing SQL Server FILESTREAM technology Files and documents Stored in special tables in SQL Server Accessed as if they were stored in the file system
  • 16. Full-Text Search Enhancements Property search: search on tagged properties (such as author or title) Customizable NEAR: find words or phrases close to one another New Word Breakers and Stemmers (for many languages)
  • 18. From Documents to Output (diagram): Office, Varchar, PDF, NVarchar; Rowset Output with Scores
  • 19. “Beyond Relational” vs. “Adoption” Start with unstructured (meaning non-relational) data Use Windows technology Reading and Writing Files (Win32 API) iFilters for reading proprietary formats Develop indexed structure from unstructured data
  • 20. (Diagram) Documents (iFilter Required); iFilters; Full-Text Keyword Index “FTI”; Semantic Database; Semantic Key Phrase Index – Tag Index “TI”; Semantic Document Similarity Index “DSI”
  • 21. “iFilter”? iFilters are components that allow search services to index content of specific file types, letting you search for content in those files. They are intended for use with Microsoft Search Services (SharePoint, SQL, Exchange, Windows Search).
  • 22. Microsoft Office 2010 Filters Pack Legacy Office Filter (97-2003; .doc, .ppt, .xls) Metro Office Filter (2007; .docx, .pptx, .xlsx) Zip Filter OneNote filter Visio Filter Publisher Filter Open Document Format Filter
  • 23. Adobe PDF iFilter 9 for 64-bit platforms Allows PDF search Not currently supported for Windows 7 But I used it anyway ☺ Add the Bin directory to your path Computer (right click), Properties, Advanced System Settings, Environment Variables
  • 24. “Semantic Language Statistics Database”? This database contains the statistical language models required by semantic search. A single semantic language statistics database contains the language models for all the languages that are supported for semantic indexing.
  • 25. Languages Currently Supported: Traditional Chinese, German, English, French, Italian, Brazilian, Russian, Swedish, Simplified Chinese, British English, Portuguese, Chinese (Hong Kong SAR, PRC), Spanish, Chinese (Singapore), Chinese (Macau SAR)
  • 27. Phases of Semantic Indexing Full Text Keyword Index “FTI” Semantic Document Similarity Index “DSI” Semantic Key Phrase Index – Tag Index “TI” http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
  • 28. Integrated Full Text Search (iFTS) Improved Performance and Scale: Scale-up to 350M documents for storage and search iFTS query performance 7-10 times faster than in SQL Server 2008 Worst-case iFTS query response times less than 3 sec for corpus Similar or better than main database search competitors (2012, Michael Rys, Microsoft)
  • 29. Linear Scale of FTI/TI/DSI First known linearly scaling end-to-end Search and Semantic product in the industry Time in Seconds vs. Number of Documents (2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
  • 30. Conclusion SQL Server 2012 adds new text processing capabilities This technology scales linearly Microsoft invites millions of documents for enterprise-level applications
  • 31. Network MarkTab Consulting http://marktab.com Blog http://marktab.net Twitter @marktabnet
  • 33. References Video http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search http://www.microsoftpdc.com/2009/SVR32 Semantic Search (Books Online) – explains the demo http://msdn.microsoft.com/en-us/library/gg492075.aspx Paper http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
  • 34. Demo: My Semantic Search Sample http://mysemanticsearch.codeplex.com/ Requires: iFilters Semantic Language Statistics Database IIS7, IIS6, with Windows Authentication .NET 4.0 Silverlight 4.0 FILESTREAM (complete)
  • 35. Demo: T-SQL and Documents Naveen Garg Requires Adventure Works (from Codeplex) http://blogs.msdn.com/b/sqlfts/archive/2011/07/21/introducing-fulltext-statistical-semantic-search-in-sql-server-codename-denali-release.aspx
  • 36. Abstract SQL Server 2012 debuts a new Semantic Platform (commonly known as the applied task, Semantic Search). This text mining technology leverages the already established Full Text Index, and builds semantic indexes in a two-phase process. This presentation provides a science description and demo for the Enterprise implementation of Tag Index and Document Similarity Index. At present (RTM), the indexes work for 15 languages. Included are strategy tips for how to best leverage the technology along with already-existing Microsoft text mining and data mining.
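
The two-phase idea in the abstract (key phrases first, then document similarity built from them) can be illustrated with a small toy sketch. This is not SQL Server's algorithm, which the slides do not describe: it swaps in plain TF-IDF weighting and cosine similarity as stand-ins, and every name in it (`build_tag_index`, `build_similarity_index`, the sample documents) is hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [word for word in text.lower().split() if word.isalpha()]


def build_tag_index(docs):
    """Phase 1 (toy stand-in): weight each term per document with TF-IDF.

    Assumption: TF-IDF replaces the statistical language models that the
    real product uses; it only shows the shape of a scored key-phrase index.
    """
    n_docs = len(docs)
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    index = {}
    for doc_id, counts in term_counts.items():
        total = sum(counts.values())
        index[doc_id] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return index


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_similarity_index(tag_index):
    """Phase 2 (toy stand-in): score every document pair from phase-1 weights."""
    ids = sorted(tag_index)
    return {
        (a, b): cosine(tag_index[a], tag_index[b])
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    }


# Hypothetical sample corpus, for illustration only.
docs = {
    "a": "semantic search finds similar documents",
    "b": "semantic search ranks key phrases",
    "c": "filetables store files in tables",
}
tag_index = build_tag_index(docs)
similarity_index = build_similarity_index(tag_index)
for pair, score in sorted(similarity_index.items()):
    print(pair, round(score, 3))
```

The second index is derived entirely from the first, which mirrors the dependency the slides describe: the Document Similarity Index is built after, and on top of, the Tag Index. In SQL Server itself these results are queried through T-SQL rowset functions rather than computed in application code.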