2. Peter Gfader
    - Specializes in C# and .NET (Java not anymore)
    - Testing: automated tests
    - Agile, Scrum: Certified Scrum Trainer
    - Technology aficionado: Silverlight, ASP.NET, Windows Forms
5. What we did last week: CLR Integration
    - .NET
    - .NET FX
    - CLR
6. What we did last week: CLR Integration
    - Stored procedures
    - Functions
    - Triggers
    - Bottom line
        - Use T-SQL for all data operations
        - Use CLR assemblies for any complex calculations and transformations
7. Homework?
    - Find all products that have a product number starting with BK
    - Find all products with "Road" in the name that are Silver
    - Find a list of products that have no review
    - Find the list price ([listprice]) of all products in our shop
    - What is the sum of the list price of all our products?
    - Find the products with the maximum and minimum list price
    - Find a list of products with their discount sale (hint: see Sales.SalesOrderDetail)
    - Find the sum of prices of the products in each subcategory
8. Session 5: SQL Server Full-Text Search
    Using Full-Text Search in SQL Server 2008
9. Agenda
    - What is full-text search
    - The old way: 2005
    - The new way: 2008
    - How to: querying
10. What is Fulltext search
    SELECT *
    FROM [Northwind].[dbo].[Employees]
    WHERE Notes LIKE '%grad%'
11. What is REAL Fulltext search
    - Allows searching for text/words in columns
        - Similar words
        - Plural of words
    - Based on a special index
        - Full-text index (full-text catalog)
    SELECT *
    FROM [Northwind].[dbo].[Employees]
    WHERE FREETEXT(*, 'grad')
15. Full-Text Search Terminology 3/3
    - Stopwords/stoplists
        - Words that are not relevant to a search
        - e.g. ‘and’, ‘a’, ‘is’ and ‘the’ in English
    - Accent insensitivity
        - café = cafe
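To see stopwords and accent insensitivity in action, the following is a minimal sketch using the `sys.dm_fts_parser` function. It assumes SQL Server 2008 or later, membership in the sysadmin role, and the English word breaker (LCID 1033). The sample phrase is made up for illustration.

```sql
-- Arguments: query string, language LCID, stoplist id (0 = system stoplist),
-- accent sensitivity (0 = insensitive, 1 = sensitive).
-- special_term shows which tokens are treated as noise (stop) words;
-- display_term shows the token as the full-text engine would index it.
SELECT display_term, special_term
FROM sys.dm_fts_parser(N'"the café and a bike"', 1033, 0, 0);
```

Running the same call with the last argument set to 1 should let you compare how the accented word is tokenized in both modes.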
22. Administering Full-Text Search
    Full-text administration can be separated into three main tasks:
    - Creating/altering/dropping full-text catalogs
    - Creating/altering/dropping full-text indexes
    - Scheduling and maintaining index population
25. Administering Full-Text Search
    - Automatic update of the index
        - Slows down database performance
    - Manually repopulate the full-text index
        - Time consuming
        - Asynchronous process in the background
        - Run in periods of low activity
        - Index is not up to date
28. Creating a Full-Text Catalog (SQL 2005)
    Syntax
        CREATE FULLTEXT CATALOG catalog_name
            [ON FILEGROUP filegroup]
            [IN PATH 'rootpath']
            [WITH <catalog_option>]
            [AS DEFAULT]
            [AUTHORIZATION owner_name]
        <catalog_option> ::= ACCENT_SENSITIVITY = {ON|OFF}
    Example
        USE AdventureWorks_FulllText
        CREATE FULLTEXT CATALOG AdventureWorks_FullTextCatalog
            ON FILEGROUP FullTextCatalog_FG
            WITH ACCENT_SENSITIVITY = ON
            AS DEFAULT
            AUTHORIZATION dbo
29. Creating a Full-Text Catalog: step by step
    - Create a directory on the operating system named C:\test
    - Launch SSMS, connect to your instance, and open a new query window
    - Add a new filegroup to the AdventureWorks_FulllText database:
        USE master
        GO
        ALTER DATABASE AdventureWorks_FulllText ADD FILEGROUP FTFG1
        GO
        ALTER DATABASE AdventureWorks_FulllText
        ADD FILE (NAME = N'AdventureWorks_FulllText_data',
                  FILENAME = N'C:\TEST\AdventureWorks_FulllText_data.ndf',
                  SIZE = 2048KB, FILEGROWTH = 1024KB)
        TO FILEGROUP [FTFG1]
        GO
    - Create a full-text catalog on the FTFG1 filegroup by executing the following command:
        USE AdventureWorks_FulllText
        GO
        CREATE FULLTEXT CATALOG AWCatalog ON FILEGROUP FTFG1 IN PATH 'C:\TEST' AS DEFAULT;
        GO
45. Summary
    - T-SQL command CREATE FULLTEXT INDEX
    - Full-text indexes on
        - Text-based columns
        - Binary columns
        - Image columns
    - VARBINARY / IMAGE
        - Store files in their native format within SQL Server
        - Full-text indexing and searching
    - Lots of helper services/functionality
        - Word-breaker routines, language files, noise word files, filters and protocol handlers
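A minimal sketch of the CREATE FULLTEXT INDEX command summarized above. The key index names and the Production.Document column names (Document, FileExtension) are assumptions based on the AdventureWorks sample schema. Check them against your own database before running.

```sql
USE AdventureWorks_FulllText;
GO
-- Full-text index on a text-based column.
-- The KEY INDEX must be a unique, single-column, non-nullable index
-- (the name below is an assumed primary key index name).
CREATE FULLTEXT INDEX ON Production.ProductDescription
(
    [Description] LANGUAGE 1033          -- 1033 = English
)
KEY INDEX PK_ProductDescription_ProductDescriptionID
ON AdventureWorks_FullTextCatalog
WITH CHANGE_TRACKING AUTO;
GO
-- Full-text index on a binary column: TYPE COLUMN names the column that
-- holds the file extension, so the matching filter can be chosen.
-- Column and index names here are assumptions.
CREATE FULLTEXT INDEX ON Production.Document
(
    [Document] TYPE COLUMN FileExtension LANGUAGE 1033
)
KEY INDEX PK_Document_DocumentID
ON AdventureWorks_FullTextCatalog
WITH CHANGE_TRACKING AUTO;
GO
```

A table can have only one full-text index, so an existing one has to be dropped with DROP FULLTEXT INDEX ON before these statements will succeed.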
47. Populating a Full-Text Index
    Because of the external structure for storing full-text indexes, changes to underlying data columns are not immediately reflected in the full-text index. Instead, a background process enlists the word breakers, filters and noise word filters to build the tokens for each column, which are then merged back into the main index either automatically or manually. This update process is called population or a crawl. To keep your full-text indexes up to date, you must periodically populate them.
48. Populating a Full-Text Index
    You can choose from three modes for full-text population:
    - Full
    - Incremental
    - Update
49. Populating a Full-Text Index
    - Full
        - Reads and processes all rows
        - Very resource-intensive
    - Incremental
        - Automatically populates the index for rows that were modified since the last population
        - Requires a timestamp column
    - Update
        - Uses change tracking from SQL Server (inserts, updates, and deletes)
        - Specify how you want to propagate the changes to the index
            - AUTO: automatic processing
            - MANUAL: implement a manual method for processing changes
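The three modes map onto ALTER FULLTEXT INDEX options. The sketch below assumes full-text indexes already exist on the two AdventureWorks tables used in the surrounding examples.

```sql
USE AdventureWorks_FulllText;
GO
-- Update population with MANUAL change tracking: SQL Server records the
-- changes, but they reach the index only when you start the population.
ALTER FULLTEXT INDEX ON Production.ProductDescription SET CHANGE_TRACKING MANUAL;
ALTER FULLTEXT INDEX ON Production.ProductDescription START UPDATE POPULATION;
GO
-- AUTO change tracking: tracked changes are propagated in the background.
ALTER FULLTEXT INDEX ON Production.ProductDescription SET CHANGE_TRACKING AUTO;
GO
-- Incremental population: relies on a timestamp column in the table.
-- Without one, SQL Server falls back to a full population.
ALTER FULLTEXT INDEX ON Production.Document START INCREMENTAL POPULATION;
GO
```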
50. Populating a Full-Text Index
    Example
        ALTER FULLTEXT INDEX ON Production.ProductDescription START FULL POPULATION;
        ALTER FULLTEXT INDEX ON Production.Document START FULL POPULATION;
52. Populating a Full-Text Catalog
    Example
        USE AdventureWorks_FulllText;
        ALTER FULLTEXT CATALOG AdventureWorks_FullTextCatalog
            REBUILD WITH ACCENT_SENSITIVITY = OFF;
        -- Check accent sensitivity
        SELECT FULLTEXTCATALOGPROPERTY('AdventureWorks_FullTextCatalog', 'accentsensitivity');
55. Managing Population Schedules
    - In SQL 2000, full-text catalogs could only be populated on specified schedules
    - SQL 2005/2008 can track database changes and keep the catalog up to date, with a minor performance hit
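Whichever schedule is used, it helps to check whether a population is still running before relying on search results. A minimal sketch, assuming the catalog and table names from the earlier examples:

```sql
USE AdventureWorks_FulllText;
GO
-- A value of 0 means idle; non-zero values indicate a population in progress
-- or another non-idle state (see the documentation for the full list).
SELECT
    FULLTEXTCATALOGPROPERTY(N'AdventureWorks_FullTextCatalog', 'PopulateStatus') AS CatalogPopulateStatus,
    FULLTEXTCATALOGPROPERTY(N'AdventureWorks_FullTextCatalog', 'ItemCount')      AS CatalogItemCount,
    OBJECTPROPERTYEX(OBJECT_ID(N'Production.ProductDescription'),
                     'TableFulltextPopulateStatus')                              AS TablePopulateStatus;
```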
56. How to: Querying SQL Server Using Full-Text Search
    Full-text query keywords
    - FREETEXT
    - FREETEXTTABLE
    - CONTAINS
    - CONTAINSTABLE
57. FREETEXT
    - Fuzzy search (less precise)
        - Inflectional forms (stemming)
        - Related words (thesaurus)
58. FREETEXT
    - Fuzzy search (less precise)
        - Inflectional forms (stemming)
        - Related words (thesaurus)
    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE [Description] LIKE N'%bike%';

    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE FREETEXT(Description, N'bike');
59. FREETEXTTABLE
    - Adds a rank column
        - Value between 0 and 1,000
        - Relative number: how well the row matches the search criteria
    SELECT PD.ProductDescriptionID, PD.Description, KEYTBL.[KEY], KEYTBL.RANK
    FROM Production.ProductDescription AS PD
    INNER JOIN FREETEXTTABLE(Production.ProductDescription, Description, N'bike') AS KEYTBL
        ON PD.ProductDescriptionID = KEYTBL.[KEY]
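Because the rank is only a relative number, it is mostly useful for sorting and for cutting off weak matches. The sketch below uses the optional top_n_by_rank argument of FREETEXTTABLE. The cut-off of 10 is an arbitrary value chosen for illustration.

```sql
USE AdventureWorks_FulllText;
GO
-- Return only the 10 best-ranked matches, best match first.
SELECT PD.ProductDescriptionID, PD.[Description], KEYTBL.[RANK]
FROM Production.ProductDescription AS PD
INNER JOIN FREETEXTTABLE(Production.ProductDescription, [Description], N'bike', 10) AS KEYTBL
    ON PD.ProductDescriptionID = KEYTBL.[KEY]
ORDER BY KEYTBL.[RANK] DESC;
```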
61.
    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, N' FORMSOF (INFLECTIONAL, ride) ');

    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, N' FORMSOF (THESAURUS, ride) ');

    Word proximity: NEAR ( ~ )
    - How near words are in the text/document

    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, N'mountain NEAR bike');

    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, N'mountain ~ bike');

    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, 'ISABOUT (mountain weight(.8), bikes weight (.2) )');
64.
    - Full-text search is much more powerful than LIKE
        - More specific, relevant results
        - Better performance: LIKE is for small amounts of text, full-text search scales to huge documents
        - Provides ranking of results
    - Common uses
        - Search through the content of a text-intensive, database-driven website, e.g. a knowledge base
        - Search the contents of documents stored in BLOB fields
        - Perform advanced searches
            - e.g. with exact phrases: "to be or not to be" (however, needs care!)
            - e.g. Boolean operators: AND, OR, NOT, NEAR
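The advanced searches mentioned above can be sketched with CONTAINS as follows. The search words are made-up examples against the AdventureWorks product descriptions.

```sql
USE AdventureWorks_FulllText;
GO
-- Exact phrase: the words must appear together, in this order.
SELECT ProductDescriptionID, [Description]
FROM Production.ProductDescription
WHERE CONTAINS([Description], N'"mountain bike"');

-- Boolean operators: both terms required, a third one excluded.
SELECT ProductDescriptionID, [Description]
FROM Production.ProductDescription
WHERE CONTAINS([Description], N'"aluminum" AND "frame" AND NOT "racing"');

-- Either term is enough.
SELECT ProductDescriptionID, [Description]
FROM Production.ProductDescription
WHERE CONTAINS([Description], N'"aluminum" OR "steel"');
```

A phrase made only of stopwords, such as "to be or not to be", is where the care comes in: depending on the stoplist in use, the whole phrase may be ignored.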
65.
    - Integrated backup, restore and recovery
    - Faster queries and index building
    - Data definition language (DDL) statements for creating and altering indexes
        - System stored procedures deprecated
    - Noise insensitivity: noise words no longer break the query
    - Accent insensitivity (optional): e.g. café and cafe are the same
    - Multiple columns can be included in full-text searches
    - Pre-computed ranking optimizations when using FREETEXTTABLE
    - Improved ranking algorithm
    - Catalogs can be set to populate continuously, track changes, or index when the CPU is idle
66. Writing FTS terms
    The power of FTS is in the expression that is passed to the CONTAINS or CONTAINSTABLE function.
    Several different types of terms:
    - Simple terms
    - Prefix terms
    - Generation terms
    - Proximity terms
    - Weighted terms
67. Simple terms
    - Either words or phrases
    - Quotes are optional, but recommended
    - Matches columns which contain the exact words or phrases specified
    - Case insensitive
    - Punctuation is ignored
    e.g.
        CONTAINS(Column, 'SQL')
        CONTAINS(Column, ' "SQL" ')
        CONTAINS(Column, 'Microsoft SQL Server')
        CONTAINS(Column, ' "Microsoft SQL Server" ')
68. Prefix terms
    - Matches words beginning with the specified text
    e.g.
        CONTAINS(Column, ' "local*" ')
            matches local, locally, locality
        CONTAINS(Column, ' "local wine*" ')
            matches "local winery", "locally wined"
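A prefix term applied to the AdventureWorks tables used elsewhere in this session might look like this (a sketch; the prefix "light" is a made-up example):

```sql
USE AdventureWorks_FulllText;
GO
-- The double quotes are required for a prefix term. Without them the
-- asterisk is not treated as a wildcard.
SELECT ProductDescriptionID, [Description]
FROM Production.ProductDescription
WHERE CONTAINS([Description], N'"light*"');
```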
69. Generation terms
    - Inflectional: FORMSOF(INFLECTIONAL, "expression")
        - "drive" matches "drove", "driven", ... (they share the same stem)
        - When vague words such as "best" are used, doesn't match the exact word, only "good"
    - Thesaurus: FORMSOF(THESAURUS, "expression")
        - "metal" matches "gold", "aluminium", "steel", ...
    - Both return variants of the specified word, but the variants are determined differently
70. Thesaurus
    - Supposed to match synonyms of search terms, but the thesaurus seems to be very limited
    - Does not match plurals
    - Not particularly useful
    http://technet.microsoft.com/en-us/library/cc721269.aspx#_Toc202506231
71. Proximity terms
    Syntax
        CONTAINS(Column, 'local NEAR winery')
        CONTAINS(Column, ' "local" NEAR "winery" ')
    - Important for ranking
    - Both words must be in the column, like AND
    - Terms on either side of NEAR must be either simple or proximity terms
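Since proximity mainly matters for ranking, NEAR is most useful with CONTAINSTABLE, which exposes the rank. A minimal sketch against the AdventureWorks product descriptions:

```sql
USE AdventureWorks_FulllText;
GO
-- Rows where both words occur; the closer together they are, the higher the rank.
SELECT PD.ProductDescriptionID, PD.[Description], KEYTBL.[RANK]
FROM Production.ProductDescription AS PD
INNER JOIN CONTAINSTABLE(Production.ProductDescription, [Description],
                         N'"mountain" NEAR "bike"') AS KEYTBL
    ON PD.ProductDescriptionID = KEYTBL.[KEY]
ORDER BY KEYTBL.[RANK] DESC;
```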
72. Weighted terms
    - Each word can be given a weight that influences the rank
    - Can be combined with simple, prefix, generation and proximity terms
    e.g.
        CONTAINS(Column, 'ISABOUT( performance weight(.8), comfortable weight(.4))')
        CONTAINS(Column, 'ISABOUT( FORMSOF(INFLECTIONAL, "performance") weight (.8), FORMSOF(INFLECTIONAL, "comfortable") weight (.4))')
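Weights do not change which rows a CONTAINS predicate matches. They only affect the rank returned by CONTAINSTABLE. The sketch below therefore uses CONTAINSTABLE so the effect of the weights is visible. Table and column names follow the AdventureWorks examples used earlier.

```sql
USE AdventureWorks_FulllText;
GO
-- Rows mentioning "performance" should rank above rows that only mention "comfortable".
SELECT PD.ProductDescriptionID, PD.[Description], KEYTBL.[RANK]
FROM Production.ProductDescription AS PD
INNER JOIN CONTAINSTABLE(Production.ProductDescription, [Description],
        N'ISABOUT ( performance WEIGHT(.8), comfortable WEIGHT(.4) )') AS KEYTBL
    ON PD.ProductDescriptionID = KEYTBL.[KEY]
ORDER BY KEYTBL.[RANK] DESC;
```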
74. Disadvantages
    - Full-text catalogs
        - Disk space
        - Keeping them up to date
        - Continuous updating: performance hit
    - Queries
        - Complicated to generate
        - Generated as a string
        - Generated on the client
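One way to soften the "generated as a string on the client" problem is to pass the search condition as a parameter instead of concatenating it into the SQL text. The sketch below assumes this approach. The procedure name dbo.SearchProductDescriptions is hypothetical.

```sql
USE AdventureWorks_FulllText;
GO
-- Hypothetical procedure: the client still builds the search condition,
-- but it arrives as a parameter, not as part of the statement text.
CREATE PROCEDURE dbo.SearchProductDescriptions
    @SearchCondition nvarchar(200)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT ProductDescriptionID, [Description]
    FROM Production.ProductDescription
    WHERE CONTAINS([Description], @SearchCondition);
END;
GO
-- Usage: prefix terms combined with AND.
EXEC dbo.SearchProductDescriptions @SearchCondition = N'"mountain*" AND "bike*"';
```

A malformed search condition still raises a syntax error at run time, so the client should validate or escape user input before building the condition.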
75. Advantages
    - Backing up full-text catalogs
        - SQL 2005
            - Included in SQL backups by default
            - Retained on detach and re-attach
            - Option in the detach dialog to keep the full-text catalog
        - In SQL 2008 you don't have to worry about this
76. Advantages
    - Much more powerful than LIKE
        - Specific
        - Ranking
    - Performance
        - Pre-computed ranking (FREETEXTTABLE)
    - Configurable population schedule
        - Continuously track changes, or index when the CPU is idle
77. Quick tips: Podcasts
    - Pluralcast: SQL Server Under the Covers
      http://shrinkster.com/1ff4
    - Dotnetrocks: search for SQL Server
      http://www.dotnetrocks.com/archives.aspx
    - RunAsRadio: search for SQL Server
      http://www.runasradio.com/archives.aspx
78. Session 5 Lab: Full-text search
    Download from the Course Materials Site (to copy/paste scripts) or type manually:
    http://sharepoint.ssw.com.au/Training/UTSSQL/