Sdílet prostřednictvím


Lesson Learned #3: Modifying the fulltext stopword list in Azure SQL Database.

Hello everyone

This morning I was working in a new very interesting situation is using fulltext, as you, probably know, Azure SQL Database V12 is possible to use fulltext with partial implementation, until today.

In this situation, we are not able to obtain any row running the following query SELECT [Key] as OrgId, [Rank] as Score FROM CONTAINSTABLE(Company, Name, ‘but*’)  even when there are some rows that contain this data.

To prevent a full-text index from becoming bloated, Azure SQL DB has a mechanism that discards commonly occurring strings that do not help the search. These discarded strings are called stopwords. During index creation, the Full-Text Engine omits stopwords from the full-text index. This means that full-text queries will not search on stopwords , one for these words is "but"

 

  • Lets go to create the demo.

CREATE TABLE [dbo].[Company](      

[Key] [int] IDENTITY(1,1) NOT NULL,      

[Rank] [int] NOT NULL,      

[Name] [varchar](max) NULL,CONSTRAINT [PK_Company]

PRIMARY KEY CLUSTERED(       [Key] ASC)

WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

 

  • Now we insert some rows

INSERT INTO Company (rank, name) values(2,'Just')

INSERT INTO Company (rank, name) values(2,'Justa')

INSERT INTO Company (rank, name) values(2,'Microsoft')

INSERT INTO Company (rank, name) values(2,'but')

 

  • Create the fulltext search and index.

CREATE FULLTEXT CATALOG [FullTextCatalog]WITH ACCENT_SENSITIVITY = OFF

CREATE FULLTEXT INDEX ON [Company](Name) KEY INDEX [PK_Company] ON FullTextCatalog

ALTER FULLTEXT INDEX ON [Company] ENABLE

 

In fact, when I executed SELECT * FROM sys.dm_fts_parser (' "but" ', 1033, 0, 0); the word “but” is considered as Noisy word.

 

image

 

We need to modify the behavior in order to consider the word “but” as word to use as not a stopword, for this reason we need to create a new stop list.

 

  • First create a fulltext list from the system list. This below command will create a new stop list with all stop words in different languages.

CREATE FULLTEXT STOPLIST myStoplistAvoidBut FROM SYSTEM STOPLIST;

  • After it, delete the word “but” from the local language  that I’m working on, in this case the language 1033.

ALTER FULLTEXT STOPLIST myStoplistAvoidBut drop 'but' LANGUAGE 1033;

alter fulltext INDEX ON [Company] SET STOPLIST= myStoplistAvoidBut

  • Just now, there is needed to rebuild the fulltext index to run again the work breaker process.

ALTER FULLTEXT CATALOG [FullTextCatalog] REBUILD

 

To compare the results, we will execute again, the command:

  • First run the TSQL to obtain the stoplist code running select * from sys.fulltext_stoplists once we have this code, basically we need to change the number of the third parameter of this next command SELECT * FROM sys.dm_fts_parser ('"but"', 1033, 7, 0) in this case the value code is 7.

image

  • Finally, running the SELECT [Key] as OrgId, [Rank] as Score FROM CONTAINSTABLE(Company, Name, ‘but*’)   we are able to obtain the rows with this word.

Capture_1