Blog Spammers SUCK ! - Or, Blocking Spam in .TEXT
Recently I’ve been getting hammered with blog spam on www.JoeOn.net. My blog mirror blogs.msdn.com/joestagner runs on Community Server which does a better job of spam blocking but I don’t want to switch www.JoeOn.net to Community Server because it is a Commercial product (Sorry Rob) and I want to ability to code additions to meet my needs and make my code available to anyone that wants to use it. (Don’t get me wrong, community server ROCKS and I will be using it in a new On-Line community.)
I also really like DasBlog but stumbled when trying to migrate my blog content from .TEXT to DasBlog.
So I needed to find a way to solve the spam problem with my existing .TEXT setup, at least for the mean time.
A little research revealed this post by John Sample -THANKS John.
Original Post https://www.johnsample.com/articles/DotTextSpamBlock.aspx
Basically, it ads a table to the .TEXT SQL Server instance that is used to contain a list of “spam words”. If the post contains the spam word it is disallowed.
Where is how I’m using mine.
I’m primarily adding portions of the domain names that the spammers, who are mostly on line gambeling sites, are using. They are mostly free sites so this won’t be a problem with any valid perminant domains being referenced in a post.
Here is the SQL that creates the table.
IF NOT EXISTS (select * from dbo.sysobjects where id = object_id(N'[dbo].[FILTER_WORD]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
BEGIN
CREATE TABLE [FILTER_WORD] (
[fw_ID] [int] IDENTITY (1, 1) NOT NULL ,
[fw_WORD] [varchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
CONSTRAINT [PK_FILTER_WORD] PRIMARY KEY CLUSTERED
(
[fw_ID]
) ON [PRIMARY]
) ON [PRIMARY]
END
GO
Once the table is installed I just used Microsoft Query Analyzer to add the words to the FILTER_WORD table.
Next we I needed to alter the stored procedure – [blog_InsertEntry] as follows. [Download Here]
ALTER Proc blog_InsertEntry
(
@Title nvarchar(255),
@TitleUrl nvarchar(255),
@Text ntext,
@SourceUrl nvarchar(200),
@PostType int,
@Author nvarchar(50),
@Email nvarchar(50),
@SourceName nvarchar(200),
@Description nvarchar(500),
@BlogID int,
@DateAdded datetime,
@ParentID int,
@PostConfig int,
@EntryName nvarchar(150),
@ID int output)
as
If(@EntryName is not null)
Begin
if exists(Select EntryName From blog_Content where BlogID = @BlogID and EntryName = @EntryName)
Begin
RAISERROR('The EntryName you entry is already in use with in this Blog. Please pick a unique EntryName.',11,1)
RETURN 1
End
End
if(Ltrim(Rtrim(@Description)) = '')
Begin
set @Description = null
End
If (@PostType = 3 or @PostType = 4)
Begin
declare @temp_text varchar(8000)
set @temp_text = UPPER(isnull(CAST(@Text as varchar(7500)),'')) + UPPER(isnull(@SourceName,'')) + Upper(isnull(@TitleUrl,'')) + Upper(isnull(@Author,''))
declare @word_buffer varchar(100)
declare filterCursor CURSOR LOCAL FAST_FORWARD for
select UPPER(fw_WORD) from FILTER_WORD
OPEN filterCursor
FETCH NEXT FROM filterCursor INTO @word_buffer
WHILE @@FETCH_STATUS = 0
BEGIN
IF (CHARINDEX(@word_buffer,@temp_text) > 0)
Begin
CLOSE filterCursor
DEALLOCATE filterCursor
RAISERROR('Disallowed words or phrases detected. I you feel this message is in error, please contact blog owner.',11,1)
RETURN 1
End
FETCH NEXT FROM filterCursor INTO @word_buffer
END
CLOSE filterCursor
DEALLOCATE filterCursor
End
INSERT INTO blog_Content
(Title, TitleUrl, [Text], SourceUrl, PostType, Author, Email, DateAdded,DateUpdated, SourceName, [Description], PostConfig, ParentID, BlogID, EntryName )
VALUES
(@Title, @TitleUrl, @Text, @SourceUrl, @PostType, @Author, @Email, @DateAdded, @DateAdded, @SourceName, @Description, @PostConfig, @ParentID, @BlogID, @EntryName)
Select @ID = @@Identity
if(@PostType = 1 or @PostType = 2)
Begin
exec blog_UpdateConfigUpdateTime @blogID, @DateAdded
End
Else if(@PostType = 3)
Begin
Update blog_Content
Set FeedBackCount = FeedBackCount + 1 where [ID] = @ParentID
End
I also added the TrackBack filter. In addition to blocking based on words this update lets you add IP address to the filter file. To do this we need to modify the stored procedure [blog_InsertPingTrackEntry] as below. [Download Here]
ALTER Proc blog_InsertPingTrackEntry
(
@Title nvarchar(255),
@TitleUrl nvarchar(255),
@Text ntext,
@SourceUrl nvarchar(200),
@PostType int,
@Author nvarchar(50),
@Email nvarchar(50),
@SourceName nvarchar(200),
@Description nvarchar(500),
@BlogID int,
@DateAdded datetime,
@ParentID int,
@PostConfig int,
@EntryName nvarchar(150),
@ID int output)
as
Set @ID = -1
if not exists (Select [ID] From blog_Content where TitleUrl = @TitleUrl and ParentID = @ParentID)
Begin
if(Ltrim(Rtrim(@Description)) = '')
set @Description = null
declare @temp_text varchar(8000)
set @temp_text = UPPER(isnull(CAST(@Text as varchar(7500)),'')) + UPPER(isnull(@SourceName,'')) + Upper(isnull(@TitleUrl,'')) + Upper(isnull(@Author,'')) + Upper(isnull(@Title,''))
declare @word_buffer varchar(100)
declare filterCursor CURSOR LOCAL FAST_FORWARD for
select UPPER(fw_WORD) from FILTER_WORD
OPEN filterCursor
FETCH NEXT FROM filterCursor INTO @word_buffer
WHILE @@FETCH_STATUS = 0
BEGIN
IF (CHARINDEX(@word_buffer,@temp_text) > 0)
Begin
CLOSE filterCursor
DEALLOCATE filterCursor
--RAISERROR('Disallowed words or phrases detected. I you feel this message is in error, please contact blog owner.',11,1)
RETURN 1
End
FETCH NEXT FROM filterCursor INTO @word_buffer
END
CLOSE filterCursor
DEALLOCATE filterCursor
INSERT INTO blog_Content
( PostConfig, Title, TitleUrl, [Text], SourceUrl, PostType, Author, Email, DateAdded,DateUpdated, SourceName, [Description], ParentID, BlogID)
VALUES
(@PostConfig, @Title, @TitleUrl, @Text, @SourceUrl, @PostType, @Author, @Email, @DateAdded, @DateAdded, @SourceName, @Description, @ParentID, @BlogID)
Select @ID = @@Identity
Update blog_Content
Set FeedBackCount = FeedBackCount + 1
where [ID] = @ParentID
End
— SO far it looks to have reduced my blog spam by 95%
Comments
- Anonymous
February 19, 2006
Hi Joe,
I had the same problem a while back so I wrote my own solution that works as a HTTPModule. If you're interested you can check it out at: http://briandela.com/blog/archive/2005/06/29/652.aspx
Brian - Anonymous
March 27, 2006
The comment has been removed - Anonymous
May 29, 2009
The comment has been removed