Database Programming: Omnibuzz Steals A Saturday Night..
I first encountered Omnibuzz during the recent prime number mania. His contribution to that exercise introduced me to his blog, which I must confess that I initially perused only lightly.
Tonight, a chain of links led me back to his blog and I explored a little more. Not only was I humbled and flattered to find this little corner on his blogroll, but he's posting a wonderful series of SQL puzzles that he calls "Scenarios to Ponder."
He's up to number 8 right now, which looks well in hand, but number 7 caught my eye.. about TWO HOURS AGO. Click on over to Omni's blog and read it; I'll be here when you get back..
_______________________
Okay, so you've read Omni's decryption requirement, which he calls "insane". Well, most of them start out that way, don't they? At any rate, if you've read Omni's post (you really should; it's a pretty cool problem), here's my totally set-based solution. You'll notice that once I get all of the housekeeping done, the SELECT itself ends up pretty simple and should be pretty performant, with the proper indexing. Despite that, I've commented it extensively..
-- Ward's solution to Omni's "A Scenario To Ponder #7"
-- (https://omnibuzz-sql.blogspot.com/2006/10/scenario-to-ponder-7.html)
-- HOUSEKEEPING STARTS HERE
-- first build a number table with the odd numbers from 1 to 100
-- put an identity column on this table, which we will use to calculate position.
-- this is a bit of a cheat, but I couldn't figure out or find a mathematical
-- formula to do this (there certainly must be one), so I decided to let
-- SQL Server do the work for me.
-- if this was SQL Server 2005, we could use a CTE with the RANK() function
-- and we wouldn't need this table at all
DROP TABLE SetBuilderOdd
GO
-- these column names are slightly counterintuitive, but the JOINs will look
-- prettier (and therefore make more sense) if we name them this way
CREATE TABLE SetBuilderOdd (
Position INT IDENTITY (1,1),
Id INT
)
-- populate the table
-- I'm going to populate it from another, larger SetBuilder table I've got
-- you might well chose another way
INSERT SetBuilderOdd (Id)
SELECT Id
FROM SetBuilder
WHERE Id % 2 = 1
AND Id <= 99
-- drop Omni's worktable
DROP TABLE BAD_ENCRYPTION
GO
-- create Omni's worktable with two new calculated columns
-- (Omni said we could change the schema of this table if we didn't break encryption;
-- the addition of the calculated columns with an index will improve performance at 50K rows
-- if this was SQL Server 2005, we'd add the PERSISTED keyword to the DDL for both calculated columns)
CREATE TABLE BAD_ENCRYPTION(
VAL_ID INT,
[VALUE] VARCHAR(100),
FirstCharValue AS LEFT([VALUE],1),
ValueLength AS LEN([VALUE])
)
GO
-- populate the worktable with the encrypted values Omni specified
INSERT INTO BAD_ENCRYPTION VALUES(1,'OAMBNRITBAUQZYZZ')
INSERT INTO BAD_ENCRYPTION VALUES(2,'SAQWLR')
INSERT INTO BAD_ENCRYPTION VALUES(3,'GZAXRCBVABGNEM')
INSERT INTO BAD_ENCRYPTION VALUES(4,'CAOWLELYEHCGTHOLR@')
-- HOUSEKEEPING ENDS HERE
GO
-- declare a memory variable to hold the decrypted value to test..
DECLARE @TestString varchar(50)
-- select a value to decrypted test
SET @TestString = 'OMNIBUZZ'
--SET @TestString = 'SQL'
--SET @TestString = 'GARBAGE'
--SET @TestString = 'COLLECTOR'
-- return the Val_Id of the matching encrypted record (if any)
SELECT be.Val_Id
FROM dbo.Bad_Encryption be
JOIN SetBuilderOdd sb
-- this criterion will provide a performance boost at 50K records
ON be.FirstCharValue = LEFT(@TestString,1)
-- this criterion matches the appropriate encrypted character in [Value]
-- to the appropriate character in @TestString
AND SUBSTRING(be.[Value],sb.Id,1) = SUBSTRING(@TestString,sb.Position,1)
-- this criterion ensures that the encrypted string is twice the length of
-- the unencrypted string (guards against false matches for substrings)
AND be.ValueLength = LEN(@TestString) * 2
-- this criterion limits the scope of the query to the number of characters
-- passed in @TestString
AND sb.Position <= LEN(@TestString)
GROUP BY be.[Val_Id]
-- this criterion is a failsafe and can likely be omitted
HAVING COUNT([Val_Id]) = LEN(@TestString)
Thanks for a fun problem, Omni! Please let me know what you think of this solution. I'll check out more of your blog when the sun is out, but right now I'm going to bed!
-wp
UPDATE: 13 Nov 2006 for grammar in several of the comments in the code.
Comments
Anonymous
January 01, 2003
Omnibuzz saw my comment (I left another one over the weekend and it showed up right away; I wonder ifAnonymous
January 01, 2003
This post will conclude (for me and for now, at least) a discussion begun here and continued here . WhenAnonymous
November 18, 2006
Thank you for your kind words WP. You've made my day :) Coming to the solution, I liked the filter using the first character. Neat strategy. Coming to the computed columns, if we are not persisting it, then we can as well compute it in the query than changing the schema. And regarding the filter, SUBSTRING(be.[Value],sb.Id,1) = SUBSTRING(@TestString,sb.Position,1) Can I just change it to SUBSTRING(be.[Value],sb.Position*2-1,1) = SUBSTRING(@TestString,sb.Position,1) Or will it be a performance issue? These are just my thoughts. Correct me if I am wrong. I didn't really try it with large dataset. Will try it once I get access to the server. Thanks again for your time. -OmniAnonymous
November 20, 2006
I was wondering if you had a chance to think about the decryption mechanism? A query that will get me the decrypted data (without using cursors) I couldn't come up with a good solution for it, though.