T-SQL: Split String with a Twist
Introduction
String splitting is one of the most common problems in the Transact-SQL and programming in general. There are many blog posts discussing splitting a comma-delimited string in SQL. My favorites could be found at T-SQL Useful Links / String Split. Yet recent MSDN Transact-SQL forum post SQL-Split String brought an interesting twist to this problem and thus I want to discuss it in this article.
Problem Definition
The topic starter was kind enough to provide a table's DDL along with some data.
CREATE TABLE #ParseString ( id INT, string VARCHAR(128) )
INSERT #ParseString VALUES (1,'A6782839 A1768983A3A6A362727323AD6A9E8BD-0D15-405C-AF10-9884BDE572E3A6133366')
INSERT #ParseString VALUES (2,'A6783834A2748983A3A4A362723313AD6A64302717-D354-48D4-9E20-ADC11EF59A1332222')
as well as the desired output
SELECT id,string,6782839 AS Storeid,1768983 AS ProductID,3 AS ProductTypeLookup,6 AS LineOfBusiness, 362727323 AS ManagerKey, 'D6A9E8BD-0D15-405C-AF10-9884BDE572E3A6133366' Tuid
FROM #ParseString
WHERE id=1
UNION
SELECT id,string,6783834 AS Storeid,2748983 AS ProductID,3 AS ProductTypeLookup,4 AS LineOfBusiness, 362723313 AS ManagerKey, 'D6A64302717-D354-48D4-9E20-ADC11EF59A1332222' Tuid
FROM #ParseString
WHERE id=2
The string needed to be split using 'A' as a splitting symbol into 6 parts. Last part also included A as part of the string.
Solution
The first idea that came to mind was to use any of the splitting functions available to split using 'A' character as a splitting character. However, I decided against that idea because of the last part where I would have to concatenate the rest of the string. Therefore I decided to use recursive common table expression to do the splitting that would include both the word and the rest of the string. After that I used CASE based pivot idea to get the desired result:
;
WITH cte
AS (
SELECT id
,string
,substring(string, 2, charindex('A', substring(string, 2, len(string) - 1)) - 1) AS Word
,substring(string, charindex('A', substring(string, 2, len(string) - 1)) + 2, len(string)) AS Rest
,0 AS [Level]
FROM #ParseString
UNION ALL
SELECT id
,string
,substring(Rest, 1, charindex('A', rest) - 1) AS word
,substring(Rest, charindex('A', rest) + 1, len(rest)) AS Rest
,[Level] + 1
FROM cte
WHERE Rest LIKE '%A%'
)
SELECT id
,string
,max(CASE
WHEN [Level] = 0
THEN Word
END) AS StoreID
,max(CASE
WHEN [Level] = 1
THEN Word
END) AS ProductID
,max(CASE
WHEN [Level] = 2
THEN Word
END) AS ProductTypeLookup
,max(CASE
WHEN [Level] = 3
THEN Word
END) AS LineOfBusiness
,max(CASE
WHEN [Level] = 4
THEN Word
END) AS ManagerKey
,max(CASE
WHEN [Level] = 4
THEN Rest
END) AS tUID
FROM cte
GROUP BY id
,String
which produces our desired result:
In this solution we are using CHARINDEX and SUBSTRING functions continuously to get each word and the rest of the string. That rest of the string is used for the tUID column.
Conclusion
As we saw in that problem, sometimes it makes sense to apply recursive solution directly rather than using the existing splitting solution with concatenation at the end. It will be interesting to hear from the topic starter which of the suggested solutions performed better.
See Also
This article participated in the TechNet Guru Contributions for March, 2014 and won Bronze medal.