Node Traversal in a Tree
The idea of writing this article came from MSDN Transact SQL Forum's thread count total paired members of the tree by sqlquery. Later on the same user also posted another thread which is similar to this one under the same forum as find total money of parent node of tree according to a particular pattern.
Problem Description
The Problem has not been properly stated in the above thread. So for making it readable and understanding purposes I will recreate the statement.
Suppose we have a table which contains information of the nodes of a tree (can be any tree, no specifications). Table contains the member node, parent node and the information of the member being a left or right child. From this tree we have to find the nodes and their children which have two children. So using a personification for the problem statement. It could state that if you are a member then your father and you would come in result as parent and child; only when you have two children (both left and right)
Table:
IF OBJECT_ID('Tempdb..#Tree', N'U') IS NOT NULL DROP TABLE TEMPDB..#Tree
/*Table having member node, parent node and the location of child either left or right*/
CREATE TABLE
#Tree(
``Member
INT,
``Parent
INT,
``LeftN
INT,
``RightN
INT
``)
/*Inserting dummy values into #Tree*/
INSERT
#Tree(
Member,
Parent,
LeftN,
RightN
)
SELECT
1, 0, 0, 0
UNION
SELECT
2, 1, 1, 0
UNION
SELECT
3, 1, 0, 1
UNION
SELECT
4, 2, 1, 0
UNION
SELECT
5, 2, 0, 1
UNION
SELECT
6, 3, 1, 0
UNION
SELECT
7, 3, 0, 1
UNION
SELECT
8, 4, 1, 0
UNION
SELECT
9, 4, 0, 1
UNION
SELECT
10, 5, 1, 0
UNION
SELECT
11, 5, 0, 1
UNION
SELECT
12, 7, 0, 1
UNION
SELECT
13, 7, 1, 0
UNION
SELECT
14, 8, 1, 0
UNION
SELECT
15, 8, 0, 1
UNION
SELECT
16, 14, 1, 0
UNION
SELECT
17, 14, 0, 1
UNION
SELECT
18, 17, 1, 0
UNION
SELECT
19, 17, 0, 1
So now when we try to visualize the tree we get the following structure.
http://gallery.technet.microsoft.com/site/view/file/97647/1/Tree%20and%20Table.JPG
Now, if you focus on any node, let's say 1 (root node), it has children 2, 3, 4, 5, 7, 8, 14, 17 which have both children. Similiarly if we see 4, it has 8, 14, 17 Thus the requirement comes to see the results as
http://gallery.technet.microsoft.com/site/view/file/97648/1/Result.JPG
Solution
The first thought which came to my mind is to use binary tree, so I asked the user if he could tell me the number of nodes or kind of the tree he is working in. As expected, no kind. There was no specification available. It can be any kind of tree and can be as many nodes (finite). So I thought of using dynamic SQL. Creating some variables, passing some parameters and so on. But then I decided to avoid dynamic SQL as much as possible. I read somewhere that it affects performance. Second thought which came to me; to traverse node by node and checking for possible solutions. Now this can be done either by a CTE or a while loop.
Using While
In this approach instead of just giving the result as the count of nodes, I am storing all the nodes with their respective children. So I created a table, which contains the parent node and their immediate sons. The final outcome can be seen in the image attached above.
Code below:
IF OBJECT_ID ('TEMPDB..#Interim', N'U') IS NOT NULL DROP TABLE #Interim
/*Table having member node and immediate son(s)*/
CREATE TABLE
#Interim(
``Node
INT,
``Son
INT
``)
/*Code for finding immediate son(s)*/
INSERT
#Interim(
Node,
Son
)
SELECT
Node, Member
AS
Son
FROM
(
SELECT
Parent
AS
Node
FROM
#Tree
WHERE
Parent<>0
GROUP BY
Parent
HAVING
SUM
(LeftN)=
SUM
(RightN)) Interim
JOIN
#Tree T
ON
Interim.Node=T.Parent
/*Inner table "Interim" contain nodes which have both left and right child*/
Summing up so far, we have a table named #Tree which contains the information about Nodes,Children,& Parents. Then we have a table name #Interim which contains the information about the nodes and their immediate son(s).
Now powering through....
DECLARE
@Run
INT
/*Running the loop for <(maximum number of node)/2> times, since we don't know the level of depth and it can be any kind of tree*/
SET
@Run=(
SELECT ``MAX``(Node)
FROM
#Interim)/2
WHILE
@Run>0
BEGIN
INSERT
#Interim
SELECT
LeftI.Node,RightI.Son
FROM
#Interim LeftI
JOIN
#Interim RightI
ON
LeftI.Son=RightI.Node
WHERE
RightI.Son
IN
(
SELECT
Parent
AS
Node
FROM
#Tree
WHERE
Parent<>0
GROUP BY
Parent
HAVING
SUM``(LeftN)=
SUM
(RightN))
UNION
SELECT
Node``,`` Member
AS
`` Son
FROM
(
SELECT
Parent
AS
Node
FROM
#Tree
WHERE
Parent<>0
GROUP BY
Parent
HAVING
SUM
(LeftN)=
SUM
(RightN))
Interim
JOIN
#Tree T
ON
Interim.Node=T.Parent
WHERE
Member
IN``(``SELECT
Parent
AS
Node
FROM
#Tree
WHERE
Parent<>0
GROUP BY
Parent
HAVING `` SUM
(LeftN)=
SUM
(RightN))
SET
@Run=@Run-1
END
The first selection (before UNION), is used to select the nodes at depth=2 in regard to the parent node respectively. Whereas the second select statement is used to find the nodes at depth=1 having both children.
Now our Interim Table have all the values, but as we might have run the loop for more than required number of times. Thus it will have duplicates too. So for selecting the distinct required values.
SELECT DISTINCT
Node,Son
AS
SonsHavingBothChild
FROM
#Interim
WHERE
Son
IN
(
SELECT
Parent
AS
Node
FROM
#Tree
WHERE
Parent<>0
GROUP BY
Parent
HAVING
SUM
(LeftN)=
SUM
(RightN))
Using Recursive CTE
In this approach instead of storing the results in a table, I am going to fetch the number of nodes which are related to the parent node as required.
So when we ask for Node 1, then the result should be 8.
Code below:
DECLARE
@Parent
INT
DECLARE
@NodeCount
TABLE
(Parent
INT
)
SET
@Parent=1
;
WITH
CTE
AS
(
SELECT
Member,Parent
FROM
#Tree
WHERE
Member=@Parent
UNION
ALL
SELECT
T.Member, T.Parent
FROM
#Tree T
JOIN
CTE C
ON
T.Parent=C.Member
)
INSERT
@NodeCount
SELECT
C.Parent
FROM
#Tree T
JOIN
CTE C
ON
C.Member=T.Member
WHERE
T.Parent<>0
AND
C.Parent<>@Parent
GROUP BY
C.Parent
HAVING
COUNT
(C.Parent)>1
SELECT
COUNT
(1)
AS
NodeCount
FROM
@NodeCount
I hope this article would have helped those who are trying to find solutions to similar problem. If I have made any mistake please update.