다음을 통해 공유


T-SQL: Graph Based Tables in SQL 2017

Introduction

One of the really fascinated features in SQL 2017 is the support for graph based queries in Transact SQL. This is achieved by introducing a new category of tables called Graph Tables.

In this article we will have a detailed look on what Graph Tables are, their categorization, their usage by taking a popular use case.

Graph Tables – What are they?

Graph tables are new set of objects available in SQL 2017 which can be used for representing graph based scenarios easily. They are based on connected relationship between entities and is much easier to represent many real time scenarios compared to conventional way of storing the data using normalized tables.

The big difference in this case is that data can be retrieved by the graph traversal as against the complex multiple join operations that would have required in the case of conventional normalized tables. Also for many scenarios like social media it’s more intuitive and straight forward to use graph based modelling for easier understanding and querying purpose.

Implementation

Now let’s see how graph based schema is implemented in SQLServer

As you may know already, a graph would have a structure as below

As you see from the above, graph will have two main components- the nodes and edges connecting the various nodes. When you bring this concept to database schema, nodes of the graph will correspond to entities and edges the relationship between them. The entities will be related to one another through the edges which represents the relationships defined between them.

Keeping this concept in mind let’s see a use case and how graph schema representation helps in easy understanding and querying of the data.

Use Case – Forums Database

One of the popular use cases for graph relationships is forums database which consists of a setup like below. For simplicity, this illustration includes minimum number of tables. However in the actual case you would have much more tables involved.

The data model can be represented using the below tables in the conventional approach:

 

As you see from the above image, the model consists of two simple tables to hold Forum Posts and Member details. Two bridge tables are introduced to represent the relationships between the main tables i.e likes given to a post by members as well as replies given by members to a post.

Now let’s see how the same scenario can be handled using graph schema.

As specified previously graph schema uses Node and Edge tables to indicate the entities and their relationships. In this case, we would require two Node tables to represent the entities ForumPosts and ForumMembers. The Likes and Replies relationships are represented by means of two Edge tables.

The tables can be created using the below T-SQL script

CREATE TABLE  [dbo].[ForumMembers](
    [MemberID] [int] IDENTITY(1,1) NOT NULL,
    [MemberName] [varchar](100) NULL
) AS  NODE
GO
 
CREATE TABLE  [dbo].[ForumPosts](
    [PostID] [int] NULL,
    [PostTitle] [varchar](100) NULL,
    [PostBody] [varchar](1000) NULL,
    [OwnerID] [int] NULL
) AS  NODE
GO
 
CREATE TABLE  [dbo].[Likes]
AS EDGE
GO
 
CREATE TABLE  [dbo].[Replies]
AS EDGE
GO

If you analyze the above script you will see that it creates two Node tables for storing the forum posts as well as member details and two Edge tables called likes and replies which indicates the details of members who liked or replied to a particular forum post. The schema looks much simpler and has more clarity in this case compared to the conventional relational DB structure.

The insertion scripts for the above tables will be as follows

INSERT ForumMembers values('Peter'),('ABC'),('PQR'),('George'),('Mary'),('XYZ')
GO
 
INSERT INTO  [dbo].[ForumPosts](
    [PostID],[PostTitle],[PostBody],OwnerID
) 
VALUES
    (1,'Intro','Hi There This is ABC',2),
    (2,'Intro','Hello I''m PQR',3),
    (3,'Re: Intro','Hey PQR This is XYZ',6),
    (4,'Geography','Im George from USA',4),
    (5,'Re:Geography','I''m Mary from OZ',5),
    (6,'Re:Geography','I''m Peter from UK',1),
    (7,'Intro','I''m Peter from UK',1),
    (8,'Intro','nice to see all here!',1)
GO
 
INSERT Replies VALUES
((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 4),(SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 6)),
((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 1),(SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 7)),
((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 1),(SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 8)),
((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 1),(SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 2)),
((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 4),(SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 5)),
((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 2),(SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 3))
GO
 
INSERT Likes VALUES
((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 4),(SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 1))
GO

If we observe the above insert statements we can see that although the insert script for the node tables looks same as that of ordinary table the inserts scripts for the edge tables follow a different pattern

To understand this we need to first understand how data is stored within the Node and Edge tables

For example after insertion of values, the above Node tables will look like this:

select * from ForumMembers
select * from [ForumPosts]
GO

From the above screenshots we can see that Node tables maintain an internal $node_id field which stores a JSON value indicating the node id value along with other metadata details. The other columns will denote the properties of the node which will correspond to our actual data values.

Now if you see the Edge tables they will be as below:

select * from Replies
select * from Likes
GO

In the case of Edge tables the tables will have three columns by default. First column called $edge_id indicates the edge id value in JSON format. The next two columns $from_id and $to_id indicates the nodes connected by the edge. We can add any additional properties as columns in Edge table too. For the simplicity of this illustration we have not added any additional properties.

Now let’s see how we can search for data within the tables

In the conventional method using relational schema, the search would involve a query like below

SELECT p.[PostTitle]
FROM [dbo].[ForumPosts] p
INNER JOIN [dbo].[ForumReplies] r    ON     r.[PostID] = p.PostID
INNER JOIN [dbo].[ForumPosts] p1     ON     p1.[PostID] = r.ReplyPostID
INNER JOIN [dbo].[ForumMembers] m    ON     m.[MemberID] = p1.[OwnerID]
WHERE m.[MemberName] = 'Peter'
GO

For getting the posts to which a member named Peter has replied

Similarly for the getting the posts liked by Mary the query would look like this

SELECT p.[PostTitle]
FROM [dbo].[ForumPosts] p
INNER JOIN [dbo].[ForumLikes] l      ON     r.[PostID]  = p.PostID
INNER JOIN [dbo].[ForumMembers] m    ON     m.[MemberID] = l.MemberID
WHERE m.[MemberName] = 'Mary'
GO

When you analyze these queries you will find that all of them involves multiples join operations with multiples tables to return you the query result.

Now let’s see how the same scenario can be handled in graph schema.

Graph schema has a new function called MATCH which can be used in SQL queries to get search result based on relationship.

Utilizing the MATCH function we can write the query for first scenario as below

;With PostReplies AS (
    SELECT Post1.PostID,Post1.PostTitle,Post2.PostBody,Post2.OwnerID
    FROM dbo.ForumPosts Post1, Replies, dbo.ForumPosts Post2
    WHERE MATCH(Post1-(Replies)->Post2)
)
SELECT c.PostTitle,c.PostBody,m.MemberName
FROM PostReplies c
JOIN dbo.ForumMembers m ON m.MemberID = c.OwnerID
WHERE m.MemberName = 'Peter'
GO

The above query will give you details of all posts to which the member Peter had replied along with the reply post body. If you analyze the query you will see that we have used a CTE based on MATCH clause to get the reply posts and then used a normal join outside to get the membername from ForumMembers table.
This two-step approach is required as MATCH clause based query can’t be merged with JOIN or APPLY operator currently

Now let’s see how query for fetching all the posts liked by Peter would look like

SELECT Post1.PostID, Post1.PostTitle, Member.MemberName
FROM dbo.ForumPosts Post1, Likes, dbo.ForumMembers Member
WHERE
    MATCH(Post1-(Likes)->Member)
    AND MemberName = 'Peter'
GO

As you see from the two queries above, the graph based queries are much simpler compared to their conventional join based counterparts and MATCH function adds much more clarity on the relationship used for parsing the graph to get the desired result.

Now let’s see queries for some more scenarios

-- Members who liked and replied on same post
SELECT Member.MemberName,Post1.PostTitle
FROM dbo.ForumPosts Post1, Replies, dbo.ForumPosts Post2, Likes, dbo.ForumMembers Member
WHERE
    MATCH(Member<-(Likes)-Post1-(Replies)->Post2)
    AND Member.MemberID = Post2.OwnerID
GO

The above query shows the real power of MATCH function.
 In the above case it parses the graph on both directions one based on Replies relationship while other based on Likes relationship. Finally it compares the MemberID of Likes against the Owner for the Reply post to return only cases where same person has liked as well as replied to the same post

We can also include other kinds of logics like aggregation etc. by getting the MATCH based result onto a CTE and then joining them to other tables as in the examples below

-- Members who replied to multiple posts
;With CTE AS(
    SELECT Post1.PostID,Post2.OwnerID
    FROM dbo.ForumPosts Post1, Replies, dbo.ForumPosts Post2
    WHERE MATCH(Post1-(Replies)->Post2)
)
SELECT m.MemberName
FROM CTE c
JOIN dbo.ForumMembers m ON m.MemberID = c.OwnerID
GROUP BY  m.MemberName
HAVING COUNT(DISTINCT c.PostID) > 1
GO
 
 
-- Members who replied multiple times to same post
;With CTE AS(
    SELECT Post1.PostID,Post1.PostTitle,Post2.OwnerID
    FROM dbo.ForumPosts Post1, Replies, dbo.ForumPosts Post2
    WHERE MATCH(Post1-(Replies)->Post2)
)
SELECT m.MemberName,c.PostTitle
FROM CTE c
JOIN dbo.ForumMembers m ON m.MemberID = c.OwnerID
GROUP BY  m.MemberName,c.PostTitle
HAVING COUNT(*) > 1
GO

There are many other use cases for which graph based tables and queries would prove advantageous. Some of them include cases like below:

  • Represent hierarchies which grows to multiple directions with inter linking relationships
  • Many to many relationships between tables as in the above case
  • Any scenario where there are relationships between involved entities

Conclusion

As seen from the illustrations above Graph tables are one of the very exciting features that came with SQLServer 2017. The above examples are all based on CTP 2. The actual implementation may vary slightly when product gets to public release.

See Also