SHORTEST_PATH (Transact-SQL)

Applies to: SQL Server 2019 (15.x) and later versions Azure SQL Database Azure SQL Managed Instance SQL database in Microsoft Fabric

Specifies a search condition for a graph, which is searched recursively or repetitively. SHORTEST_PATH can be used inside MATCH with graph node and edge tables, in the SELECT statement.

Transact-SQL syntax conventions

Shortest path

The SHORTEST_PATH function lets you find:

  • A shortest path between two given nodes/entities
  • Single source shortest path(s).
  • Shortest path from multiple source nodes to multiple target nodes.

It takes an arbitrary length pattern as input and returns a shortest path that exists between two nodes. This function can only be used inside MATCH. The function returns only one shortest path between any two given nodes. If there exist, two or more shortest paths of the same length between any pair of source and destination node(s), the function returns only one path that was found first during traversal. An arbitrary length pattern can only be specified inside a SHORTEST_PATH function.

For complete syntax, refer to MATCH (SQL Graph).

FOR PATH

FOR PATH must be used with any node or edge table name in the FROM clause, which participates in an arbitrary length pattern. FOR PATH tells the engine that the node or edge table returns an ordered collection representing the list of nodes or edges found along the path traversed. The attributes from these tables can't be projected directly in the SELECT clause. To project attributes from these tables, graph path aggregate functions must be used.

Arbitrary length pattern

This pattern includes the nodes and edges that must be traversed repeatedly until either:

  • The desired node is reached.
  • The maximum number of iterations as specified in the pattern is met.

The following two pattern quantifiers are supported:

  • '+': Repeat the pattern 1 or more times. Terminate as soon as a shortest path is found.
  • {1,n}: Repeat the pattern 1 to n times. Terminate as soon as a shortest is found.

LAST_NODE

LAST_NODE() function allows chaining of two arbitrary length traversal patterns. It can be used in scenarios where:

  • More than one shortest path patterns are used in a query and one pattern begins at the LAST node of the previous pattern.
  • Two shortest path patterns merge at the same LAST_NODE().

Graph path order

Graph path order refers to the order of data in the output path. The output path order always starts at the nonrecursive part of the pattern followed by the nodes/edges that appear in the recursive part. The order in which the graph is traversed during query optimization/execution has nothing to do with the order printed in the output. Similarly, the direction of arrow in the recursive pattern also doesn't affect the graph path order.

Graph path aggregate functions

Since the nodes and edges involved in arbitrary length pattern return a collection (of node(s) and edge(s) traversed in that path), users can't project the attributes directly using the conventional tablename.attributename syntax. For queries where it's required to project attribute values from the intermediate node or edge tables, in the path traversed, use following graph path aggregate functions: STRING_AGG, LAST_VALUE, SUM, AVG, MIN, MAX and COUNT. The general syntax to use these aggregate functions in the SELECT clause is:

<GRAPH_PATH_AGGREGATE_FUNCTION>(<expression> , <separator>)  <order_clause>

    <order_clause> ::=
        { WITHIN GROUP (GRAPH PATH) }

    <GRAPH_PATH_AGGREGATE_FUNCTION> ::=
          STRING_AGG
        | LAST_VALUE
        | SUM
        | COUNT
        | AVG
         | MIN
        | MAX

STRING_AGG

The STRING_AGG function takes an expression and separator as input and returns a string. Users can use this function in the SELECT clause to project attributes from the intermediate nodes or edges in the path traversed.

LAST_VALUE

To project attributes from the last node of path traversed, LAST_VALUE aggregate function can be used. It's an error to provide edge table alias as an input to this function, only node table names or aliases can be used.

Last Node: The last node refers to the node that appears last in the path traversed, irrespective of the direction of arrow in the MATCH predicate. For example: MATCH(SHORTEST_PATH(n(-(e)->p)+) ). Here the last node in the path is the last visited P node.

In the MATCH(SHORTEST_PATH((n<-(e)-)+p)) pattern, the last node is the last N node visited.

SUM

This function returns the sum of provided node/edge attribute values or expression that appeared in the traversed path.

COUNT

This function returns the number of non-null values of the specified node/edge attribute in the path. The COUNT function doesn't support the * operator - attempted usage of * results in a syntax error.

{  COUNT( <expression> )  <order_clause>  }

AVG

Returns the average of provided node/edge attribute values or expression that appeared in the traversed path.

MIN

Returns the minimum value from the provided node/edge attribute values or expression that appeared in the traversed path.

MAX

Returns the maximum value from the provided node/edge attribute values or expression that appeared in the traversed path.

Remarks

  • The SHORTEST_PATH function can only be used inside MATCH.
  • The LAST_NODE function is only supported inside SHORTEST_PATH.
  • The SHORTEST_PATH function returns any one shortest path between nodes. It currently doesn't support returning all shortest paths between nodes; it also doesn't support returning all paths between nodes.
  • The SHORTEST_PATH implementation finds an unweighted shortest path.
  • In some cases, bad plans may be generated for queries with higher number of hops, which results in higher query execution times. Evaluate if query hints like OPTION (HASH JOIN) and / or OPTION (MAXDOP 1) help.

Examples

For the example queries shown here, we use the node and edge tables created in SQL Graph sample

A. Find shortest path between two people

In the following example, we find shortest path between Jacob and Alice. We need the Person node and friendOf edge created from SQL Graph sample.

SELECT PersonName, Friends
FROM (
    SELECT
        Person1.name AS PersonName,
        STRING_AGG(Person2.name, '->') WITHIN GROUP (GRAPH PATH) AS Friends,
        LAST_VALUE(Person2.name) WITHIN GROUP (GRAPH PATH) AS LastNode
    FROM
        Person AS Person1,
        friendOf FOR PATH AS fo,
        Person FOR PATH  AS Person2
    WHERE MATCH(SHORTEST_PATH(Person1(-(fo)->Person2)+))
    AND Person1.name = 'Jacob'
) AS Q
WHERE Q.LastNode = 'Alice'

B. Find shortest path from a given node to all other nodes in the graph.

The following example finds all the people that Jacob is connected to in the graph and the shortest path starting from Jacob to all those people.

SELECT
    Person1.name AS PersonName,
    STRING_AGG(Person2.name, '->') WITHIN GROUP (GRAPH PATH) AS Friends
FROM
    Person AS Person1,
    friendOf FOR PATH AS fo,
    Person FOR PATH  AS Person2
WHERE MATCH(SHORTEST_PATH(Person1(-(fo)->Person2)+))
AND Person1.name = 'Jacob'

C. Count the number of hops/levels traversed to go from one person to another in the graph.

The following example finds the shortest path between Jacob and Alice and prints the number of hops it takes to go from Jacob to Alice.

 SELECT PersonName, Friends, levels
FROM (
    SELECT
        Person1.name AS PersonName,
        STRING_AGG(Person2.name, '->') WITHIN GROUP (GRAPH PATH) AS Friends,
        LAST_VALUE(Person2.name) WITHIN GROUP (GRAPH PATH) AS LastNode,
        COUNT(Person2.name) WITHIN GROUP (GRAPH PATH) AS levels
    FROM
        Person AS Person1,
        friendOf FOR PATH AS fo,
        Person FOR PATH  AS Person2
    WHERE MATCH(SHORTEST_PATH(Person1(-(fo)->Person2)+))
    AND Person1.name = 'Jacob'
) AS Q
WHERE Q.LastNode = 'Alice'

D. Find people 1-3 hops away from a given person

The following example finds the shortest path between Jacob and all the people that Jacob is connected to in the graph one to three hops away from him.

SELECT
    Person1.name AS PersonName,
    STRING_AGG(Person2.name, '->') WITHIN GROUP (GRAPH PATH) AS Friends
FROM
    Person AS Person1,
    friendOf FOR PATH AS fo,
    Person FOR PATH  AS Person2
WHERE MATCH(SHORTEST_PATH(Person1(-(fo)->Person2){1,3}))
AND Person1.name = 'Jacob'

E. Find people exactly two hops away from a given person

The following example finds the shortest path between Jacob and people who are exactly two hops away from him in the graph.

SELECT PersonName, Friends
FROM (
    SELECT
        Person1.name AS PersonName,
        STRING_AGG(Person2.name, '->') WITHIN GROUP (GRAPH PATH) AS Friends,
        COUNT(Person2.name) WITHIN GROUP (GRAPH PATH) AS levels
    FROM
        Person AS Person1,
        friendOf FOR PATH AS fo,
        Person FOR PATH  AS Person2
    WHERE MATCH(SHORTEST_PATH(Person1(-(fo)->Person2){1,3}))
    AND Person1.name = 'Jacob'
) Q
WHERE Q.levels = 2

F. Find people 1-3 hops away from a given person, who also like a specific restaurant

The following example finds the shortest path between Jacob and all the people that he's connected to in the graph 1-3 hops away from him. The query also filters connected people by their liking a given restaurant. In the below sample, that LAST_NODE(Person2) returns the final node for each shortest path. The "last" Person node obtained from LAST_NODE can then be "chained" for further traversals to find the restaurant(s) that they like.

SELECT
    Person1.name AS PersonName,
    STRING_AGG(Person2.name, '->') WITHIN GROUP (GRAPH PATH) AS Friends,
    Restaurant.name
FROM
    Person AS Person1,
    friendOf FOR PATH AS fo,
    Person FOR PATH  AS Person2,
    likes,
    Restaurant
WHERE MATCH(SHORTEST_PATH(Person1(-(fo)->Person2){1,3}) AND LAST_NODE(Person2)-(likes)->Restaurant )
AND Person1.name = 'Jacob'
AND Restaurant.name = 'Ginger and Spice'

G. Find the shortest path from a given node to all other nodes in the graph, excluding "loops"

The following example finds all the people that Alice is connected to in the graph and the shortest path starting from Alice to all those people. The example explicitly checks for "loops" where the start and the end node of the path happen to be the same.

SELECT PersonName, Friends
FROM (
    SELECT
        Person1.name AS PersonName,
        STRING_AGG(Person2.name, '->') WITHIN GROUP (GRAPH PATH) AS Friends,
        LAST_VALUE(Person2.name) WITHIN GROUP (GRAPH PATH) AS LastNode
    FROM
        Person AS Person1,
        friendOf FOR PATH AS fo,
        Person FOR PATH  AS Person2
    WHERE MATCH(SHORTEST_PATH(Person1(-(fo)->Person2)+))
    AND Person1.name = 'Alice'
) AS Q
WHERE Q.LastNode != 'Alice'

Next steps