共用方式為


F# and the PFX Round 1

I’m currently working on a chapter for Programming F# titled Asynchronous and Parallel Programming which should turn out pretty well. Anyways, I have an example that was too awesome not to blog.

In this post I’ll show how to use the Parallel Extensions to the .NET Framework for finding the shortest path between two nodes on a graph.

Problem Description

Let’s say you are planning a road trip this summer, you will start in city X and travel to city Y. Since your car cannot fly, you cannot head straight to city Y. Instead, you need to find the shortest path from X to Y. (With bonus points for site seeing, like the World’s Largest Rocking Chair in Iowa)

CS Gangstas have studied this problem and gave it the banal name ‘Shortest path problem’. If you do your research you can find plenty of clever and efficient algorithms; in this blog post I’ll just be solving this problem using a simple approach.

The Algorithm

image

The simplest way to solve this problem is create a dictionary of ‘shortest known path to city X’. Then, keep track of the cities you would like to visit and the distance traveled so far on that route.

If you are at a city and the distance you have traveled so far exceeds the shortest known path, then return because you already know a shorter route to that city.

Otherwise, update your ‘shortest known path’ data structure, and visit all of the city’s neighbors and repeat the process. Once you have exhaustively tried all paths the resulting ‘shortest known path’ data structure will contain the shortest path from your starting city to your final destination.

Lame Solution in C#

Let’s try to quickly write this up in C#.

     enum City
    {
        LosAngeles,
        Seattle, 
        // [List cities here]
        Boston,
        NewYork
    }

    /// <summary>
    /// Initialize the routes between cities and the distances. 
    /// Example: int distTweenBostonAndChicago = x[Boston][Chicago];
    /// </summary>
    static void InitializeDistances(Dictionary<City, Dictionary<City, int>> dist)
    {
        // [Load your data here]
        return new Dictionary<City, Dictionary<City, int>>();
    }

    /// <summary>
    /// Object to represent a route to a given city.
    /// </summary>
    class Route
    {
        public Route()
        {
            DistanceTraveled = 0;
            Route = new List<City>();
        }
        /// <summary>
        /// Add a new leg to an existing route.
        /// </summary>
        public Route(Route r, City nextCity, int distance)
        {
            DistanceTraveled = r.DistanceTraveled + distance;
            Route = new List<City>(r.Route);
            Route.Add(nextCity);
        }
        public int DistanceTraveled { get; set; }
        public List<City> Route { get; set; }
    }

    /// <summary>
    /// This makes up for C# not having Tuples...
    /// </summary>
    class CityToVisit
    {
        public CityToVisit(City c, Route r)
        {
            this.City = c;
            this.Route = r;
        }
        public City City { get; set; }
        public Route Route { get; set; }
    }

    static void Main(string[] args)
    {
        var distTweenCities = new Dictionary<City, Dictionary<City, int>>();
        InitializeDistances(distTweenCities);

        var StartingCity = City.LosAngeles;
        var FinalDestination = City.NewYork;

        // Keep track of the shorest paths to get from the starting city to
        // any other reachable city.
        var shortestPaths = new Dictionary<City, Route>();

        // Initialize our 'cities to visit' list, begining at our start city having
        // traveled zero miles.
        var citiesToVisit = new Stack<CityToVisit>();
        citiesToVisit.Push(new CityToVisit(City.LosAngeles, new Route());


        while (citiesToVisit.Count > 0)
        {
            var curLoc = citiesToVisit.Pop();

            // Have we visited this city before? If not, then add it...
            if (!shortestPaths.ContainsKey(curLoc.City))
                shortestPaths.Add(curLoc.City, nextCity.Route);

            // Is the route we used to get here shorter than the best known?
            if (shortestPaths[curLoc.City] < curLoc.Route.DistanceTraveled)
            {
                // Update our shortest paths...
                shortestPaths[curLoc.City] = nextCity.Route;

                // ... and visit all of its neighbors using this shortest path
                foreach (var neighboringCity in distTweenCities[curLoc.City].Keys)
                {
                    var updatedRoute = new Route(curLoc.Route, 
                                                 neighboringCity, 
                                                 distTweenCities[curLoc.City][neighboringCity];
                    var cityToVisit = new CityToVisit(neighboringCity, updatedRoute); 
                    
                    citiesToVisit.Push(cityToVisit);
                }
            }
        }

        Console.WriteLine("Shorest path from {0} to {1} is {2} miles.",
                          StartingCity, FinalDestination, 
                          shortestPaths[StartingCity][FinalDestination]);
    }
}

Awesome Solution in F#

So while the above code solves the problem, it suffers two major setbacks. First, the algorithm behaves on order O(N^2), which means it is slow. This wouldn’t be as much of a problem if it weren’t for problem two: the implementation is single threaded.

Being either slow or being single threaded is OK, but both at the same time is bad news.

Fortunately in CLR 4.0 (and available on 3.5 in CTP form) the Parallel Extensions to the .NET Framework (PFX) make parallelizing code pretty simple.

It is pretty clear how you to execute our algorithm in parallel, simple parallelize the process of visiting each city. The problem however comes when you are trying to read or write from the shorestPaths dictionary. Having multiple threads read and write from the same data structure at the same time is a recipe for disaster.

However, the PFX contain some built-in collection types that are designed exactly for this sort of application. The PFX ConcurrentDictionary type acts and behaves just like a normal dictionary, except that you can safely use it in a concurrent environment. (I know that sounds really vague, I’ll write more about the benefits of the ConcurrentDictionary type in a later post.)

Pedant’s Note: The solution locks on STDOUT so output messages don’t spew nonsense to the output; however this effectively makes the program run synchronously.

 open System
open System.Threading.Tasks
open System.Collections.Generic
open System.Collections.Concurrent
    
type City = 
    | Boise     | LosAngeles    | NewYork   | Seattle
    | StLouis   | Phoenix       | Boston    | Chicago   
    | Denver
    
// Known disances between US cities in miles. If there is a path from A to B,
// there is also a path from B to A. If no path is listed, then you cannot travel
// between the two cities.
let distTweenCities = 
    let distances = new Dictionary<City, Dictionary<City, int>>()
    
    [
        (Boise,   Seattle, 496);      (Boise,   Denver,  830);    
        (Boise,   Chicago, 1702);     (Seattle, LosAngeles, 1141); 
        (Seattle, Denver,  1321);     (LosAngeles, Denver,  1022);  
        (LosAngeles, Phoenix, 371);   (Phoenix, Denver,  809);      
        (Phoenix, StLouis, 1504);     (Denver,  StLouis, 8588);     
        (Denver,  Chicago, 1009);     (Chicago, NewYork, 811);     
        (Chicago, Boston,  986);      (StLouis, Chicago, 300);
        (Boston, StLouis,  986);      (NewYork, Boston,  211)
    ]
    // Convert the list of links between cities into a dictionary
    |> List.iter (fun (cityA, cityB, dist) -> 
        
        if not <| distances.ContainsKey(cityA) then
            distances.Add(cityA, new Dictionary<City, int>())
        if not <| distances.ContainsKey(cityB) then
            distances.Add(cityB, new Dictionary<City, int>())
            
        distances.[cityA].Add(cityB, dist)
        distances.[cityB].Add(cityA, dist))
        
    // Return our dictionary of dictionaries of distances
    distances

let stdout = Console.Out

let shortestPathBetween startingCity finalDestination =
    
    // Keep track of the shorest path from 'startCity' to a given city, and
    // the path used to get to that city.
    let shortestPaths = new ConcurrentDictionary<City, int * City list>()
    
    let rec searchForShortestPath curCity distanceSoFar citiesVisitedSoFar = 

        // Visit all available cities from the current city
        let visitAvailableDestinations() =
            
            let availableDestinations = distTweenCities.[curCity]
            
            // Loop through destinations and spawn new tasks
            for dest in availableDestinations.Keys do
                Task.Factory.StartNew(
                    new Action(fun () -> 
                        searchForShortestPath 
                            dest 
                            (distTweenCities.[curCity].[dest] + distanceSoFar) 
                            (citiesVisitedSoFar @ [dest])
                )) |> ignore

        // Have I already found a way to travel to this city?
        let visitedBefore = shortestPaths.ContainsKey(curCity)
        
        if not visitedBefore then
            // First time visiting this city, add to our 'visited cities'
            shortestPaths.TryAdd(curCity, (distanceSoFar, citiesVisitedSoFar))
            
            lock stdout 
                 (fun () -> printfn 
                                "Origional route to %A (%d) via: %A" 
                                curCity distanceSoFar citiesVisitedSoFar)

            
            visitAvailableDestinations()
        else // We have visited this city before, let's see if this route is faster
            let shortestKnownPath, cities = shortestPaths.[curCity]
            
            if distanceSoFar < shortestKnownPath then
                // Update shortest path, revisit neighboring cities
                shortestPaths.[curCity] <- (distanceSoFar, citiesVisitedSoFar)
                
                lock stdout 
                     (fun () -> printfn 
                                    "Found shorter route to %A (%d) via:  %A" 
                                    curCity distanceSoFar citiesVisitedSoFar)
                
                visitAvailableDestinations()
            else // Ignore, we have already found a faster way to get here
                ()

    // Create the master task to find the shortest path between the two cities
    let t = 
        Task.Factory.StartNew(
            new Action(fun () -> searchForShortestPath startingCity 0 [])
        )
    t.Wait()
    
    let dist, path = shortestPaths.[finalDestination]
    
    printfn 
        "The shortest distance from %A to %A is %d miles, with route:\n%A" 
        startingCity finalDestination 
        dist path

Comments

  • Anonymous
    April 08, 2009
    You know I have been sitting under your learning tree and trying to reinvent myself as a programmer(I been around for some 10 years so you can guess my age) I just thought this is the right time to delurk and say what had to be said - I absolutely, totally completely envy you while simultaneously being happy for you. :) You are working on great technology, with some incredibly smart people on stuff that matters and will affect millions of people, will soon become a well known personality, AND a prime reason you are in that position is because you were smart and humble enough to accept being  a "mere" tester. :) :) Your momma ought to be proud of you. ;)

  • Anonymous
    April 08, 2009
    Of course, you could also make use of anonymous types, LINQ, and PFX in C#, and make the comparison a bit more on the level :)

  • Anonymous
    April 08, 2009
    The comment has been removed

  • Anonymous
    April 08, 2009
    of course this would be even more impressive if you would implement a complete A*-algorithm (no need to check all "next-destination" choices if you guess the remaining-distance "right" and sort accordingly) ;) But nicely done. Will the PFX be implemented for the async-workflow?

  • Anonymous
    April 09, 2009
    Good idea about implementing A*, I'll add that to my list of future blog topics. As for PFX being implemented for async-workflows, we are currently looking at how these two APIs can better play together. In the next CTP release (due between 'soonish' and 'eventually') we'll have a way to convert an Async<'T> to a Task<'T>, but we'd welcome any feedback for what people would like to see in this space. Thanks!

  • Anonymous
    April 09, 2009
    Thank you for submitting this cool story - Trackback from DotNetShoutout

  • Anonymous
    April 09, 2009
    The comment has been removed

  • Anonymous
    April 13, 2009
    In my recent adventures around Collective Intelligence , I took many code samples that have been written

  • Anonymous
    April 13, 2009
    In my recent adventures around Collective Intelligence , I took many code samples that have been written

  • Anonymous
    April 20, 2009
    As part of my dive into the Collective Intelligence series , I’ve found myself many times taking code

  • Anonymous
    April 20, 2009
    As part of my dive into the Collective Intelligence series , I’ve found myself many times taking code