Udostępnij za pośrednictwem


PDC 2008 Conference Scheduling Using a Genetic Algorithm

If you read my What Do I Do post, you'll know that I'm the Content Owner for this year's PDC 2008 in Los Angeles. MIX08 is behind us, and I've just recently transitioned away from my Web GO role. This means that I can now focus 100% of my time and attention on our October event. It's going to be fantastic!

One of the many responsibilities I have as Content Owner is to create the master schedule for the event. This is the schedule that tells you which session is in which room and at what time. For PDC 2005, we delivered over 200 sessions at the conference, not including repeats (we run repeats of popular sessions that are filled to capacity).

Because PDC is where we talk about the future of Microsoft's platform, all of the content relates in some way to our overall strategy (which is typically delivered during big keynotes and general sessions). This means that some sessions need to be scheduled ahead of others to provide foundational and prerequisite knowledge. For example, a 200-level (intermediate) session covering a specific topic should precede a 300-level (advanced) or 400-level deep-dive. The following diagram is a simplification, but it illustrates my point.

Here are some additional constraints that must be considered when creating a master schedule:

  • Only one session can be presented per room during any given time slot.
  • A speaker can only present one session during any given time slot (i.e. can't be in two places at the same time).
  • A speaker may only be available on specific days.
  • A session may require audio/video equipment that is only present in specific rooms.
  • Popular sessions should be scheduled in larger rooms.

To tackle the scheduling problem, sessions are typically grouped into tracks. These tracks are then scheduled in parallel, almost like mini-conferences running alongside each other. While this can be effective, many studies show that conference attendees don't attend tracks; they attend sessions. In other words, it's relatively rare for someone to sit through all of the sessions in a single track. An attendee is much more likely to hop from track to track to attend the sessions they're interested in.

Because of this track-hopping behavior, post-event surveys often reflect the inability of people to attend all of their favorite sessions. This is almost always due to a conflict where two or more desired sessions share the same time slot, and an attendee is forced to pick only one. For our MIX07 and MIX08 events, we tried to mitigate this undesirable outcome by publishing the session recordings within 24 hours of their completion (we actually averaged under 12 hours in both cases). This helps, but it's not a panacea.

In an ideal world, conference participants would be able to attend all of their favorite sessions. We almost always do a pre-event survey asking people to pick the sessions they'd like to attend, and we use that data to extrapolate expected room capacities for scheduling. For PDC, where we announce many brand new technologies, we have the additional challenge of making educated guesses about how many people will want to attend sessions that aren't revealed until the first day of the conference. It's an inexact science, to be sure.

For PDC 2008, we're going to try something brand new. In my spare time, I've been working on a genetic algorithm (you may want to review this article to understand some upcoming terms) that takes all of the above factors into account, including a couple more:

  • The total walking distance required for a participant to attend all of their favorite sessions. This distance can be significant in the Los Angeles Convention Center. The algorithm prefers shorter routes.
  • The degree to which room capacities are "balanced." This means that—on average—the algorithm prefers schedules that leave relatively equal space in each room. Otherwise, one room may be near 100% capacity while another is only at 25%.

In my version of the algorithm, each solution in the population represents a conference schedule. The fitness function takes all of the aforementioned factors into account, and penalizes solutions with undesirable attributes. At the end of each generation, an elite group of solutions is retained, and the remainder are subject to both crossover and mutation. Generally speaking, optimal solutions are discovered in under 500 total generations.

The end result is that most conference attendees should be able to attend most of their desired sessions, all without a rigid track structure.

Update: Channel 9 has published a 32-minute video interview of me discussing this technique.

Comments

  • Anonymous
    May 03, 2008
    If the genetic algorithm doesn't work out, try simulated annealing. It is very good for this kind of problem :) Man, I really wish I can make it to PDC. With Live Mesh announced and Silverlight, this is going to be exciting.

  • Anonymous
    May 04, 2008
    Thanks for the tip, Florian. Although I didn't mention it in the article, I did experiment with simulated annealing, and it's still part of the project. I'm about to do some testing with a much larger dataset, and my guess is that I'll start to see some differences in the two algorithms soon.

  • Anonymous
    May 04, 2008
    Interesting! I'm curious how the differences will turn out in terms of compute time and result.

  • Anonymous
    May 05, 2008
    Well, that just about wraps up my "Is there anything Mike Swanson can't do?" list. (scratches off "#32 - Applied Genetics"). Only two to go... "#57 - Paragliding" and "#138 - Surfing." I expect these will be on your FY09 commitments.

  • Anonymous
    May 05, 2008
    Hey Mike: That sounds like a fun challenge, although I like your use of the word 'most' at the end of the post. Years ago, as my senior project in college, I tried writing a scheduling application (in LISP of all things) that would create the class schedule for the Computer Science department for an upcoming quarter. After working on it for a while, I learned about NP-Complete problems. Man. So, at the end, I found solice in the fact that I was able to automatically schedule most of the classes. I'm eager to see how your scheduling turns out!

  • Anonymous
    May 05, 2008
    I love it!  You can always introduce a random number generator in there if it helps...  :-)

  • Anonymous
    May 08, 2008
    How compute intensive do you expect this to be? We've had several customers use our .NET-based grid computing framework to run genetic algorithms across small and large grids alike. Let me know (at dan (at) digipede (dot) net if you're interested). MSR has a grid they may be willing to share if you need it...

  • Anonymous
    May 21, 2008
    Cool stuff.  What does your fitness function look like?  Seems to me that if you're factoring into your score each attendee's ability to attend all their favourite sessions, then you need for each possible solution some clever algorithm to see how well that can be done.  For example, if one session does occur more than once during the day, then there are multiple options for the attendee (for that solution).  Does that mean there are then multiple options for each attendee+solution you must score?

  • Anonymous
    October 01, 2008
    I have an internal PDC2008 gadget running in my Windows Sidebar, and it reports that there are only 25

  • Anonymous
    October 06, 2008
    Today was our first day of PDC2008 dry-runs on campus. To present a session at the conference, all speakers

  • Anonymous
    October 08, 2008
    La liste des sessions de la conférence Microsoft 2008 pour les développeurs et architectes est au 2/3

  • Anonymous
    October 28, 2008
    The comment has been removed