Parallel builds scenarios and implementation ideas
The MSBuild Team has started thinking about adding multi-proc support to the MSBuild engine. Currently MSBuild is single-threaded and does not take advantage of any opportunities for parallel processing during a build. However, most builds inherently have chunks of work that can be done concurrently. By parallelizing these independent chunks of work we aim to reduce the total build time for all builds, large and small.
To enable parallel builds, we believe we need to support the following scenarios:
1. Ability to build a large tree of projects in parallel – we call this the “build lab” scenario. In this scenario multiple projects are built concurrently.
2. Ability to build a Visual Studio solution in parallel. Solutions in Visual Studio can contain multiple projects, and even for simple solutions we should be able to support parallel builds. In this scenario, the independent projects are built concurrently, while the dependent ones are serialized.
3. Ability to build multiple files in a project concurrently. Some projects have lots of files that need to be operated on by a single tool one-by-one. In this scenario, the tool is invoked concurrently on all the files.
We currently believe that scenarios (1) and (2) would provide the “biggest bang for the buck” in build performance. However, MSBuild is a general build orchestration engine and we know some of our customers build things other than source code. For many of these customers, supporting (3) would offer them significant benefits. This is particularly the case if the build calls into older tools that don’t support multi-processor machines natively. As a result, we believe we also need to look into supporting the parallelization of tasks within a build.
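As a rough illustration of scenario (3), the sketch below is plain Python rather than anything in MSBuild: it runs a single-file tool over many files concurrently. The names `run_tool_on_files` and `fake_compile` are hypothetical, and `fake_compile` merely stands in for an older tool that processes one file at a time.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tool_on_files(tool, files, max_workers=4):
    """Invoke `tool` once per file, several files at a time.

    Results come back in the same order as `files`, even though the
    individual invocations may finish in any order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(tool, files))


def fake_compile(path):
    # Hypothetical stand-in for a legacy single-threaded tool:
    # it only computes the name of the output it would produce.
    return path.rsplit(".", 1)[0] + ".obj"


if __name__ == "__main__":
    print(run_tool_on_files(fake_compile, ["a.c", "b.c", "c.c"]))
    # prints ['a.obj', 'b.obj', 'c.obj']
```

A real tool would typically be launched as a separate process per file; the worker pool then only bounds how many invocations run at once.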
In addition to thinking about scenarios, we're working on implementation designs (as you can see from the above picture!). On the implementation side, the build mechanism we are considering is completely declarative. The MSBuild engine will not make any attempt to determine what can be parallelized. Instead, we will introduce parallelization “constructs” in the file format that will allow a project or targets file author to indicate what can and cannot be parallelized.
For Visual Studio solutions, the parallelization will be automatic in a sense. The target files we will ship will allow all independent projects in a solution to be parallelized. As long as the dependencies are correctly expressed through project-to-project references, or through the “Project Dependencies” dialog, the build will parallelize correctly.
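To make the idea of building independent projects concurrently while serializing dependent ones more concrete, here is a minimal Python sketch. It is not MSBuild's design or file format: the function `build_solution`, the `dependencies` mapping, and the project names in the example are all assumptions made for illustration, and the sketch relies on the standard `graphlib` module (Python 3.9 or later).

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def build_solution(dependencies, build_project, max_workers=4):
    """Build every project, running independent projects concurrently.

    `dependencies` maps a project name to the set of projects it references.
    A project is started only after all projects it references have finished.
    Returns the project names in the order their builds completed.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular references
    finished = []
    in_flight = {}  # future -> project name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Start every project whose references are already built.
            for project in sorter.get_ready():
                in_flight[pool.submit(build_project, project)] = project
            # Block until at least one running build completes.
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for future in done:
                project = in_flight.pop(future)
                future.result()  # re-raise any build failure
                finished.append(project)
                sorter.done(project)  # may unblock dependent projects
    return finished


if __name__ == "__main__":
    # Hypothetical solution: App references Lib and Util, both reference Core.
    solution = {"App": {"Lib", "Util"}, "Lib": {"Core"}, "Util": {"Core"}, "Core": set()}
    print(build_solution(solution, lambda name: None))
```

In this example `Core` is always built first and `App` last, while `Lib` and `Util` may run at the same time. The sketch only works if the dependency mapping is complete, which mirrors the requirement that project-to-project references be expressed correctly.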
Now is your chance to give us feedback! What do your builds look like? How would the above three points of parallelization improve your build performance? What do you think about our plans to introduce new constructs? Let us know by replying to this post, or send us email at msbuild@microsoft.com.
[ Author: Sumedh Kanetkar ]
Comments
- Anonymous
February 25, 2006
The highest concurrency is obtained from (1) and (3); (3) also has the most compatibility with legacy tools. If you can, look at Electric Cloud, DSEE, and ClearCase for examples of how to build on multiple nodes on a LAN.
- Anonymous
July 13, 2006
Yes. I have recently implemented a similar solution in my project, where I conduct the build of multiple projects in parallel.
This reduced the overall build time significantly.
I also thought along similar lines, and separated the dependent and independent projects. Everything is configurable via XML files.
If interested in further details please mail me at
khare.rajat@gmail.com
- Anonymous
October 27, 2006
I don't really consider anything other than #3 to be a parallel build. Look at Electric Cloud and clearmake. They can handle builds parallelized down to the file level, not only on one machine but distributed as well (at least on Unix). Gmake can also handle parallel builds fairly well. I have worked on projects where we had build times of 8+ hours (on a fast single-CPU machine) without parallelization. C and C++ builds can scale nearly linearly with the number of CPUs or CPU cores you have, until I/O is saturated, and that holds for single-project builds too. Having a good tool capable of at least parallel builds according to your scenario #3 would be required in order to really use the potential of current and coming multi-core CPUs. Good distributed build support would be even better and could potentially save lots of money on both hardware and build times in large, complex projects. Electric Cloud looks like the best available solution to me, if it weren't so infernally expensive :) But if you pull off parallel builds at least on the same level as gmake -jnn, I'd be happy. To me, this is one of the greatest weaknesses in VS.