Scale-out computing on DevLabs

Today we're launching several new Technical Computing (TC) projects on DevLabs. These projects give you a chance to learn about some of the technologies being developed as part of the Technical Computing initiative, to gain early access to code, and to provide feedback for several TC-related innovative projects.

Last May, I blogged about the Technical Computing initiative at Microsoft, an initiative that's leading to technologies which will empower the world's most important problem solvers to best utilize computing resources. These domain specialists often either develop code themselves as a necessary aspect of their work, or they rely on other developers to build the software that makes their work possible. The TC initiative gives those developers and domain specialists ground-breaking developer tools and infrastructure to do their best work.

The TC initiative has taken some important first steps since its inception. Visual Studio 2010 includes built-in support for developing, debugging, and tuning multi-core and many-core applications and has seen impressive adoption across a wide variety of industries and domains. In November, we announced Service Pack 1 for HPC Server 2008 R2, which integrates Windows Azure compute cycles, allowing massively parallel applications to easily scale from the cluster to the cloud. And this is just the beginning. The teams involved in the TC initiative are working hard on impressive new solutions to bring all that modern and future computing has to offer to developers, domain specialists, and IT professionals alike.

Today's new TC projects take the next steps in this journey.

TPL Dataflow - Enabling parallel and concurrent .NET applications

.NET 4 saw the introduction of the Task Parallel Library (TPL), parallel loops, concurrent data structures, Parallel LINQ (PLINQ), and more, all of which were collectively referred to as Parallel Extensions to the .NET Framework. TPL Dataflow is a new member of that family, layering on top of tasks, concurrent collections, and more to enable the development of powerful and efficient .NET-based concurrent systems built using dataflow concepts. The technology relies on in-process message passing and asynchronous pipelines and is heavily inspired by the Visual C++ 2010 Asynchronous Agents Library and DevLabs' Axum language. TPL Dataflow provides solutions for buffering and processing data, building systems that need high-throughput and low-latency processing of data, and building agent/actor-based systems. TPL Dataflow was also designed to integrate smoothly with the new asynchronous language functionality in C# and Visual Basic that I previously blogged about.

Below, you can see an example of an "agent" using dataflow blocks in C# to safely, asynchronously, and efficiently process incoming requests.
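A minimal sketch of such an agent, assuming the ActionBlock<T> type from the System.Threading.Tasks.Dataflow library as it ships today (the DevLabs preview API may have differed). The class and member names RequestAgent, Post, and ShutdownAsync are hypothetical and chosen only for illustration, and the "work" done per request is a placeholder.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical agent for illustration; not the sample from the original post.
public sealed class RequestAgent
{
    private readonly ActionBlock<string> _inbox;
    private readonly List<string> _handled = new List<string>();

    public RequestAgent()
    {
        // By default an ActionBlock handles one message at a time
        // (MaxDegreeOfParallelism is 1), so the agent's private state
        // is only touched by one handler invocation at any moment.
        _inbox = new ActionBlock<string>(async request =>
        {
            await Task.Yield(); // placeholder for real asynchronous work
            _handled.Add(request.ToUpperInvariant());
        });
    }

    // Queues a request; returns false once the agent has been shut down.
    public bool Post(string request) => _inbox.Post(request);

    // Stops accepting requests and waits for the queued ones to finish.
    public async Task<IReadOnlyList<string>> ShutdownAsync()
    {
        _inbox.Complete();
        await _inbox.Completion;
        return _handled;
    }
}

public static class AgentDemo
{
    public static async Task Main()
    {
        var agent = new RequestAgent();
        agent.Post("first");
        agent.Post("second");

        foreach (var item in await agent.ShutdownAsync())
            Console.WriteLine(item);
    }
}
```

Callers never block while a request is processed: Post only enqueues the message, and the block drains its queue in order on its own. Keeping the state private to the block's handler is what makes the agent safe without explicit locks.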

Dryad - Supporting data-intensive computing applications

Pioneered in Microsoft Research, Dryad, DSC, and DryadLINQ are a set of technologies that support data-intensive computing applications on Windows HPC Server 2008 R2 Service Pack 1. These technologies enable efficient processing of large volumes of data in many types of applications, including data-mining applications, image and stream processing, and various kinds of intense scientific computations. Dryad and DSC run on the cluster to support data-intensive computing and manage data that is partitioned across the cluster, while DryadLINQ allows developers to build data- and compute-intensive .NET applications using the familiar LINQ programming model.

The following code loads textual log data using Dryad. The data is merged and processed on a cluster, and the results are then streamed back to the client for display.

public static IEnumerable<string> GeoIp(string logStream, string geoStream)
{
    DistributedData<string> logLinesTable = DistributedData.OpenAsText(logStream);
    DistributedData<string> geoIpTable = DistributedData.OpenAsText(geoStream);

    // Join the two tables on the common key (IP Address)
    IEnumerable<string> joined = logLinesTable.Join(geoIpTable,
        l1 => l1.Split(' ').First(),
        l2 => l2.Split(' ').First(),
        (l1, l2) => l2).AsEnumerable();

    return joined;
}

public static void Main()
{
    // Load log and geo data into DSC
    Console.WriteLine("Loading data");
    File.ReadLines("log.txt").AsDistributed().ExecuteAsText("hpcdsc://localhost/Samples/log");
    File.ReadLines("geo.txt").AsDistributed().ExecuteAsText("hpcdsc://localhost/Samples/geo");

    // Run the query
    Console.WriteLine("Running query");
    IEnumerable<string> results =
        GeoIp("hpcdsc://localhost/Samples/log", "hpcdsc://localhost/Samples/geo");

    // Print out the results
    Console.WriteLine("Displaying results");
    foreach (var entry in results) Console.WriteLine(entry);
}
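To get a feel for what the join in GeoIp computes without a cluster, the same query shape can be run over in-memory sequences with plain LINQ to Objects. This is only a local sketch, not DryadLINQ: the class name LocalGeoIpJoin, the method JoinOnIp, and the sample lines are all made up for illustration, and it assumes each line starts with an IP address followed by a space.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local, in-memory stand-in for the distributed join; illustrative only.
public static class LocalGeoIpJoin
{
    // Matches log lines to geo lines on the first space-separated token
    // (assumed to be the IP address) and returns the matching geo lines.
    public static IEnumerable<string> JoinOnIp(
        IEnumerable<string> logLines, IEnumerable<string> geoLines)
    {
        return logLines.Join(geoLines,
            l1 => l1.Split(' ').First(),
            l2 => l2.Split(' ').First(),
            (l1, l2) => l2);
    }

    public static void Main()
    {
        // Made-up sample data.
        var log = new[]
        {
            "10.0.0.1 GET /index.html",
            "10.0.0.2 GET /about.html",
            "10.0.0.9 GET /missing.html"
        };
        var geo = new[] { "10.0.0.1 Seattle", "10.0.0.2 London" };

        foreach (var entry in JoinOnIp(log, geo))
            Console.WriteLine(entry);
    }
}
```

Because this is an inner join, the log line for 10.0.0.9 has no geo entry and produces no output; only the two matching geo lines are printed.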

Sho - Putting the power of data analysis and flexible prototyping in your hands

Also begun in Microsoft Research, Sho provides those who are working on technical computing workloads an interactive environment for data analysis and scientific computing. It lets you seamlessly connect scripts written in IronPython with .NET libraries, enabling fast and flexible prototyping. The environment includes powerful and efficient libraries for linear algebra and data visualization, both of which can be used from any .NET language, as well as a feature-rich interactive shell for rapid development. Sho comes with packages for large-scale parallel computing (via Windows HPC Server and Windows Azure), statistics, and optimization, as well as an extensible package mechanism that makes it easy for you to create and share your own packages.

As you can see in the screenshot below, Sho provides an interactive REPL (read-eval-print loop) that lets you write code and immediately see the results, both textually and graphically.

Try Them Out

Our goal moving forward is to add more Technical Computing projects in pre-beta states to DevLabs, both to get your early feedback and insight and to help drive these technologies in the right direction. We look forward to hearing from you.

Namaste!

Comments

  • Anonymous
    January 26, 2011
    This is exciting---and it's great to see that Microsoft is supporting foundational efforts like Software Carpentry (http://software-carpentry.org) to ensure that scientists and engineers have the basic skills they need before they try tackling things with multi-core, cloud, and peta-scale in their names.  After all, if someone doesn't know how to drive a family car, giving them keys to a transport truck and telling them to take it out on the highway is just an invitation to crash.

  • Anonymous
    January 27, 2011
    Microsoft needs to round out its Technical Computing Initiative with a few more .NET APIs

  1. a managed wrapper for GPGPU DirectCompute  - what happened to Accelerator?
  2. A rich maths library - like NMath or ExtremeOptimization - we are constantly reimplementing Vector and Matrix classes
  3. An extensible managed Data Mining model API - to create SQL Server Analysis Services algorithms requires in depth COM dev
  4. a machine learning API - like Encog - what happened to MS research Infer.NET?
  5. a numerical analysis API

  • Anonymous
    January 28, 2011
    Jim, I encourage you to try out Sho. It includes vector and matrix libraries as well as libraries for statistics and machine learning. It also includes Solver Foundation, a .Net based library for optimization. I just posted about Sho and Solver Foundation on my blog: blogs.msdn.com/.../optimization-modeling-using-solver-foundation-and-sho.aspx. Best regards, Nathan

  • Anonymous
    January 28, 2011
    [Sorry, that last post was in response to Josh, not Jim ;)]

  • Anonymous
    February 04, 2011
    Great and very interesting science computing with .Net technology especially with C# , cool. :)

  • Anonymous
    March 03, 2011
    I always have to hit 2 times the post button to post a message to the blog. Seems to be bug.

  • Anonymous
    March 10, 2011
    It would be great to have the equivalent of GCJ for .NET

  • Anonymous
    August 16, 2011
    More about cloud computing: http://dcxcloud.blog.com/