

Isolation in Maestro

Disclaimer: The Maestro project has been renamed. While referred to in this article as ‘Maestro’, moving forward the project is referred to with codename ‘Axum’.

Copied from 'Concurrently Speaking'  

As noted in the Dr Dobb's article, Maestro is primarily about establishing isolation domains so that we can cut down on the number of undocumented dependencies between components. With a language like C# or VB, any two references of type T could be referring to the same object, and if you consider a whole object graph, you have to keep track of all the references within the graph. In C++, you don't quite know where your pointers have been, or what type they started out as, so the problem is even worse there.

Sometimes, keeping track of references in your own ad hoc way is easy to do, for example when you have a very small program that doesn't call into libraries you don't know much about and you're the only developer working on it. Or, maybe you have really spent a lot of time on data design and carefully architected your application to avoid concurrency issues. If so, what happens when you start on the next version of the software, suddenly under customer pressure to quickly provide ever-increasing value?

It is more or less against the nature of object-oriented languages to restrict access to objects in the ways that would make writing parallel programs easier and safer, so we need to look elsewhere for inspiration.

There are several places to look -- functional languages, for example, offer a great solution by not allowing side-effects. Without side-effects, there's no reader/writer competition and data races cease to exist as a concern. Of course, most interesting computer activities are all about side-effects, so we need to escape the model from time to time. That doesn't diminish the value of the functional approach to programming -- you have significantly restricted the number of areas in your application that you have to manage yourself, which is very valuable in itself unless you are a theorist for whom only a completely pure model is acceptable.

Inspiration from the Web

This need to escape the model is not what causes us to look elsewhere; rather, it is the fact that all the mainstream platforms are unsuitable for functional programming, as they have been designed with imperative languages in mind. With Maestro, we instead looked at the web for inspiration -- it also offers an isolation model, based on separating address spaces. Simply put, if a pointer isn't valid in your address space, and you cannot send it to another, you don't have to worry as much about aliasing.

Of course, separated address spaces have a very high overhead, so we're trying to use the model rather than the implementation, letting a compiler enforce the constraints rather than the OS (compilers are particularly good at that).

Domains

In Maestro, the key isolation concept is a domain, which limits the runtime scope of data to its compile-time scope. In other words, objects that are created within a particular domain don't escape it. The only things that may escape a domain are copies of its data or instances of immutable types (which .NET doesn't have a lot of, but String is an example).

A domain looks like this:

domain D1
{
    object obj = new object();
    string str = "Hello!";
}

You cannot call a method on a domain from outside it -- all its methods are either private or protected; the only thing you can do from the outside is create the domain:

var d = new D1();
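The escape rule described above (only copies or immutable values leave a domain) can be approximated outside Maestro. The following is a minimal Python sketch, not Maestro code: the `Domain` class and its `snapshot_items` and `label` methods are hypothetical names made up for illustration, and Python enforces the privacy only by convention, whereas Maestro relies on the compiler.

```python
import copy


class Domain:
    """Hypothetical stand-in for a Maestro domain: state stays private."""

    def __init__(self):
        self._items = ["a", "b"]   # mutable state owned by the domain
        self._label = "Hello!"     # str is immutable in Python, as String is in .NET

    def snapshot_items(self):
        # Mutable data leaves only as a deep copy, so callers cannot alias it.
        return copy.deepcopy(self._items)

    def label(self):
        # Immutable values can be handed out directly without risk.
        return self._label


domain = Domain()
outside = domain.snapshot_items()
outside.append("c")  # changes the caller's copy only, not the domain's list
```

Because the caller only ever holds a copy, nothing it does to `outside` can be observed inside the domain, which is the property the domain concept is after.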

Agents

So how do you manipulate the state? After all, data we cannot reach is just a waste of memory. We give you access to domain data via agents, which run on a thread that is different from the "caller." Agents are active components, while domains are inactive. This means that agents may have their own control flow and act independently of the client that created them.

Agents also cannot have their methods called from outside the body of the agent. In fact, agent instances are not created using a constructor, nor do we ever have the opportunity to hold a reference to an agent (thus, reflection-based invocations are harder). Instead, when we create an agent instance, the Maestro runtime establishes a communication channel for us to use when talking to the agent. This is called the agent's primary channel, which is explicitly typed in the agent declaration:

domain D1
{
    object obj = new object();
    string str = "Hello!";

    agent A1 : channel C1
    {
        A1()
        {
            var startWith = receive(PrimaryChannel::FirstMessage);
            ...
            PrimaryChannel::Result <-- 10;
        }
    }
}

As you can see, this agent has a channel type called 'C1' and starts its work by receiving a message from its primary channel. Receive is a built-in function of Maestro and is one of three ways to receive messages coming from outside a domain. The agent ends by sending the value '10' as the result of its work.
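For readers without a Maestro compiler at hand, the shape of this interaction can be imitated with a thread and two queues. This is only a rough Python analogy under stated assumptions: the `Channel` class, the `create_agent` helper, and the `first_message` and `result` port names are hypothetical, and a real Maestro channel is typed and checked by the compiler rather than being a pair of untyped queues.

```python
import queue
import threading


class Channel:
    """Hypothetical two-port channel: one inbound port, one outbound port."""

    def __init__(self):
        self.first_message = queue.Queue()
        self.result = queue.Queue()


def agent_a1(channel):
    # Block until the first message arrives, like receive(...) in the listing.
    start_with = channel.first_message.get()
    # ... the agent's real work would go here ...
    channel.result.put(10)  # mirrors PrimaryChannel::Result <-- 10


def create_agent(body):
    # The caller gets back only the channel, never a reference to the agent.
    channel = Channel()
    threading.Thread(target=body, args=(channel,), daemon=True).start()
    return channel


chan = create_agent(agent_a1)
chan.first_message.put("go")
answer = chan.result.get(timeout=5)
```

The client never touches the agent itself; everything it knows about the agent's progress comes through the two ports of the channel.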

As declared, A1 instances only have access to immutable domain state. While the string instance is immutable, the reference 'str' itself is not, so A1 does not have access to anything in D1. Because they don't, A1 instances can safely run in parallel with all other agent instances inside or outside the domain.

We can give it access to domain state by adding a keyword to the agent declaration:

domain D1
{
    object obj = new object();
    string str = "Hello!";

    agent A1 : channel C1
    {
        A1()
        {
            var startWith = receive(PrimaryChannel::FirstMessage);
            ...
            PrimaryChannel::Result <-- 10;
        }
    }

    reader agent A2 : channel C1
    {
        A2()
        {
            var startWith = receive(PrimaryChannel::FirstMessage);
            ...
            String myStr = str;  // can read
            str = "new string";  // error: but cannot write
            PrimaryChannel::Result <-- 10;
        }
    }
}

Unlike instances of A1, instances of A2 may read the domain fields and the instances they refer to and may use them in their work. They may not, however, modify either the fields or the instances they refer to. To do so, the agent has to be declared a 'writer':

domain D1
{
    object obj = new object();
    string str = "Hello!";

    agent A1 : channel C1
    {
        A1()
        {
            var startWith = receive(PrimaryChannel::FirstMessage);
            ...
            PrimaryChannel::Result <-- 10;
        }
    }

    reader agent A2 : channel C1
    {
        A2()
        {
            var startWith = receive(PrimaryChannel::FirstMessage);
            ...
            String myStr = str;  // can read
            str = "new string";  // error: but cannot write
            PrimaryChannel::Result <-- 10;
        }
    }

    writer agent A3 : channel C1
    {
        A3()
        {
            var startWith = receive(PrimaryChannel::FirstMessage);
            ...
            String myStr = str;  // can read
            str = "new string";  // and write
            PrimaryChannel::Result <-- 10;
        }
    }
}

Here, A3 may change the values of 'obj' and 'str' or modify the instances they refer to (in this case, both are immutable, but 'obj' could point to something that isn't later on). All instances of A1 can still run without coordination with other agents, but instances of A2 and A3 must coordinate their executions.

The reader / writer attribution is used to do this -- as many A2 instances as are available may run in parallel as long as no A3 instance is running. Only one instance of A3 may be executing code at any given point in time.
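The admission rule in the previous paragraph (many readers or exactly one writer) is the classic reader/writer discipline. The sketch below shows that rule in Python using `threading.Condition`. It is an illustration of the rule only: the `DomainLock` class and its method names are hypothetical, the post does not describe how the Maestro runtime actually schedules agents, and this simple version can starve writers if readers keep arriving.

```python
import threading


class DomainLock:
    """Hypothetical reader/writer gate guarding one domain's state."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of readers currently admitted
        self._writing = False  # True while a writer is admitted

    def acquire_read(self):
        with self._cond:
            while self._writing:          # readers wait only for a writer
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers > 0:  # writer needs exclusivity
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()


lock = DomainLock()
lock.acquire_read()
lock.acquire_read()              # a second reader is admitted immediately
readers_inside = lock._readers
lock.release_read()
lock.release_read()
lock.acquire_write()             # no readers remain, so the writer is admitted
writer_inside = lock._writing
lock.release_write()
```

In Maestro the programmer never writes such a lock by hand; the 'reader' and 'writer' keywords on the agent declaration carry the same information.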

How do agents yield to each other, then? They do so by receiving messages. Waiting for a message means giving up your execution rights until the message is available. Thus, all coordination between agents is achieved via message-passing.
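Yielding at a receive can be demonstrated with cooperative tasks. In this Python sketch, two coroutines share one thread and hand control to each other only when they wait on an `asyncio.Queue`. The `ping` and `pong` coroutines and the queue names are hypothetical, and `asyncio` is used here purely as an analogy; it is not how Maestro is implemented.

```python
import asyncio


async def ping(inbox, outbox, log):
    outbox.put_nowait("ping")
    reply = await inbox.get()  # suspends here until a message is available
    log.append(reply)


async def pong(inbox, outbox, log):
    msg = await inbox.get()    # waits here if no message is available yet
    log.append(msg)
    outbox.put_nowait("pong")


async def main():
    to_pong, to_ping, log = asyncio.Queue(), asyncio.Queue(), []
    await asyncio.gather(
        ping(to_ping, to_pong, log),
        pong(to_pong, to_ping, log),
    )
    return log


log = asyncio.run(main())
```

Neither task uses a lock; the order of the entries in `log` is determined entirely by who is waiting for which message.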

The Maestro agents concept is very much related to C++ agents, which I discussed at and after PDC. In managed code, we have a lot more infrastructure at our disposal to enforce constraints. For example, creating a new domain language is much more reasonable for .NET than for Win32.

In this post, I didn't go into detail on how to define the channels that agents use to coordinate their work, nor how Maestro interacts with the rest of .NET in a safe manner. There are a couple of other concepts that also need explanation, such as message-passing, data-flow, failure models, protocols and payload schema, but they will have to wait until another time.

 

Niklas Gustafsson

Comments

  • Anonymous
    February 27, 2009
    Have you measured the performance difference between Maestro and .Net interprocess communication?  Are there advantages of Maestro other than performance?

  • Anonymous
    February 28, 2009
    Maestro is not intended as a replacement or alternative for .NET remoting or WCF, or any other distribution technology. It is a language promoting a programming model that takes its core pattern from web programming, but aims to scale it from in-process to widely distributed. .NET remoting, WCF, MPI, and other distribution technologies are / will be leveraged for programs that are distributed; Maestro itself does not contain the technology stack for it. Right now, we only have a binding to WCF, so we haven't run Maestro on .NET remoting yet. Thus, Maestro will do as well or as poorly as the underlying communication technology in use. For in-process communication, it will be a lot faster than .NET remoting, but it isn't an apples-to-apples comparison.

  • Anonymous
    March 01, 2009
    I'm working on a proposal targeting the same field. see <http://docs.google.com/Doc?id=dcnk38d7_11hbkdjvhs> It uses .NET attributes to model the threading aspects of an application. What do you think about it ? cheers / Stefan

  • Anonymous
    March 02, 2009
    I understand that Maestro is not intended as a replacement for .Net remoting, etc.  However, it seems you could use a subset of .Net remoting as a replacement for Maestro and enjoy all the same benefits: total protection between running threads (processes), easy instantiation and method calling, so you don't have to manage connections, etc. Actually, I'm not as familiar with .Net as with COM.  When faced with this problem in the past, I have created exe COM servers that call CoRegisterClassObject(REGCLS_SINGLEUSE).  And I get all the benefits described in this blog entry, plus I don't have to manage channels. So perhaps Maestro performs better than this COM approach because it only marshals between threads instead of between processes.  Aside from that, what are the advantages of Maestro?

  • Anonymous
    April 17, 2009
    but can MS not give us a distributed parallel software now, rather than wait for X years for yet another parallel programming paradigm ... a little acquisition like Digipede would do for the moment, then you can add MPI, TPL, PLINQ, CCR or whatever new research to the runtime and .NET languages later ... meanwhile we developers can build larger and more scalable software (which means more licensing $$ to you) without waiting for X years for MS to catch on tools, runtimes and languages and without having to train on these too many new concepts

  • Anonymous
    April 20, 2009
    Regarding C++ guy's comment: It goes beyond just scheduling between threads instead of processes -- Axum (our new name) uses much lighter-weight threads than the system in general. I've had 500,000 simultaneous (blocked) agents running on my laptop without problems and that is a scale that you won't get with threads or processes. It will allow the pattern to be applied to much finer-grained work, thus extending the safely isolated model to a new category of algorithms.

  • Anonymous
    May 06, 2009
    The subject of immutability sparks intense interest among the people who follow our blog, as is evident
