Jaa


Loops are gotos

Here's an interesting question I got the other day:

We are writing code to translate old mainframe business report generation code written in a BASIC-like language to C#. The original language allows "goto" branching from outside of a loop to the interior of a loop, but C# only allows branching the other way, from the interior to the exterior. How can we branch to the inside of a loop in C#?

I can think of a number of ways to do that.

First, don't do it. Write your translator so that it detects such situations and brings them to the attention of a human who can analyze the meaning of the code and figure out what was meant by the author of this bad programming practice. Branching into the interior of a loop is illegal in C# because doing so would skip all of the code that is necessary to ensure the correct behaviour of the loop. Odds are good that this pattern indicates a bug or at least some bad smelling code that should be looked at by an expert.

Second, this pattern is not legal in C# or VB.NET, but perhaps it is legal in another useful modern language. You could write your translator to translate to some other language instead of C#.

Third, if you want to do this automatically and you need to use C#, the trick is to change how you generate your loops. Remember, loops are nothing more than a more pleasant way to write a "goto". Suppose your original code looks something like this pseudo-basic fragment:

10   J = 2
20   IF FOO THEN GOTO 50
30   FOR J = 1 TO 10
40     PRINT J
50     PRINT "---"
60   NEXT
70   PRINT "DONE"

The "obvious" way to translate this program fragment into C# doesn't work:

     int J = 2;
     if (FOO) goto L50;
     for(J = 1 ; J <= 10; J = J + 1)
     {
       PRINT (J);
L50:   PRINT ("---");
     }
     PRINT ("Done");

because there's a branch into a block. But you can eliminate the block by recognizing that a "for" loop is just a sugar for "goto". If you translate the loop as:

     int J = 2;
     if (FOO) goto L50;
     J = 1;
L30: if (!(J <= 10)) goto L70;
       PRINT (j);
L50:   PRINT ("---");
     J = J + 1;
     goto L30;
L70: PRINT ("done");

and now, no problem. There's no block to branch into, so there's no problem with the goto. This code is hard to read so you probably want to detect the situation of branching into a loop and only do the "unsugared" loop when you need to.

It is difficult to do a similar process that allows arbitrary branching in a world with try-finally and other advanced control flow structures, but hopefully you do not need to. Old mainframe languages seldom have complex control flow structures like "try-finally".

Comments

  • Anonymous
    March 10, 2009
    > Branching into the interior of a loop is illegal in C# because doing so would skip all of the code that is necessary to ensure the correct behaviour of the loop. Can you elaborate? Do you mean the loop condition for while-loops, and also the initializer for for-loops (in which case goto silently breaks the semantics of the loop - but this seemingly doesn't apply to do-while), or some other code that the compiler generates undercover? It's more general than loops. In C# it is illegal to branch into the middle of any block from outside the block. With loops, one wants to be able to reason about the invariants of a loop without having to understand how control could have entered the loop. More generally, one should be able to reason about the control flow within a given block solely by looking at that block. You should not have to look at the whole method to understand how a given block in it could be used. Another way to look at this is to think of the name of the label as being scoped to the block, just like the name of a local declared in the block is scoped to the block. Just as you cannot refer to a local by name outside of its scope, you cannot refer to a label by name outside of its scope.   -- Eric

  • Anonymous
    March 10, 2009
    In a previous life, I had to do just such a beast and wound up doing exactly what you suggested.  However I could do some static analysis and only write the IF/GOTO loop if there were a /targeted/ label inside of the original FOR loop.  Otherwise, I'd write the more natural corresponding FOR loop. If I did have to write the IF/GOTO loop I would insert a comment block that indicated what it read originally (FOR) and where the reference to the inner label came from.  This facilitated hand-massaging the code later into something more manageable. Thank goodness there were no COME FROM constructs!  :)

  • Anonymous
    March 10, 2009
    You can't do Duff's Device in C#?! Jeez. Gotos are fun. I wrote a (lexer) state machine once with no loop. It just bounced around with gotos until it bounced into an endpoint. It's weird, but good, that we've reached a point when gotos are an exotic construct that you never think about.

  • Anonymous
    March 10, 2009
    Simple suggestion, unroll the part of the loop that is being jumped to, outside of the loop, and insert that part at the jump location ... This could get a little complicated if the jump traverses many branches of code, but otherwise I think it could be an option ...

  • Anonymous
    March 10, 2009
    When you say this is legal in VB, you mean VB6, right - not VB.Net? Bizarre. The customer who asked me about this issue said that it worked in VB.Net, but I have just checked the documentation and clearly it does not. Thanks for noticing that. -- Eric  

  • Anonymous
    March 10, 2009
    So how would you translate this into oh, FORTH, or Python, for example, neither of which has labels.. You need to figure out what the program REALLY needs to do and do it without spaghetti (even flying spaghetti monster) code.

  • Anonymous
    March 11, 2009
    The comment has been removed

  • Anonymous
    March 11, 2009
    When people realize today's if-else are also gotos, there's gonna be rioting in the streets! ;)

  • Anonymous
    March 11, 2009
    The comment has been removed

  • Anonymous
    March 11, 2009
    Suppose you have something like this: int i = 0; if( j < 5 ) goto inside; for( ; i < 10 ; ++i ) {   do_something( ); inside:  do_more( ); } Then why can't you translate it like this? int i = 0; if( j < 5 ) do_something( ); for( ++i ; i < 10 ; ++i ) {   do_something( ); inside:  do_more( ); } This is exactly what Pop Catalin suggested. It will vary on case by case basis, but it is easier to do and more easier to understand.

  • Anonymous
    March 11, 2009
    The comment has been removed

  • Anonymous
    March 11, 2009
    Read "Structured Programming with go to Statements".  That goto breaks invariant reasoning techniques was established to be wrong.

  • Anonymous
    March 11, 2009
    <Quote> Remember, loops are nothing more than a more pleasant way to write a "goto". </Quote> Your momma's a "goto."

  • Anonymous
    March 12, 2009
    The comment has been removed

  • Anonymous
    March 12, 2009
    My favorite "goto loop" pattern:  do { // you think that's a loop? ha!    ...    if (something) break; // look, no goto!    ...  } while (false); // look, no label either! Yeah, I worked with a guy who really liked that one.

  • Anonymous
    March 12, 2009
    The comment has been removed

  • Anonymous
    March 13, 2009
    Eliminating goto was supposed to be the panacea for spaghetti code, and yet it continues to plague what could otherwise be fabulous code... Curious...

  • Anonymous
    March 13, 2009
    Guys, goto is not by itself a bad thing in any way. Its evilness is that it is very easy to misuse it, but by itself it is a useful tool in any imperative programming languages. Did you never have to break out of a nested loop? Or a switch nested inside a loop? And if you're coding in C (not C++), goto is essentially the only robust way to do proper error handling, when you have to check for error (and potentially clean up) after every single function call. The extreme argument against goto ("Goto is always evil. Always!") is also the argument against break and return, and, to some extent, throw. If you are truly willing to go to these lengths, then you should go all the way, and claim that nothing less than the purity of Haskell is desirable. Which would at least be a consistent position, even though not a very pragmatic one. But, so long as you stick to an imperative language, some form of goto is necessary and desirable.

  • Anonymous
    March 13, 2009
    By the way you can do a Duff's device in c# static void DuffsDevice<T>(T[] from, T[] to) { int count = from.Length; int position = 0; switch (count % 8) { case 0: to[position] = from[position++]; goto case 7; case 7: to[position] = from[position++]; goto case 6; case 6: to[position] = from[position++]; goto case 5; case 5: to[position] = from[position++]; goto case 4; case 4: to[position] = from[position++]; goto case 3; case 3: to[position] = from[position++]; goto case 2; case 2: to[position] = from[position++]; goto case 1; case 1: to[position] = from[position++]; if ((count -= 8) > 0) goto case 0; else break; }

  • Anonymous
    March 16, 2009
    @pminaev: An example of combined error-handling in C without using goto: bool got_error = FALSE; got_error = do_func_1(); got_error = got_error && do_func_2(); got_error = got_error && do_func_3(); got_error = got_error && do_func_4(); if (got_error) {  // combined error handling }

  • Anonymous
    March 16, 2009
    Oops, wrong logic (it's early) but you get the idea.

  • Anonymous
    March 16, 2009
    That's assuming your functions return bool to indicate error - what about an int error code (note that in that case, && cannot be used, as it will coerce any non-0 "truth" value to 1)? In practice, most C code I've seen uses "if ... goto error" and so on. Probably also because it's far clearer than this trick, too. As I've said earlier, sometimes goto is the right way to solve the problem, and working around it only makes things more complicated, all for the sake of "not saying the dreaded word".

  • Anonymous
    March 17, 2009
    The comment has been removed

  • Anonymous
    March 17, 2009
    The comment has been removed

  • Anonymous
    March 17, 2009
    > Either way, the only advantage to the goto approach is that it saves a couple of clock cycles. You still miss my point. The advantage of the goto error handling approach is that it is usually more readable than any other workaround. Workarounds for the sake of them are not a good idea.

  • Anonymous
    March 17, 2009
    I always used to think that the GOTO statement was a bad thing, and programming should only use loops and other "prettier" flow techniques. Over the last few weeks I was playing around writing an MSIL disassembler to look deeper into the code I write... and the first thing I noticed was that compiler translate loops into a seried of GOTO blocks. Typically a for loop would look something equivalent to:              (Initialise looping variable)              goto LABEL1; LABEL2: (Whatever instructions are inside the block)              (Increment or decrement looping variable as required) LABEL1: if (Check for loop to be run) goto LABEL 2; So a trivial loop: for (int ix = 0; ix < 10; ix++) {  do_something(); } do_loop_finished(); gets translated to:              int ix = 0;              goto LABEL1; LABEL2: do_something();              ix++; LABEL1: if (ix < 10) goto LABEL 2;              do_loop_finished(); So I am now wondering why we are always told to you loops rather than gotos, when they are just converted back to gotos, thus adding an additional level of translation (and hence sub-optimal code)!

  • Anonymous
    March 17, 2009
    The comment has been removed

  • Anonymous
    March 18, 2009
    The comment has been removed

  • Anonymous
    March 18, 2009
    It's probably also worth pointing out that in a modern language, like C#, most error handling will be done by exceptions - which in many ways offer all the same benefits as gotos in this scenario, with the added advantage of carrying information about why they were raised.

  • Anonymous
    March 18, 2009
    Of course ALL flow control becomes "goto" (possibly conditional) at the machine level. Anyone who does not understand that, needs to just hang up their had. Much more dangerous than "goto" is the infamous "come from" instruction. When using this, you can cause an arbitrary transfer from any point in the code: 10 x = 1 20 y = 1 30 x = 2 40 Print x,y,z 5000 Come From 20 5001 y = 99 5002 GoBack (Sorry for posting this 14 days before it's 35th original publication)

  • Anonymous
    March 18, 2009
    So you cant jump to a goto label within a loop from outside the loop? I didnt know this. But whats that piece of code that Reflector happens to show when decompiling a yield return statemachine?    private bool MoveNext()        {            try            {                switch (this.<>1__state)                {                    case 0:                        this.<>1__state = -1;                        this.<>7__wrap4 = this.<>4__this.GetNodes().GetEnumerator();                        this.<>1__state = 1;                        while (this.<>7__wrap4.MoveNext())                        {                            this.<node>5__3 = this.<>7__wrap4.Current;                            this.<>4__this.m_alreadyReturnedNodes.Add(this.<node>5__3);                            this.<>2__current = this.<node>5__3;                            this.<>1__state = 2;                            return true;                        Label_008C:                            this.<>1__state = 1;                        }                        this.<>m__Finally5();                        break;                    case 2:                        goto Label_008C;                }                return false;            }            fault            {                this.System.IDisposable.Dispose();            }        } Is it just Reflector making up a while statement while the original il statements just are gotos? Regards Florian

  • Anonymous
    March 18, 2009
    The discussion is moving a litle away from the first remark about automated migrated code. Yes excessive and wrong use of Goto is very bad. But looking manual at 10.000.000 lines to remove them is a huge task. Analyzing and refactoring code solves a number of them but still not all. The IL itself does allow it so why not the compiler. Let the programmer be the judge to use it or not. In vb.net indead the goto into a for is not allowed but a goto into an if is. And possible other code levels?        Dim j As Integer        j = 1        GoTo l50    ' <---- allowed        If (1 = 1) Then            j = 2 l50:            j = 3        End If        GoTo l60  '<----------- not allowed        For j = 1 To 5 l60:        Next

  • Anonymous
    March 20, 2009
    The comment has been removed

  • Anonymous
    March 20, 2009
    > The IL itself does allow it so why not the compiler. Let the programmer be the judge to use it or not. Because the compiler must enforce a semantic level of meaning on the code so that it can effectively generate IL code that does what you wrote in the higher level language. Consider the difference between for loops and if-else statements. There is quite a bit of difference in the amount of setup required to make each work. if-else: No setup required, beyond performing the test condition and branching to the else part. Since there's no setup, it's no problem to goto into either the if-body or else-body. for loop: At minimum, initializing the loop index and check that it meets the test condition. (foreach is even worse... find the appropriate enumerator, get that all setup, etc., etc...) With all this setup, how do you reasonably do a goto into the body of a for loop AND get provably correct code for EVERY case? Not too likely. (On the other hand, you can goto out of a for loop, because that amounts to little more than a break.)

  • Anonymous
    March 20, 2009
    I meant flowchart from Visio to be precise.

  • Anonymous
    March 20, 2009
    The comment has been removed

  • Anonymous
    March 20, 2009
    What would people say at goto's funeral? goto's wife: "Oh my sweet goto, I can't loop without you" goto's kids: "we want our goto back" goto's 3rd cousins' brother in-laws' 5th pets' owner: "He was pure evil, evil I tell you!"

  • Anonymous
    March 21, 2009
    I would suggest converting the code to MSIL, then decompiling the MSIL into C#. That way you get C# code, but it will still have loops where ever possible.

  • Anonymous
    March 22, 2009
    The comment has been removed

  • Anonymous
    March 24, 2009
    The comment has been removed

  • Anonymous
    March 25, 2009
    The comment has been removed

  • Anonymous
    March 26, 2009
    The comment has been removed

  • Anonymous
    March 27, 2009
        for(J = FOO ? 3 : 1; J <= 10; J++)       PRINT (string.Format("{0}rn---",J));     PRINT ("Done");

  • Anonymous
    March 28, 2009
    The comment has been removed

  • Anonymous
    March 28, 2009
    The comment has been removed

  • Anonymous
    March 30, 2009
    The comment has been removed

  • Anonymous
    March 31, 2009
    if (StatusOK) {   if (DataAvailable) {      ImportantVariable = x;      goto MID_LOOP;   } } else {   ImportantVariable = GetSomeValue();   MID_LOOP:   //Lots of code } I'm disappointed that no-one has mentioned Steve McConnell yet.

  • Anonymous
    April 01, 2009
    The comment has been removed

  • Anonymous
    April 06, 2009
    Seriously, we have methods now...... If you need to jump to something inside a loop, then it should probably be in a separate method.

  • Anonymous
    April 06, 2009
    With regard to Duff's device (see comments from A.C.Hynes and Matthew Whited above) here is another C# version which is closer to the original.   C# is surprisingly expressive at times: namespace DuffsDevice {    using System;    class Program    {        static void Main(string[] args)        {            unsafe            {                short[] data = new short[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};                short register = -1;                fixed (short* dataPtr = &data[0])                {                    DuffsDevice(&register, dataPtr, data.Length);                }                Console.WriteLine(register);            }        }        /// <summary>        /// C# approximation to Duff's device.        ///        /// void send(register short *to, register short *from, register int count)        /// {        ///     register int n=(count+7)/8;        ///     switch(count%8)        ///     {        ///         case 0: do{ *to = *from++;        ///         case 7:     *to = *from++;        ///         case 6:     *to = *from++;        ///         case 5:     *to = *from++;        ///         case 4:     *to = *from++;        ///         case 3:     *to = *from++;        ///         case 2:     *to = *from++;        ///         case 1:     to = from++;        ///         }while(--n>0);        ///     }        /// }        /// </summary>        /// <param name="to"></param>        /// <param name="from"></param>        unsafe static void DuffsDevice(short to, short from, int count)        {            int n = (count + 7) / 8;            switch (count % 8)            {                case 0:                    *to = *from++;                    goto case 7;                case 7:                    *to = *from++;                    goto case 6;                case 6:                    *to = *from++;                    goto case 5;                case 5:                    *to = *from++;                    goto case 4;                case 4:                    *to = *from++;                    goto case 3;                case 3:                    *to = *from++;                    goto case 2;                case 2:                    *to = *from++;                    goto case 1;                case 1:                    *to = *from++;                    if (--n > 0) goto case 0; else break;            }        }    } }

  • Anonymous
    April 07, 2009
    re: Duff's Device... Hand-rolled optimization in C#? You guys are funny.

  • Anonymous
    April 07, 2009
    Turns out that a simple loop to iterate through a safe array can be a bit faster than my version of Duff's Device in .NET.   A loop to iterate through allocated stack space (using stackalloc) using an unsafe pointer is much faster.   So, 'Duff's device' in C# is sub-optimal.   A case of hand-rolled de-optimization!   Full marks to C# for being so expressive, but don't use the technique!

  • Anonymous
    April 15, 2009
    this is funny stuff. We are being guided to the old, frustrating, spider webs of looping with GOTOs. this shouldnt practically be an approach in professional development using C#.

  • Anonymous
    April 16, 2009
    One time long ago I inherited a program written in C-tran.  The author knew and thought in FORTRAN but had to code the app in C, and a veritable mess of code resulted that contained over 400 GO TO stetements.   We stumbled along maintiaing it, never given the time to rewrite what was "working" but always allowed the time to analyze the mess of spaghetti for unintended consequences of the changes we were planning to make (and deal with the ones we did not foresee). We recruited a bright guy from the customer support team to come into the product development team and his soel condition for accepting the position was they he be allowed to rewrite that app.  It was accepted and the final product after3 months was easier to understand, easier to maintain and easier to extend. GOTO is just a tool - but it is VERY dangerous in the hands of those who do not fear it.

  • Anonymous
    April 17, 2009
    Bad code is bad code. One needs to understand what branching into a FOR statement in their compiler means. I have used languages where there is only the goto.  I have also see bad FOR statements. 10 FOR i = 1 TO 10 20     PRINT i 30 NEXT i 40 LET i = i * 2 50 PRINT i What is the value of i? 20 or 22.  This depends on the BASIC version. Of course we should not use i outside to loop.

  • Anonymous
    April 29, 2009
    The comment has been removed

  • Anonymous
    April 30, 2009
    How much knowlege is available about how the original program works, or (more importantly) what it is supposed to be doing? A lot of old code can be reduced (in lines of codes) by 20 to 80%, purely on the basis that much of it was cut/pasted from other code that did the same thing on a different instance of data.  This is why classes/objects/interfaces/etc came about to begin with. I agree it is painful to take in hand thousands of lines of code and re-design and re-write it, but it many cases it is well worth it, and would take a lot less time than one might think (there is a certain moment of inertia that begins to build once you get familiar with a coding style no matter how ugly it might seem).

  • Anonymous
    May 06, 2009
    The comment has been removed