What are the goals of refactorings?

An interesting issue came up today that many people on the C# team had a passionate discussion about. It was about the following code:

 
class C<T> {
    void M(T t) {
         t.ToString();
         Console.WriteLine(t.ToString());
    }
}

Where the user was extracting the first "t.ToString()" line. In that case we currently produce the following:

 
class C<T> {
    void M(T t) {
         NewMethod(t);
         Console.WriteLine(t.ToString());
    }
    void NewMethod(T t) {
         t.ToString();
    }
}

What's wrong with that? It seems completely reasonable. But consider the following code:

 
struct WeirdStruct {
     int i;
     public override string ToString() {
          i++;
          return i.ToString();
     }
}

...

C<WeirdStruct> c = new C<WeirdStruct>();
c.M(new WeirdStruct());

Before the refactoring we will produce the output "2". however after the refactoring we'll produce "1". Why? Because of a fun property of structs and how they are passed by value. After the refactoring we will actually pass a copy of the struct to the new method and the call to ToString will affect that copy only.

Because we don't know if the type variable will be instantiated with reference type or a value we don't know if it will ever be safe to just pass the type variable. The only way to make it safe would be to produce the following code:

 
    void M(T t) {
         NewMethod(ref t);
         Console.WriteLine(t.ToString());
    }
    void NewMethod(ref T t) {
         t.ToString();
    }

Notice that we now pass the type variables around as ref parameters. (In the reference case this won't affect anything, and in the struct case it will solve the above problem. However, some of us were worried about what this would be like for users using 'extract method'. They'd extract code that used type variables and would get pretty confused. Chances are that they would have written the code that we currently produce and if we generated the new code they wouldn't understand why and they'd remove the 'ref' from the parameters.

Several solutions were discussed, namely:

  1. Keep the current implementation, potentially producing code that might not produce the same results as the original
  2. Make all type passed as ref parameters, potentially producing confusion and dissuading people from using the feature
  3. Offer both behaviors through an option on the message box that we pop up to ask the user for the new method name

I bring this up because it goes deep to the heart of what you think a refactoring is. Is the purpose to help you modify your code safe in the knowledge that the meaning will stay the same? Is the purpose to help you make a large change in your code in the way you yourself would have done it, but to do it much faster than it would take you? Should you have enough unit tests so that issues like this would be caught and it's ok to possibly change the meaning of the code? Would you rather have to fix up our refactorings manually after we performed them, or would you prefer to try to track down a bug that was caused by the refactoring? Does it depend on the refactoring. i.e. are you willing to have some refactorings behave one way and others behave the other way? Is there a point at which you pick between one or the other or is it always clear cut for you ? What do you think the best solution is? I'd love to hear your thoughts on this.

Comments

  • Anonymous
    June 02, 2004
    The comment has been removed
  • Anonymous
    June 02, 2004
    Fredrik: Out of curiosity what is the benefit of passing the value types and strings by ref?

    In our current model we don't ever pass reference types by ref, and we only pass value types by ref if we modify them.
  • Anonymous
    June 02, 2004
    My answer to the question "What are the goals of refactorings?" is:

    "Write code that humans can understand" it will make it much easier to track bugs and fix bugs. Refactoring will also help developers to easy track bugs and fix bugs.

    Sorry for double posting..
  • Anonymous
    June 02, 2004
    Cyrus:

    >Out of curiosity what is the benefit of passing the value types and strings by ref?

    I don't usually do that, but for example if you have a ArgumentHelper class that would check if a string is within a length and not empty or null, it would increase the performance if the string was passed as with ref (Or I'm wrong?). For the moment I don't have a good comment about why passing value-types as ref, but I think the option should exist.
  • Anonymous
    June 02, 2004
    Fredrik: Interesting. Do you think this would be an easy bug to find/track? In a large application it might go unnoticed. As the above code showed it just ends up with a one-off change. In an app this might not crash but might just cause slightly incorrect results that could be extremely difficult to track down.

    It's also not clear to me that in this case the understandable code makes it easier to find bugs. Because the code is so 'understandable' it could lead to people just glancing over it and not even considering that that could be the place where the problem is.
  • Anonymous
    June 02, 2004
    Cyrus:

    >Do you think this would be an easy bug to find/track?

    If you refer to your example in the post, I will say both yes or no. It could take some time, but I it would probably harder to find it without doing any refactoring. To find this kind of bug I should start with the method that gives me the wrong result, from that method I will go down a check each method code that will be called. If refactoring was used, the code will probably have fewer lines of code in each method, and in that way it would be easier to navigate through the code and understand what the code are doing. As I mention before, refatoring will make the code easier to understand and make it easier for other developers to read, so it could and with my opinion it will make it easier to find the code that results in the bug.
  • Anonymous
    June 02, 2004
    There are many goals of refactoring. In my opinion, the primary goal of refactoring is to create code that will be easier and cheaper to maintain and extend in the short and long run.

    By definition (if you take Martin Fowler's definition) a refactoring should NOT change the behavior of code. However, that leads into the goal of the Unit Test, to make sure that the behavior doesn't change. If you refactor code without sufficient unit tests, you don't have a safety net to give you a fair degree of confidence that the behavior has not changed.
  • Anonymous
    June 02, 2004
    Don't use ref for this purpose. The WierdStruct is probably not even in the 5% usage case. If the developer is aware of the value type ref issue they can fix it them selves, otherwise they might have a bug to fix.

    You could use ref when the variable type is known type at compile time.
  • Anonymous
    June 02, 2004
    The comment has been removed
  • Anonymous
    June 02, 2004
    Besides, weren't you the one who said mutable value types are harmful? I think a mutating struct should produce a compiler warning at least.
  • Anonymous
    June 02, 2004
    Frederik: Thanks for your feedback. However, what if you have the following situation:

    You do the refactoring. Then weeks go by and eventually you notice the bug. You then spend a lot of time tracking it down. You've just lost a lot of time, had to investigate a bug and then fix a mistake of ours. I think i see why you want option 3 then.

    Do you think simply by having the option it would raise awareness of this issue and would make people think and consider what they were doing when extracting a method?
  • Anonymous
    June 02, 2004
    Haacked: You said: "If you refactor code without sufficient unit tests, you don't have a safety net to give you a fair degree of confidence that the behavior has not changed."

    Should we be helping out people in that situation? Do you this we should allow refactorings that can introduce bugs because we expect that you have enough testing going on to catch it? Or do you think we should produce code that keeps the semantic meaning the same?

    For all responders: Thanks for the feedback, it's been excellent!
  • Anonymous
    June 02, 2004
    For you who reading this and are not familiar with refactoring and have never heard of it, then I should recommend you to read about it, it will make you write better code :)

    Here is a link to Martin Fowler’s book about Refactoring.

    http://martinfowler.com/books.html#refactoring

    I think the option of adding ref for value-types and string should be in the dialog of the "Extract method". If it exist it could prevent developers to forget to add ref if it's needed. Here is an exmaple where ref could be useful:

    if( !string.IsNullOrEmpty(name) && name.Length > 256 )
    throw new ArgumentException();

    With refactorin thie could be:

    ArgumentHelper.CheckStringArgument(ref name, 256)

    private static void CheckStringArgument( ref string value, int size )
    {
    if( !string.IsNullOrEmpty(name) && name.Length > size )
    throw new ArgumentException();
    }

    Now the CheckStringArgument could be reused in other methods and classes, and by passing the string as ref it could or will increase performance.

  • Anonymous
    June 02, 2004
    Haacked: Yes, i think this is a great example of the dangers of mutable structs :-)

    That came up in our discussions as well. However, this goes into what you think about refactorings.

    If you believe that we should not change the meaning of your code then we should maintain that property even if you are writing 'questionable code' :-)

    I mentioned to Mike that I saw this as similar to the issue of Trustworthy computing. Do we make the choice for you thinking that we know best? Or do we make the safe choice and in this case ask you to make the decision of if you want to override the safe choice. This goes back into the option on the dialog that pops up.

    That way we start out in a safe default state. If you decide that that isn't for you and you'll be coding in a way where you can just pass the type variables around, then you override the choice once and we'll remember that change.

    The option is discoverable because it will be on the dialog that pops up.

    If, then, in the future you evver get a bug, you're only in that state because you overrode the safe behavior that we defaulted with.
  • Anonymous
    June 02, 2004
    Fredrik: Thanks for the links! The passing strings by ref in this case wouldn't have a perf impact on you. The equivalent code:

    private static void CheckStringArgument( string value, int size )
    {
    if( !string.IsNullOrEmpty(name) && name.Length > size )
    throw new ArgumentException();
    }

    would have the same meaning as the "ref string" case with the same performance.
  • Anonymous
    June 02, 2004
    Haaacked: I agree. It should be a warning :-)

    Adam: I like your thoughts on the 5% case. It seems like it could be a reasonable bar on how to balance between user expectations and correctness.
  • Anonymous
    June 02, 2004
    Cyrus:

    Thanks for letting me know.

    What option do you choose? There must be one option that you like more than the others? What are your thoughts about the option you will choose? I’m interesting of hearing your thoughts now when you have heard ours :)

  • Anonymous
    June 02, 2004
    Cyrus...

    In the .Net framework class library you can find this method in the BaseHandler class.
    public static XmlNode GetAndRemoveRequiredNonEmptyStringAttribut(XmlNode xmlNode,string attrib,ref string value)
    {...}

    I found this design very usefull and uses this pattern for other methods in my own framework.
    Like my StringUtil class method "public static void GetNonEmptyStringValue(string value,ref string setValue)"

    All this after reading Martin Fowlers Book. I used the Extract Method and change Value to refference refactoring. What are your thoughts about that?

    Best Regards Johan
  • Anonymous
    June 02, 2004
    Hi Cyrus

    Excellent refactoring issue. For me I think it's pretty clear that programmers do refactorings. When the IDE has the facility, it's simply to free the programmer from having to type the code himself. So if there's an option such as whether to pass by ref, then the IDE should ask since the programmer would have to make that call when doing it himself.

    So my answer to all these issues is: what would the programmer have to decide if typing the code himself the old way? The IDE has to ask the same questions if it cannot make the decisions itself.

    Finally, I'd say that many programmers would make the same error you show in your example when refactoring by hand. Having the IDE ask will hopefully illuminate this difference in behaviour of value types (you'd also need a good help link on the dialog that explains this), and thus prevent the mistake that might be made when doing it by hand. Thus, if done well, IDE assisted refactorings can be less buggy than doing it by hand AND educate the programmer at the same time.
  • Anonymous
    June 02, 2004
    Cyrus/Fredrik:

    regarding the passing of strings by ref: Cyrus, you say it would make no performance difference but (without testing so hopefull Rico won't read this :o) ) I suspect that it would actually result in worse performance since passing by ref adds an additional indirection when accessing the value inside the method.
  • Anonymous
    June 02, 2004
    Mike:

    I created a performance test on the ref and the string issue, and if I use ref or not, gave me the same results, so the there is no performance differences.
  • Anonymous
    June 02, 2004
    My "solution":

    When starting the extract method, have the IDE compiler go through all references and determine if it's safe to leave out the ref keyword. If there is uncertainty how the type is used, just add the ref keyword.

    After doing the extract method, if ref keywords where added, bring up a smart tag next to the new ref keyword with exclamation mark, clicking it would give short summary why the ref keyword was added and a link to a help page with good reasoning of what is happening and why. The smart tag could also offer to remove the ref keywords added to the uncertain cases.
  • Anonymous
    June 02, 2004
    Argh, it seems the ide compiler can never be sure of that.. Stupid me? :) Well I still think the smart tag idea is worth considering?
  • Anonymous
    June 02, 2004
    I think that Jokus samrt tag idea is great.

    But I also totally agree with Adam about how rare it is that you would want the ref-behavior. I work a lot as a mentor and there is to many developers using ref because they think that it will improve performance when it doesn't.

    So my conclusion is that I don't want the parameter send as ref. I belive that the dialogs should be as small and easy as possible for refactoring (= not to many choices). But since this could be an issue in some situations the smart tag offering me to make the parameter send by ref would be the perfect middle way.
  • Anonymous
    June 02, 2004
    The comment has been removed
  • Anonymous
    June 03, 2004
    I think that the Refactoring feature should make the work of a programmer easier by automatizing tasks, but without introducing potential problems, AND without introducing potential confusion.

    If this is not doable in some situations, I believe it's easier to hide this type of Refactoring from the menu in those situations, or, don't hide it, but when used, somehow explain what is the situation about and what is the chosen solution. (If the latter, I think that not introducing problems is more important than not introducing confusion, so I think it's better to add ref instead of not adding it.)

    Sorin Dolha [MCAD, MCSD .NET]
  • Anonymous
    June 03, 2004
    Wow. Thanks for the feedback!

    Couple of quick points:

    You always get a dialog currently. That's so you can provide the name for the method you're extracting. The option could go there and could additionally have a link next to it the docs on the issue with a nice bit of explaining text.

    Based on everyones views it seems like refactoring is a balance between keeping things semantically correct and doing what the programmer would have done itself.

    I think that In cases where you dont' think you can do both an option is not unreasonable. I would defaulit to the safe behavior and allow one to override it (and have that preference remembered)
  • Anonymous
    June 03, 2004
    Mike: I like the idea of the IDE taking away a lot of the work involved and potentially educating users as to dangers they should be aware of.
  • Anonymous
    June 03, 2004
    Johan: Having one out/ref parameter seems somewhat smelly to me. They seem to overrride the built in method we have for getting a value out of a method, namely the return value. what about:

    public static string ConvertToNonNull(string val) {
    if (object.ReferenceEquals(null, val)) {
    return string.Empty;
    } else {
    return val;
    }
    }

    then you'd do the following:

    argument = StringUtilities.ConvertToNonNull(argument);
  • Anonymous
    June 04, 2004
    Hi Cyrus,
    About the question in this topic, I would like a popup when you do the Excract
    Method with an option for ref. or is it possible to add another refatoring, Pass Value by
    reference? For me refactoring in not just a way to help myself to understand
    my own code, I will help others in my team to understand it as well. And for
    me it's very important with good method names, good signatures, all this make my code more effective for bugfix and minimize the risk of bugs.
    After reading Martin Fowlers Refactoring book I saw programming in a different
    view. For the first time I like my code design and other can now read my code faster and off course it’s very easy to fix bugs or find them earlier thanks to refactoring. I would like a popup that let me choose ref, out or val (as default) when doing the extract method. I found it more helpful and I think it will make other developers (new at the area) to think an extra time before refactoring.
    I don't see this popup as a time consuming thing. I see it as a help to think
    (which is important.) and a help to make the refactoring even faster and more safe from bugs.
  • Anonymous
    June 07, 2004
    Refactoring suppose to make my life easier and make my code easier to read, understand and debug. If because of refactoring a subtle bug is added and I have never was warned about possibility of buggy outcome it makes my trust into refactoring tool to go down significantly. Who knows what other erroneous refactoring outcomes were judged not important enough to warrant a warning?
    On the other hand if you add ref all the time it does not make my code easier to read not to say that it breaks the beauty of thought expression in the code. Not good :).
    On the constructive note I would say that idea about smart tags is the best. But if it not possible, the 'ref' option should be there, possibly with a small button next to it saying ‘Explain’ so people can control the destiny of their code as well as to learn about why it is important to have mutable structs.
  • Anonymous
    June 07, 2004
    To All: Your comments have been incredibly useful in our group for deciding what to do with this issue. I noticed this comment on the larkware site: "Cyrus points out a situation in which refactoring in C# might break code, and asks for opinions. I suspect this is one of those cases where everyone will know what the obviously correct answer is - and then be astounded to discover that not everyone agrees with them."

    It's been very enlightening for all of us to see how opinions differ on this matter. Hopefully, we'll arrive at a solution that everyone likes.
  • Anonymous
    June 10, 2004
    How about just sticking something like this in the dialog? "Warning: Extracting a method with value type parameters can have unintended side effects. For more infromation read this help file."
  • Anonymous
    June 11, 2004
    Cyrus: I just read this blog on immutable structs and I think it's relevant to this discussion:

    http://blogs.msdn.com/jaybaz_ms/archive/2004/06/10/152748.aspx

    Because of their pass-by-value semantics, making structs immutable avoids a lot of pain.
  • Anonymous
    June 12, 2004
    Don: Are you saying that we should just change the meaning of your code, but warn you about it?

    Mike: Yup. I agree. However, it's not our place to stop you from coding how you want to. Should we break your code just because you code in a style that we don't like?
  • Anonymous
    June 14, 2004
    Cyrus: no, don't break code, but maybe default to unchecking the proposed "pass by reference" checkbox.

    It's a difficult issue, because were the programmer to refactor that code by hand, many would probably forget to pass by ref and they'd get the bug anyway. The problem isn't so much the refactoring, as having mutable structs. There are all sorts of related problems that having such structs would produce.

    It would be a pity to inflict pass-by-ref semantics on programmers who do understand this and have implemented structs in a safe manner.