The Consequences of Performance Optimizations

I wanted to write a post today about a very interesting bug we just came across related to Refactoring.  As i just blogged
about verification, i thought that this would be a good followup. 
First, a bit of background.  Referencing the previous post, we can
see the following Q/A about refactoring:

Why is it important? Well, if we look at Martin Fowler's Refactoring Site, we'll find the following information:

Q: What is Refactoring?

A: Refactoring is a disciplined technique for restructuring an existing
body of code, altering its internal structure without changing its
external behavior.

Now, let's dissamble that "answer".  What does "without chaning
its external behavior" mean?  Well, to me, it means that the
semantics of the code (as defined by the language) have not been
changed.  Why do i limit myself to just "as defined by the
language"?  Because, frankly, any other definition would be too
hard to implement :)  For example, if you perform an "extract
method" of some code that sits in a tight loop, and that the jit
doesn't inline, then you'll have changed the external behavior with
respect to its performance.  However, there's no good way to
detect this, and most people don't care.  Similarly, if your code
makes decisions based on runtime type information (i.e. it uses
reflection), then it's possible to change it's external behavior
because suddenly this alteration of internal structure becomes
visible.  However, customers have spoken and have stated that
they're fine with these caveats and so we accept them. 

Now, here's where it gets a bit interesting.  Let's look at my
original definition: "it means that the semantics of the code (as
defined by the language) have not been changed."  Why is this
interesting?  Well, consider the case of code with errors in
it.  The language defines such code to be semantically
meaningless.  So, at that point, almost any change can be made and
still be called a refactoring since the semantics won't change (i.e.
meaningless code stays meaningless).  In fact, the only change a
refactoring could not do would be to take this broken code and make it
non-broken (since that would change it's semantics from meaningless to
meaningfull).  Early on in VS2005 we considered making it so that
you could only run refactorings if your code was compilable.  We
felt that this went with the spirit of refactoring, and also, it made
the refactoring job orders of magnitude easier since trying to make
sense out of broken code is extraordinarily difficult (my full time job
in fact!).  However, based on a huge amount of customer feedback,
we felt that that was a bad idea.  Customers still wanted to be
able to do things like extract a method even if their code was
broken.  And they were willing to deal with the fact that the
result might not be perfect if the code wasn't in a good shape. 
However, they just wanted to be made aware that this might be the case,
and so we decided to pop up a warning every time you did a refactoring
of unbuildable code.  Specifically:

            

Seems pretty sane right?  We give you the power of a lot of code
manipulation tools at your fingertips, but we warn you that we might
not always be able to do everything correctly.  And because we
have a preview mechanism you can go in and see what we're going to do
to decide if it meets your needs.

So what's the problem?  Well, based on quite a bit of customer
feedback (as well as a nasty bug that i opened myself), we decided to
invest a lot of time on improving the performance of
refactorings.  We felt that they were far too slow, and that that
would limit their usefulness.  At this point i feel that they're
still slower than we'd like, however fast enough to be used as part of
the regular development cycle.  For example, of my fairly meaty
project that i work on, they take on the order of 2-3 seconds. 
I'd like them to be instantaneous, but it's far better than the 2-3
*minutes* that it used to take!  (Note: it's a known problem that
the more projects you have, the more degradation you will get, we'll
work on that more in the future).  The performance optimizations
implemented were several fold.  First off, we realized that we
were just duplicating a whole lot of work, and that it was trivial to
just make sure we were only doing the work once instead.  However,
we also added what we thought were some pretty snazzy heuristics to
improve performance.  For example, if you're renaming a variable,
then we will not compile any methods in your source code that don't
contain the identifier you're renaming in them.  (The C# compiler
does two passes, the type system outside of methods, and the
compilations of individual methods one by one).  This latter
optimization was a huge savings in performance since it meant that, for
the most part, 99% of the methods in your code did not need to be
compiled in order to do a rename. 

But, in our haste to make fast, we missed a critical issue.  If we
don't compile those methods, then we won't know if there are errors in
them.  And if we don't know if there are errors in them, then we
won't give you that dialog.  And now the user can get enormously
confused.  Why do i get this dialog in some cases and not in
others?  Technically the dialog now means "The source in your
project related the refactoring you're performing does not currently
build".  However, i think that that's enormously subtle, and might
lead users to be nervous about the validity of our analysis and
verification steps.  So much so that i'm wondering if we should
undo this performance optimization.

How do you feel about this?

Comments

  • Anonymous
    July 23, 2005
    Well, what about a change in the error message itself that somehow indicates that not everything is rebuilt, but only relevant parts, like: "A part of this project, or one of its dependencies, possibly relevant to the refactoring, does not currently build."

    Yeah, that's a bit long, but you get the idea. It should be no surprise that the code doesn't always touch everything, but it would be good if that error message told us so, to avoid the confusion you suggest.
  • Anonymous
    July 23, 2005
    If the refactoring performance change is that large, you have no choice but to leave it in. 2-3 minutes is intolerable for a refactoring operation in my opinion. Just add more detail to the message if needed, or point to a page in the help docs with even more explanation.
  • Anonymous
    July 23, 2005
    Leave the performance improvement in please.

    Mark
  • Anonymous
    July 23, 2005
    I'm surprised there is such a big demand to allow refactorings on non-buildable code. Further down in the description, Fowler writes:

    The system is also kept fully working after each small refactoring, reducing the chances that a system can get seriously broken during the restructuring.

    The whole point of not changing the semantics is to improve the design of working code. Typically, after a refactoring, you run your unit tests to make sure they pass.

    Having said that, are there ways that you can find a middle ground. Perhaps allowing some refactorings on broken code (extract method comes to mind), but not on others (rename variable could be problematic on broken code)?
  • Anonymous
    July 23, 2005
    Haacked: "I'm surprised there is such a big demand to allow refactorings on non-buildable code."

    Well, consider our users. I think they actually want two things. One would be "real refactorings" and the other would be "useful large scale code modification features".

    "Further down in the description, Fowler writes:

    The system is also kept fully working after each small refactoring, reducing the chances that a system can get seriously broken during the restructuring.

    The whole point of not changing the semantics is to improve the design of working code. Typically, after a refactoring, you run your unit tests to make sure they pass. "

    If a refactoring does everything right, you shouldn't need to run any unit tests. The only reason to run unit tests would be if your code is dependent on things beyond the semantics of the language (i.e. performance, or RTTI).


    "Having said that, are there ways that you can find a middle ground. Perhaps allowing some refactorings on broken code (extract method comes to mind), but not on others (rename variable could be problematic on broken code)? "

    Yeah. It's something taht will need lots of thought. :)
  • Anonymous
    July 23, 2005
    I think the error message you're suggesting ("The source in your project related the refactoring you're performing does not currently build") makes perfect sense, and I vote to leave the performance optimization in.
  • Anonymous
    July 23, 2005
    I agree whole-heartedly with the other comment posters; leave the performance improvements and re-write the dialog message.

    I would be happy to trade the possible uncertainty in broken code for faster refactoring in 99% of cases....
  • Anonymous
    July 24, 2005
    Eclipse doesn't warn in every single refactor-with-errors scenario, and it's never bothered me. After all, in this scenario it doesn't matter that other bits of code can't compile, because the refactoring can't affect them anyway.
  • Anonymous
    July 24, 2005
    The comment has been removed
  • Anonymous
    July 24, 2005
    The comment has been removed
  • Anonymous
    July 24, 2005
    I suggest a user option. If they have never set it, when they do a refactoring, ask "Do this quickly but don't check unreferenced code", or "DO this safely but slow." Store that in the refactoring options pane.
  • Anonymous
    July 24, 2005
    The comment has been removed
  • Anonymous
    July 25, 2005
    I am pretty much in the camp of refactoring should be done on workable compilable code. I mean your not refactoring it really if the code isn't compilable. Your at the point where your still writing it. Refactoring I guess means a lot of different things to people. Who would have thunk it. I look at refactoring also kind of like a programming methodology in a way kind of like agile or extreme programming. I mean there have been books upon books written on refactoring.

    Now if Microsoft built something like extreme programming into visual studio there would be hundreds of people complaining. Not because Microsoft didn't follow the rules of extreme programming in fact I am sure they would, but because it doesn't quite fit some peoples needs and wants. Well that is simply because maybe that methodology doesn't fit their needs or they do not understand the methodology. The methodology is there for a reason, and it works great for some people other it doesn't. I say leave the error in, compile the code. Maybe some of the people complaining about it should take some time to learn the rules of refactoring.

    Easy to say for me though you have to make a customer choice decision. What would worry me is if you try to apease those they do not really have an understanding for refactoring, then you break it and make it useless for those that do.
  • Anonymous
    July 25, 2005
    Please leave the optimization in.
  • Anonymous
    July 25, 2005
    "I mean your not refactoring it really if the code isn't compilable. "

    Rubbish. The code may not be compilable because you're simply part way through restructuring it. In such situations, why should you be denied IDE assistance?
  • Anonymous
    July 25, 2005
    Leave the performance improvement in, but
    if they want to know, do the 2-3 minute compile to give them the feedback.

    Thoes who want it wait......

    Eric Schneider
  • Anonymous
    July 25, 2005
    "Rubbish. The code may not be compilable because you're simply part way through restructuring it. In such situations, why should you be denied IDE assistance?"

    You shouldn't be denied assistance. You can still copy and paste and use find and replace. Those things all still work but if your hacking your code to bits from a compilable state into a uncompilable state, your not really refactoring now are ya? Your more or less just wanting the IDE to try to figure out what the heck to do with broken code. That is not what refactoring is.

    http://en.wikipedia.org/wiki/Refactor#Refactoring_code
    Refactoring basically is another step in writing software. It should be done after your code is working to improve how it works. Like I said refactoring is not copy and paste, edit and replace, and try to help me fix these bugs. It shouldn't be considered a tool into writing or working with code that doesn't compile. It was never intended to do that. Everything that can be done under refactoring doesn't aim to break code. Its main goal is to improve the preformance and readability or functioning code. Look up refactoring and see noe of the Key phrases from Martin Fowler. "The system is also kept fully working after each small refactoring, reducing the chances that a system can get seriously broken during the restructuring."

    Refactoring is meant to be done on working code plain and simple. Sorry if you do not like it but that has been the purpose of it for years. I think Microsoft adding refactoring into VS 2005 is a very nice thing I think they did it very effectively so don't break it for the rest of us that apreciate the job they did. There is nothing stopping you from writing a macro or your own tool to do what you want. Refactoring in VS 2005 isn't broken so don't fix it.
  • Anonymous
    July 25, 2005
    "You shouldn't be denied assistance."
    But you just said I should.

    "You can still copy and paste and use find and replace."
    But I may want to use "rename variable" or "extract method" or...

    "Those things all still work but if your hacking your code to bits from a compilable state into a uncompilable state, your not really refactoring now are ya?"
    Don't be so absolutely ridiculous.

    Consider the "rename variable" refactoring. When you invoke that the compiler will (behind the scenes) rename each reference to the variable in turn. When it's only part way through this process, the code will be in an uncompilable state (some references will have changed, others won't; these latter references will cause compile-time errors). Would you claim therefore that the compiler's rename variable does not in fact constitute a refactoring?

    No, of course you wouldn't, because you recognize that the simple task of editing code can leave it in a state that, temporarily, cannot compile.

    I've often come across situations where Eclipse's "extract method" refactoring can't work (because for example I need to make a transformation from a variable mutation to a return and assignment at the point of the call). Semantically, the operation I want to perform is still "extract method"--I just have to extract it by hand. Now, because I can't edit code instantaneously, whilst performing this extraction the code is often in an uncompilable state. For example, in the gap between writing a method declaration and populating its body. However, when doing this, I'll often at the same time want to rename a variable or argument so that it makes more sense given the new code structure.

    In your world, I shouldn't be allowed to use the IDE's refactoring support when doing this; because my code is temporarily broken (because I'm mid-way through the refactoring I wish to perform) I should have to do everything by hand. That's completely idiotic. There's literally no advantage to the restriction you propose. I'm still wanting to perform a refactoring.

    "Your more or less just wanting the IDE to try to figure out what the heck to do with broken code. That is not what refactoring is. "
    I'm wanting the IDE to make changes that don't alter the semantics of the code for me. That's what refactoring is. I'm wanting it to make these changes as part of a larger refactoring effort that I'm performing, because often it doesn't have a built-in mechanism for doing exactly what I want. There is not a single good reason to deny me the ability to do so, and your repeated parrotting of Fowler doesn't change that. In your distorted world-view a manual variable rename or method extraction would not constitute a "refactoring" because of the interim stages in which the code doesn't compile. I'm hard-pressed to come up with a single useful change one could make where this wouldn't be the case. Your position is completely without any kind of merit or virtue, as it makes any and all refactoring impossible.

  • Anonymous
    July 25, 2005
    DrPizza: You're unecessarily making discussion difficult. I would appreciate you being civil here and helping to promote a better atmosphere in general.

    " "You shouldn't be denied assistance."
    But you just said I should."

    You can most certainly be denied "refactorings". However, that doesn't mean that you can't be offered "code manipulation tools" which don't meet the definition of a refactoring. In essense, that's what we're providing here. There's raw textual manipulation support. There's rich support for manipulating code correctly when it's in a buildable state. And there's so-so support for dealing with non-compilable code. We recognize the benefits of each approach, but it's certainly reasonable to suggest that we only provide true "refactorings" when the code is compilable. As Jeff has said, that's pretty much a requirement as per the definition of a refactoring.


    ""You can still copy and paste and use find and replace."
    But I may want to use "rename variable" or "extract method" or..."

    Then we'd provide: "fuzzy rename variable" which may or may not do what you intend if the code is not buildable. Similarly with "fuzzy extract method"


    ""Those things all still work but if your hacking your code to bits from a compilable state into a uncompilable state, your not really refactoring now are ya?"
    Don't be so absolutely ridiculous."

    He's not. He's being correct. If your code isn't compilable, then you're not actually performing a refactoring. And, as you can see here, there have already been a couple of people who would not have a problem with only allowing refactorings of correct code.


    "Consider the "rename variable" refactoring. When you invoke that the compiler will (behind the scenes) rename each reference to the variable in turn. When it's only part way through this process, the code will be in an uncompilable state (some references will have changed, others won't; these latter references will cause compile-time errors). Would you claim therefore that the compiler's rename variable does not in fact constitute a refactoring?"

    Refactorings are treated as atomic actions. During such actions like actual text substitution you can't perform another action (like a refactoring), so the distinction is meaningless.

    "No, of course you wouldn't, because you recognize that the simple task of editing code can leave it in a state that, temporarily, cannot compile."

    Yes, and in the refactoring world it would be understood that during this uncompilable time you would be unable to perform refactorings.

    "In your world, I shouldn't be allowed to use the IDE's refactoring support when doing this; because my code is temporarily broken (because I'm mid-way through the refactoring I wish to perform) I should have to do everything by hand. That's completely idiotic. There's literally no advantage to the restriction you propose. I'm still wanting to perform a refactoring."

    No, there is another alternative which you didn't consider. Provide enough flexibility and power with the refaactorings so that they can handle these cases. You mentioned the problem with eclipse's extract method. Consider if they were smarter about that. Or if they had more refactorings so you could do the variable mutation first through a refactoring, and then do the extractions afterwards.

    ""Your more or less just wanting the IDE to try to figure out what the heck to do with broken code. That is not what refactoring is. "
    I'm wanting the IDE to make changes that don't alter the semantics of the code for me."

    That statement is meaningless in the context of uncompilable code (as i stated in teh preamble). There are no semantics, just guesses.

    "That's what refactoring is. I'm wanting it to make these changes as part of a larger refactoring effort that I'm performing, because often it doesn't have a built-in mechanism for doing exactly what I want. There is not a single good reason to deny me the ability to do so, and your repeated parrotting of Fowler doesn't change that."

    Yes it does. It pushes for a system whereby developers strongly adhere to the philosophy of refactorings, while simultaneously pushing tools developers to provide more refactoring flexibility.

    "In your distorted world-view a manual variable rename or method extraction would not constitute a "refactoring" because of the interim stages in which the code doesn't compile. I'm hard-pressed to come up with a single useful change one could make where this wouldn't be the case. Your position is completely without any kind of merit or virtue, as it makes any and all refactoring impossible."

    No it doesnt'. As exercises we've taken portions of our code base and performed refactorings on them as strictly defined by the refactoring texts and we never ran into the problems that you specified. You just do each refactoring atomically. Test it. Check in.
  • Anonymous
    July 25, 2005
    Your claims of atomicity are complete bunk. "Refactoring" existed long before refactoring-capable IDEs existed. Atomicity is in no way inherent to the definition of refactoring, which is probably why Fowler's definition doesn't say anything about it.

    Any restructuring which doesn't alter the external semantics of a program is a refactoring, whether it be manual (done piecemeal by the programmer) or automatic (done "atomically" by the IDE; bunny ears because IDEs such as Eclipse certainly don't do it atomically; refactoring saves modified but unopened documents, but this saving is not atomic, and if the IDE dies part way through, you're left with... uncompilable code; I don't know if VS.NET behaves the same, but it wouldn't surprise me in the least).

    This claim that the refactoring is something that must be atomic and must not leave the code in an intermediate uncompilable state is an invention that's completely disconnected from reality.

    The other thing that you miss is that when using the IDE to assist in these manual refactorings, the "rename variable" (or whatever) isn't the refactoring any more; the refactoring is what I'm doing. The IDE is just helping me with that task. There's no way I can do each step atomically, because your IDE simply isn't good enough. It's not up to the job. There's no way to get to my desired destination (which is restructured but has identical external semantics so is by definition a refactoring of my current position) without having intermediate uncompilable steps.

    If your IDE were perfect and could do what I wanted all by itself evey time then maybe I'd change my view. But it's not. It's not even close to perfect. As such any restrictions of the kind you propose--restrictions that assume things can be done perfectly--are completely unacceptable.

  • Anonymous
    July 26, 2005
    The comment has been removed
  • Anonymous
    July 26, 2005
    The comment has been removed
  • Anonymous
    July 26, 2005
    The comment has been removed
  • Anonymous
    July 29, 2005
    If refactorings only work on a compilable project they will be mostly useless to me. My primary project is not built (or buildable) through VS.
  • Anonymous
    July 31, 2005
    Cleaning-out my “To Blog” file again…
    Architects

    Handling data in service oriented systemsEdward...
  • Anonymous
    August 01, 2005
    I haven't read the whole discussion, but instead of undoing the performance optimization, can we give users 2 options (compile_all and fast_compile) when refactoring?

    i think this would be a good temporary solution until a better one is found.
  • Anonymous
    August 01, 2005
    now i can brainstorm of several workarounds for this issue, man. but the option above is definitely useful, the more i think about it.
  • Anonymous
    September 20, 2005
    Cyrus, to answer your question, leave the perf optimization in. I think the dialog is near useless anyway as I've already become accustomed to dismissing it w/o reading.

    More importantly (to me, at least), please, please, PLEASE spend some time optimizing refactoring performance with relation to Web projects. I wish I could have refactorings that only take 2-3 seconds!