Strings Stink!

In a previous post, I shared some ideas on how to write Object Oriented code.  Today I'd like to delve a little deeper.

One of my favorite Refactorings is Replace Data Value with Object.  You start with a weak type and move to a stronger one.  You start with code that manipulates a variable, and move to calling methods on a class.

In the aforementioned post, we went from a string to a FilePath using this refactoring.  Fowler says it's great for built-in types, like string & int.  That's because these types have very little structure.  Every instance carries unspoken rules about how it should be handled.

The absolute worst is string.  As my OO Jedi Master would say, “Strings are smelly”.  On the one hand, you can store any kind of information in a string.  And in C# (and other modern languages), everyone knows what a string is, and how to create, destroy, and pass one.  On the other hand, you can store any kind of information in a string.  So any smarts about a string's contents must live in the code that manipulates it.

When we talked about this idea at coffee, we called the use of strings “Premature Serialization” or “Postmature Deserialization”, because we're taking rich information and persisting it in a string, in memory.

So what to do about it?  Get started with Refactoring. 

  1. Make up a name for what the string contains.  “SocialSecurityNumber” or “ShoeSize” or something.
  2. Create a class by this name, and put a string in it.  For now, the constructor can take a string, and you can even make the field public, just to get things going quickly.
  3. Replace the original string by an instance of this class.  Look for other strings that match, and do the same there.
  4. Look for places you manipulate the string, and move that logic into methods on the new class. 
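
Here is a minimal sketch of where those four steps might land for the “SocialSecurityNumber” example. Only the class name comes from the list above. The validation rule (nine digits, dashes optional) and the `Masked` method are assumptions made up for illustration, standing in for whatever logic your client code was actually repeating.

```csharp
using System;

// Hypothetical result of steps 1-4. Only the name comes from the post;
// the rules below are assumptions chosen for illustration.
public sealed class SocialSecurityNumber
{
    // Step 2: the class starts out as a thin wrapper around the string.
    private readonly string digits;

    public SocialSecurityNumber(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // Step 4: validation that used to be scattered through client code.
        // Assumed rule: exactly nine ASCII digits, dashes are ignored.
        string stripped = text.Replace("-", "");
        if (stripped.Length != 9)
            throw new FormatException("Expected nine digits: " + text);
        foreach (char c in stripped)
        {
            if (c < '0' || c > '9')
                throw new FormatException("Expected only digits: " + text);
        }
        digits = stripped;
    }

    // Step 4: formatting moves onto the class as well.
    public override string ToString()
    {
        return digits.Substring(0, 3) + "-" + digits.Substring(3, 2) + "-" + digits.Substring(5, 4);
    }

    // Hypothetical helper: show only the last four digits.
    public string Masked()
    {
        return "***-**-" + digits.Substring(5, 4);
    }
}
```

Callers that used to hold a raw string now hold a SocialSecurityNumber, and a malformed value fails once, at construction, instead of somewhere downstream.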

Once you're done, go back and see if you can do it again.  So you went from “string filePath” to “FilePath filePath”.  But if this file represents, say, an MP3, then repeat the refactoring to create the class “MP3File”.
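
As a rough sketch of that second pass, assume a very small FilePath class (the real one from the earlier post may look different) and wrap it in an MP3File. The extension check and the `DefaultTitle` property are assumptions for illustration, not something the refactoring requires.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the FilePath class from the first refactoring.
public class FilePath
{
    private readonly string path;

    public FilePath(string path)
    {
        if (path == null) throw new ArgumentNullException(nameof(path));
        this.path = path;
    }

    public string Extension
    {
        get { return Path.GetExtension(path); }
    }

    public string NameWithoutExtension
    {
        get { return Path.GetFileNameWithoutExtension(path); }
    }

    public override string ToString()
    {
        return path;
    }
}

// Second pass: a type for what the file represents, not just where it lives.
public class MP3File
{
    private readonly FilePath location;

    public MP3File(FilePath location)
    {
        if (location == null) throw new ArgumentNullException(nameof(location));

        // Assumed rule: an MP3 is recognized by its extension alone.
        if (!string.Equals(location.Extension, ".mp3", StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("Not an MP3 file: " + location, nameof(location));

        this.location = location;
    }

    public FilePath Location
    {
        get { return location; }
    }

    // Hypothetical behavior: fall back to the file name as the track title.
    public string DefaultTitle
    {
        get { return location.NameWithoutExtension; }
    }
}
```

MP3-specific logic now has an obvious home, and code that only cares about paths keeps working with FilePath.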

Also look for groups of variables that always go together.  Fowler points out the Range pattern, and it's a great one.  When you have a “start” and “end” of something, make it into a range.  I once saw this code:

string startDate;
string endDate;

and boy, did I want to have a party there!  This was in C#, where there's already a Date class! 

  1. Replace each with Date instances.  
  2. Replace the pair with a DateRange.
  3. Figure out the semantics of this DateRange, and build a new class around 'em.
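
One way those three steps could turn out, sketched with .NET's System.DateTime as the date type. The semantics here are assumptions picked for the example: both endpoints are inclusive, the end may not precede the start, and the original strings are assumed to be in yyyy-MM-dd form. Your own range will likely need different rules.

```csharp
using System;
using System.Globalization;

// Hypothetical DateRange; the semantics are assumptions for illustration.
public sealed class DateRange
{
    public DateTime Start { get; }
    public DateTime End { get; }

    public DateRange(DateTime start, DateTime end)
    {
        // Assumed invariant: a range never runs backwards.
        if (end < start)
            throw new ArgumentException("End must not precede start.", nameof(end));
        Start = start;
        End = end;
    }

    // Step 1 in one place: turn the two strings into real dates.
    // Assumed input format: yyyy-MM-dd.
    public static DateRange Parse(string startDate, string endDate)
    {
        DateTime start = DateTime.ParseExact(startDate, "yyyy-MM-dd", CultureInfo.InvariantCulture);
        DateTime end = DateTime.ParseExact(endDate, "yyyy-MM-dd", CultureInfo.InvariantCulture);
        return new DateRange(start, end);
    }

    // Assumed semantics: both endpoints count as inside the range.
    public bool Contains(DateTime moment)
    {
        return Start <= moment && moment <= End;
    }

    public bool Overlaps(DateRange other)
    {
        if (other == null) throw new ArgumentNullException(nameof(other));
        return Start <= other.End && other.Start <= End;
    }

    public TimeSpan Duration
    {
        get { return End - Start; }
    }
}
```

Questions like "is this date in the range?" and "do these two ranges overlap?" now get answered once, on the class, rather than with ad hoc string or date comparisons in every caller.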

Comments

  • Anonymous
    February 20, 2004
    Do you mean DateTime and TimeSpan, by any chance?
  • Anonymous
    February 20, 2004
    Yeah, I suppose so.
  • Anonymous
    February 20, 2004
    Nice post! An example of this within the .NET framework would be the System.Uri class. Fundamentally, URIs are strings -- but they are specific types of strings that must conform to certain rules. There's also a common set of operations that you often perform on URI strings. Hence, the System.Uri class, which encapsulates these rules and operations into a strongly typed class that makes it easier for developers to deal with string representations of URIs.
  • Anonymous
    February 20, 2004
    Steve: Yeah, System.Uri isn't so strong in this department. That's one that's had a lot of discussion here at Microsoft recently.

    Thanks for your thoughts.
  • Anonymous
    February 20, 2004
    Objective-C has a really neat concept (I think inherited from Smalltalk) called categories. They are sort of like localized inheritance light. What they allow you to do is add your own methods onto other people's objects. So you could add a Pathinfo category to string that would let you do stuff like somestring.UNC. You don't have access to any of the private stuff, but you don't when you're using a raw string either.

    It can be totally abused and people end up with the giganto classes, but it is a really nice way to add methods that SHOULD be on an object but just aren't, like decimal.GetBytes(). Decimal actually just has a GetBits method, which of course returns an array of 4 ints which is exactly what I would expect of a method called GetBits....
  • Anonymous
    February 21, 2004
    To be more specific, mixed objects with instance and static members are not in my view pure OO.

    Maybe if the entire type is static, but not mixed.

  • Anonymous
    February 21, 2004
    Jay, I wonder how you think of Int32 usage. Rather than Collection.Count, where Count is an Int32, do you advocate that Count should be of type Count? The same concept applies for String.Length, array bounds, etc.

    Is being pure OO sufficient justification for the overhead of creating (and remembering how to use) these additional types?
  • Anonymous
    February 21, 2004
    It's nice to have all this, but once you start to actually design a solution, take dependencies on 3rd party libraries, and have deadlines to reach, you do start to throw all of that out the window. It's called being practical.

  • Anonymous
    February 21, 2004
    Unless all this is enforced by the language, it's not going to see the light of day in a real world solution. Maybe in academia, but not in the world of the real. And definitely not once the perf team gets their hands on it.
  • Anonymous
    February 23, 2004
    Louis: In these articles I'm talking about what OO looks like. You get to decide what to use when.

    My goal is to get the clearest code I can.

    In the case of most strings, I end up building logic that is type-specific into client code, because I don't have a type to put it in.

    So, for String.Length: does it help me write clear code if this returns a dedicated type? Do I find myself repeating algorithms on the Length throughout my code?