Strings Stink!
In a previous post, I shared some ideas about how to write object-oriented code. Today I'd like to delve a little deeper.
One of my favorite Refactorings is Replace Data Value with Object. You start with a weak type and move to a stronger one. You start with code that manipulates a variable, and move to calling methods on a class.
In the aforementioned post, we went from a string to a FilePath, using this refactoring. Fowler says it's great for built-in types, like string & int. That's because these types have very little structure. Every instance carries some unspoken rules about how it should be handled.
The absolute worst is string. As my OO Jedi Master would say, “Strings are smelly”. On the one hand, you can store any kind of information in a string. And in C# (and other modern languages), everyone knows what a string is, and how to create, destroy, and pass them. On the other hand, you can store any kind of information in a string. So any smarts about their contents must live in the code that manipulates them.
When we talked about this idea at coffee, we called the use of strings “Premature Serialization” or “Postmature Deserialization”, because we're taking rich information and persisting it in a string, in memory.
So what to do about it? Get started with Refactoring.
- Make up a name for what the string contains: “SocialSecurityNumber” or “ShoeSize” or something.
- Create a class by this name, and put a string in it. For now, the constructor can take a string, and you can even make the field public, just to get things going quickly.
- Replace the original string by an instance of this class. Look for other strings that match, and do the same there.
- Look for places you manipulate the string, and move that logic into methods on the new class.
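Here is a rough sketch of where those steps might land for the “SocialSecurityNumber” example. Everything in it is an assumption made for illustration: the accepted input formats, the nine-digit rule, and the member names LastFour and Masked are all hypothetical, and it is written in current C# syntax rather than what was available when this post was written.

```csharp
using System;
using System.Linq;

// Hypothetical value class wrapping what used to be a raw string.
public sealed class SocialSecurityNumber
{
    // Invariant: always exactly nine ASCII digits, no separators.
    private readonly string digits;

    public SocialSecurityNumber(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // Assumed rule: dashes are optional and simply ignored.
        string stripped = text.Replace("-", "");
        if (stripped.Length != 9 || !stripped.All(c => c >= '0' && c <= '9'))
            throw new ArgumentException("Expected nine digits.", nameof(text));

        digits = stripped;
    }

    // Logic that used to be repeated in client code moves here.
    public string LastFour => digits.Substring(5);

    public string Masked => "***-**-" + LastFour;

    public override string ToString() =>
        digits.Substring(0, 3) + "-" + digits.Substring(3, 2) + "-" + digits.Substring(5);
}
```

Callers now write something like `new SocialSecurityNumber(input).Masked` instead of slicing the string themselves, and a malformed value is rejected once, at construction, rather than wherever it happens to be used.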
Once you're done, go back and see if you can do it again. So you went from “string filePath” to “FilePath filePath”. But if this file represents, say, an MP3, then repeat the refactoring to create the class “MP3File”.
Also look for groups of variables that always go together. Fowler points out the Range pattern, and it's a great one. When you have a “start” and “end” of something, make it into a range. I once saw this code:
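A minimal sketch of that second pass, assuming a very small FilePath class. The members shown (Extension, FileName, Location) and the rule that an MP3File must end in ".mp3" are hypothetical choices for illustration, not the classes from the earlier post.

```csharp
using System;
using System.IO;

// First pass: the raw string becomes a FilePath.
public sealed class FilePath
{
    private readonly string path;

    public FilePath(string path)
    {
        if (string.IsNullOrEmpty(path))
            throw new ArgumentException("Path must not be empty.", nameof(path));
        this.path = path;
    }

    public string Extension => Path.GetExtension(path);

    public string FileName => Path.GetFileName(path);

    public override string ToString() => path;
}

// Second pass: a FilePath that is known to point at an MP3.
public sealed class MP3File
{
    public FilePath Location { get; }

    public MP3File(FilePath location)
    {
        if (location == null) throw new ArgumentNullException(nameof(location));

        // Assumed rule: the extension alone decides whether this is an MP3.
        if (!string.Equals(location.Extension, ".mp3", StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("Expected a .mp3 file.", nameof(location));

        Location = location;
    }
}
```

MP3File holds a FilePath rather than inheriting from it here. That is one design choice among several; the point is only that MP3-specific behavior now has a home that is more specific than FilePath.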
string startDate;
string endDate;
and boy, did I want to have a party there! This was in C#, where there's already a Date class!
- Replace each with Date instances.
- Replace the pair with a DateRange.
- Figure out the semantics of this DateRange, and build a new class around 'em.
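One possible shape for the result, as a sketch only. It uses System.DateTime, the date type the .NET Framework actually provides. The semantics chosen here (inclusive endpoints, end not before start) and the members Duration, Contains, and Overlaps are assumptions; the real answers depend on what the original code did with those two strings.

```csharp
using System;

// Hypothetical DateRange replacing the startDate/endDate string pair.
public sealed class DateRange
{
    public DateTime Start { get; }
    public DateTime End { get; }

    public DateRange(DateTime start, DateTime end)
    {
        // Assumed rule: an empty range (start == end) is allowed, a reversed one is not.
        if (end < start)
            throw new ArgumentException("End must not precede start.", nameof(end));
        Start = start;
        End = end;
    }

    public TimeSpan Duration => End - Start;

    // Assumed semantics: both endpoints are inclusive.
    public bool Contains(DateTime moment) => Start <= moment && moment <= End;

    public bool Overlaps(DateRange other)
    {
        if (other == null) throw new ArgumentNullException(nameof(other));
        return Start <= other.End && other.Start <= End;
    }
}
```

With this in place, a check such as "does this date fall within the period?" becomes `range.Contains(date)` instead of a pair of comparisons repeated at every call site.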
Comments
- Anonymous
February 20, 2004
Do you mean DateTime and TimeSpan, by any chance?
- Anonymous
February 20, 2004
Yeah, I suppose so.
- Anonymous
February 20, 2004
Nice post! An example of this within the .NET framework would be the System.Uri class. Fundamentally, URIs are strings -- but they are specific types of strings that must conform to certain rules. There's also a common set of operations that you often perform on URI strings. Hence, the System.Uri class, which encapsulates these rules and operations into a strongly typed class that makes it easier for developers to deal with string representations of URIs.
- Anonymous
February 20, 2004
Steve: Yeah, System.Uri isn't so strong in this department. That's one that's had a lot of discussion here at Microsoft recently.
Thanks for your thoughts.
- Anonymous
February 20, 2004
Objective-C has a really neat concept (I think inherited from Smalltalk) called categories. They are sort of like localized inheritance light. What they allow you to do is add your own methods onto other people's objects. So you could add a Pathinfo category to string that would let you do stuff like somestring.UNC. You don't have access to any of the private stuff, but you don't when you're using a raw string either.
It can be totally abused, and people end up with giganto classes, but it is a really nice way to add methods that SHOULD be on an object but just aren't, like decimal.GetBytes(). Decimal actually just has a GetBits method, which of course returns an array of 4 ints, which is exactly what I would expect of a method called GetBits....
- Anonymous
February 20, 2004
The comment has been removed
- Anonymous
February 21, 2004
To be more specific, mixed objects with instance and static members are not in my view pure OO.
Maybe if the entire type is static, but not mixed.
- Anonymous
February 21, 2004
Jay, I wonder how you think of Int32 usage. Rather than Collection.Count, where Count is an Int32, do you advocate that Count should be of type Count? The same concept applies for String.Length, array bounds, etc.
Is being pure OO sufficient justification for the overhead of creating (and remembering how to use) these additional types?
- Anonymous
February 21, 2004
It's nice to have all this, but once you start to actually design a solution and take dependencies on 3rd party libraries and you have deadlines to reach, you do start to throw all of that out the window. It's called being practical.
- Anonymous
February 21, 2004
Unless all this is enforced by the language, it's not going to see the light of day in a real world solution. Maybe in academia, but not in the world of the real. And definitely not once the perf team gets their hands on it.
- Anonymous
February 22, 2004
The comment has been removed
- Anonymous
February 23, 2004
The comment has been removed
- Anonymous
February 23, 2004
Louis: In these articles I'm talking about what OO looks like. You get to decide what to use when.
My goal is to get the clearest code I can.
In the case of most strings, I end up building logic that is type-specific into client code, because I don't have a type to put it in.
So, for String.Length: does it help me write clear code if this returns a dedicated type? Do I find myself repeating algorithms on the Length throughout my code?