Writing Version Control Migration Tools - Handling Namespace Conflicts
Migrating from one version control system to another is tough. I don’t care what the internet forums are saying or what Joe from down the hall told you. It’s hard. Very hard. Deceptively hard.
The obvious algorithm looks trivial:
FOR EACH Changeset CS in History DO
    FOR EACH Change C in Changeset CS DO
        SWITCH C.Action
            CASE ADD:
                DownloadFromSource(C)
                PendAddOnTarget(C)
                BREAK
            CASE EDIT:
                DownloadFromSource(C)
                PendEditOnTarget(C)
                BREAK
            ... and so on ...
    NEXT
NEXT
Piece of cake. I’ll have it ready by noon.
The problem is that this algorithm falls over pretty quickly when presented with even relatively trivial changesets.
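The naive loop is easy to write down, and so is a changeset that trips it up. Below is a minimal Python sketch of it, assuming a hypothetical in-memory target (`InMemoryTarget`) and hypothetical `Change`/`Action` types in place of any real TFS or Perforce API; `download` stands in for fetching file content from the source. The usage at the bottom shows the same two changes succeeding or failing purely because of the order in which they are listed.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, List, Optional


class Action(Enum):
    ADD = auto()
    EDIT = auto()
    DELETE = auto()
    RENAME = auto()


@dataclass
class Change:
    action: Action
    path: str
    new_path: Optional[str] = None  # destination name; only used by RENAME


class InMemoryTarget:
    """Hypothetical target: one flat namespace. Operations apply immediately,
    whereas a real target would queue them as pending changes until checkin."""

    def __init__(self) -> None:
        self.items: Dict[str, str] = {}

    def pend_add(self, path: str, content: str) -> None:
        if path in self.items:
            raise ValueError(f"add failed: {path!r} already exists")
        self.items[path] = content

    def pend_edit(self, path: str, content: str) -> None:
        if path not in self.items:
            raise ValueError(f"edit failed: {path!r} does not exist")
        self.items[path] = content

    def pend_delete(self, path: str) -> None:
        if path not in self.items:
            raise ValueError(f"delete failed: {path!r} does not exist")
        del self.items[path]

    def pend_rename(self, path: str, new_path: str) -> None:
        if new_path in self.items:
            raise ValueError(f"rename failed: {new_path!r} already exists")
        self.items[new_path] = self.items.pop(path)


def replay(history: List[List[Change]],
           download: Callable[[Change], str],
           target: InMemoryTarget) -> None:
    """The naive loop: apply each change in exactly the order the source lists it."""
    for changeset in history:
        for change in changeset:
            if change.action is Action.ADD:
                target.pend_add(change.path, download(change))
            elif change.action is Action.EDIT:
                target.pend_edit(change.path, download(change))
            elif change.action is Action.DELETE:
                target.pend_delete(change.path)
            elif change.action is Action.RENAME:
                target.pend_rename(change.path, change.new_path)


# Usage: rename foo to bar, then add a new foo, in one changeset.
target = InMemoryTarget()
replay([[Change(Action.ADD, "foo")],
        [Change(Action.RENAME, "foo", "bar"), Change(Action.ADD, "foo")]],
       lambda c: f"contents of {c.path}", target)
print(sorted(target.items))  # ['bar', 'foo']

# The same two changes listed the other way around break the naive loop.
try:
    replay([[Change(Action.ADD, "foo")],
            [Change(Action.ADD, "foo"), Change(Action.RENAME, "foo", "bar")]],
           lambda c: "", InMemoryTarget())
except ValueError as err:
    print(err)  # add failed: 'foo' already exists
```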
Over the next few posts I’ll present a few examples of changes that cause that algorithm to fall over and what some of the options are. These are not pathological cases that never happen in the real world. These are real examples that I see “in the wild” very frequently.
We’ll start with a simple change that has a namespace collision (i.e., the same namespace, “foo” in this case, is involved in two different operations in a single changeset): foo is renamed to “bar”, and then a new item named “foo” is added.
rename foo bar
add foo
In the algorithm presented above, there is no guarantee that the rename of foo to bar will occur before the add of foo. If it does not occur first, the add will fail because an item already occupies that namespace.
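One way to recover that implied order, sketched here in Python under a simplified model (the `Change` tuple and the `vacates`/`claims` helpers are hypothetical, not part of any version control API), is to treat each change as a node in a dependency graph: a change that needs a name must wait for every change in the same changeset that frees that name. The standard library's `graphlib.TopologicalSorter` (Python 3.9+) then produces a valid order, or raises `graphlib.CycleError` when none exists, as when two items swap names. Edits neither free nor claim a name in this model, so they are left unconstrained.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, NamedTuple, Optional, Set


class Change(NamedTuple):
    action: str                     # "add", "edit", "delete" or "rename" (hypothetical model)
    path: str
    new_path: Optional[str] = None  # destination name; only used by "rename"


def vacates(change: Change) -> Set[str]:
    """Names this change frees when it runs."""
    if change.action in ("delete", "rename"):
        return {change.path}
    return set()


def claims(change: Change) -> Set[str]:
    """Names that must be free before this change can run."""
    if change.action == "add":
        return {change.path}
    if change.action == "rename":
        return {change.new_path}
    return set()


def order_changes(changeset: List[Change]) -> List[Change]:
    """Reorder a changeset so every name is vacated before it is claimed.

    Raises graphlib.CycleError (a ValueError subclass) if no such order exists.
    """
    # Map each change's index to the indices of the changes it must wait for.
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(changeset))}
    for i, claimer in enumerate(changeset):
        for j, vacater in enumerate(changeset):
            if i != j and claims(claimer) & vacates(vacater):
                graph[i].add(j)
    return [changeset[i] for i in TopologicalSorter(graph).static_order()]


# Usage: the add is listed first, but the rename has to run first.
ordered = order_changes([Change("add", "foo"), Change("rename", "foo", "bar")])
print([c.action for c in ordered])  # ['rename', 'add']

# A swap (a -> b, b -> a) has no valid order and raises graphlib.CycleError.
```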
Further, the target system needs to support this sequence. Imagine you were mirroring TFS to Perforce and encountered those operations. The Perforce equivalent would be:
integrate foo bar
delete foo
add foo
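The mapping above can be modeled in a few lines of Python. This is a sketch under stated assumptions: the operations are plain tuples named after the steps in the listing, not actual `p4` command invocations, and the `Change` tuple is hypothetical. Besides translating a rename into an integrate plus a delete, it reports every path that ends up with both a pending delete and a pending add in one changelist.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Change(NamedTuple):
    action: str                     # "add", "edit", "delete" or "rename" (hypothetical model)
    path: str
    new_path: Optional[str] = None  # destination name; only used by "rename"


Op = Tuple[str, ...]  # e.g. ("integrate", "foo", "bar") or ("delete", "foo")


def to_perforce_ops(changeset: List[Change]) -> List[Op]:
    """Translate source changes into Perforce-style operations, preserving order."""
    ops: List[Op] = []
    for change in changeset:
        if change.action == "rename":
            # Modeled as a branch of the old name to the new one, then a delete of the old name.
            ops.append(("integrate", change.path, change.new_path))
            ops.append(("delete", change.path))
        else:
            ops.append((change.action, change.path))
    return ops


def add_over_delete_conflicts(ops: List[Op]) -> Set[str]:
    """Paths that would carry both a pending delete and a pending add in one changelist."""
    deleted = {op[1] for op in ops if op[0] == "delete"}
    added = {op[1] for op in ops if op[0] == "add"}
    return deleted & added


ops = to_perforce_ops([Change("rename", "foo", "bar"), Change("add", "foo")])
print(ops)                             # [('integrate', 'foo', 'bar'), ('delete', 'foo'), ('add', 'foo')]
print(add_over_delete_conflicts(ops))  # {'foo'}
```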
See the problem? Perforce doesn’t allow you to pend an add on a path that already has a pending delete. So you need to start making decisions. Do you do something like:
integrate foo bar
edit foo
Well, that’s nice, only now you have an integration relationship between two otherwise unrelated items. How about this:
add bar
edit foo
OK, that works. Only now you’ve lost the history that says bar used to be foo. Maybe the answer is multiple changesets:
integrate foo bar
delete foo
submit
add foo
submit
Alright, but now you’ve split a single source operation into two target operations. Is that acceptable? If it is, what happens if the process crashes after the first submit? Will the migration tool have enough context when it is restarted to finish the second change? What if time passes between the two submits and another foo is added? What should the checkin comments for the first and second changes be? Should the first indicate that there will be a second? Should the second indicate that it is part of a multiple-checkin sequence? What if there are hundreds of files like this? Does each get its own checkin? Can you group them?
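Most of those questions have no single right answer. As one possible policy, here is a Python sketch of the splitting step, assuming the target operations are plain tuples like `("delete", "foo")` and that each re-added path is touched only by its add: every add whose path the same batch also deletes is deferred to a second changelist, so hundreds of such files still produce two submits rather than hundreds. The "part 1 of 2" comment labels are a hypothetical convention, and the sketch does not handle restarts, changes that land between the two submits, or other conflicts such as integrating onto a deleted path.

```python
from typing import List, Tuple

Op = Tuple[str, ...]  # ("integrate", src, dst), ("delete", path), ("add", path), ...


def split_add_over_delete(ops: List[Op], comment: str) -> List[Tuple[str, List[Op]]]:
    """Split one source changeset into at most two target changelists.

    Any add whose path is also deleted in the same batch is deferred to a second
    changelist; everything else keeps its relative order in the first.
    """
    deleted = {op[1] for op in ops if op[0] == "delete"}

    def deferred(op: Op) -> bool:
        return op[0] == "add" and op[1] in deleted

    first = [op for op in ops if not deferred(op)]
    second = [op for op in ops if deferred(op)]
    if not second:
        return [(comment, first)]
    # Label both parts so a reader (or a restarted tool) can tell they belong together.
    return [
        (f"{comment} (part 1 of 2)", first),
        (f"{comment} (part 2 of 2)", second),
    ]


ops = [("integrate", "foo", "bar"), ("delete", "foo"), ("add", "foo")]
for label, batch in split_add_over_delete(ops, "Rename foo to bar; add new foo"):
    print(label, batch)
# Rename foo to bar; add new foo (part 1 of 2) [('integrate', 'foo', 'bar'), ('delete', 'foo')]
# Rename foo to bar; add new foo (part 2 of 2) [('add', 'foo')]
```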
So what is the bottom line here? Two things:
1) When migrating, there may be an implied order to the operations within a changeset that must be satisfied for the change to succeed.
2) Gaps in feature parity between the source and target systems make migrating much more complex.
Next time: cycles – what they are, how to identify them and what you can do about them.