The Changeset at the Pend of the Universe
Reading Jason’s post on lock types got me thinking about the general concept of pending changes. While some users may already be familiar with the concept, not every version control system uses a similar model. For the uninitiated, here’s a bit of background that may clear things up. Thanks to James for his input!
What is a pending change?
A pending change is a statement of intent, to some degree. Users pend changes to let the central repository know about a change they plan to check in at some later time. Until they either check in or undo the change, the server keeps a record of the type of change pended. If multiple users try to pend changes on the same item, all users after the first will see warnings letting them know that there are other people working on the file, too.
This is distinct from operations that take effect immediately. For example, the get command immediately retrieves the requested files to the local disk, and the workspaces command immediately prints a list of workspaces. Most commands that change the state of files in the repository, however, require two steps: one command to pend the change and another to check it in.
For example, let's say we just installed Visual Studio Team System. We'll refer to this diagram throughout our examples. At this point, the system has one folder in source control: the root folder '$/'. Then a user comes along and pends several adds with a command like "h add a doc /recursive"*, which pends an add on the folders 'a' and 'doc' as well as on all folders and files under them. These files do not exist in the repository until the user checks in the change. At that point, version control attempts to upload the changes to the server and, if successful (for instance, if no conflicts exist), assigns a changeset number.
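To make the two-step flow concrete, here is a small Python sketch of a toy server. It is not how Team System is implemented: the class ToyServer, its methods, the workspace names, and the file names (guessed from paths that appear later in this post) are all hypothetical. The toy records only the intent to change, per workspace, and touches the repository contents only when checkin() assigns a changeset number. pend() also returns the other workspaces that already have a change pended on the same path, which is roughly where a real client would show its "someone else is working on this" warning.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical, greatly simplified model of pending changes.

    It only records, per workspace, which change type is pended on which
    path; repository contents change only when checkin() runs.
    """
    items: set = field(default_factory=lambda: {"$/"})
    pending: dict = field(default_factory=dict)  # workspace -> {path: change}
    next_changeset: int = 1

    def pend(self, workspace, path, change):
        # Other workspaces that already have a change pended on this path;
        # a real client would turn a non-empty result into a warning.
        others = sorted(ws for ws, changes in self.pending.items()
                        if ws != workspace and path in changes)
        self.pending.setdefault(workspace, {})[path] = change
        return others

    def checkin(self, workspace):
        # Apply everything pended in this workspace as one changeset.
        changes = self.pending.pop(workspace, {})
        if not changes:
            return None
        for path, change in changes.items():
            if change == "add":
                self.items.add(path)
            elif change == "delete":
                self.items.discard(path)
        number = self.next_changeset
        self.next_changeset += 1
        return number


# Roughly what "h add a doc /recursive" pends (file names are hypothetical).
server = ToyServer()
for path in ["$/a", "$/a/a.cs", "$/doc", "$/doc/r.txt", "$/doc/s.txt"]:
    server.pend("ws1", path, "add")
before_checkin = "$/a/a.cs" in server.items   # False: only the intent is recorded
changeset = server.checkin("ws1")             # 1 in this toy
```

The changeset numbers in the toy simply count up from 1; they are not meant to match the numbering in the diagram.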
Where do pending changes go?
When a change is pended, the effect of the operation occurs in the files of the user’s local workspace. If the change is later checked in, the main repository files are updated to reflect the change. Otherwise, if it is undone, the user’s local files are rolled back to the state they were in prior to pending the change. If a user has more than one workspace, only the workspace in which the change was pended will reflect the change.
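The local side of this can be sketched as a workspace that remembers each file's state before the first pended change, so that undo can roll it back and checkin can make the current state the new baseline. This is a hypothetical Python illustration, not the client's actual design; ToyWorkspace, pend_change(), and the sample contents are made up, and the server side is left out entirely.

```python
class ToyWorkspace:
    """Hypothetical local workspace: a mapping of path -> file contents."""

    def __init__(self, files):
        self.files = dict(files)
        self._before = {}  # path -> contents before the first pended change (None = absent)

    def pend_change(self, path, new_contents):
        # Remember the pre-change state once; new_contents=None models a delete.
        self._before.setdefault(path, self.files.get(path))
        if new_contents is None:
            self.files.pop(path, None)
        else:
            self.files[path] = new_contents

    def undo(self, path):
        # Roll the local file back to the state it had before the change was pended.
        original = self._before.pop(path)
        if original is None:
            self.files.pop(path, None)
        else:
            self.files[path] = original

    def checkin(self):
        # Once checked in, the current local state becomes the new baseline.
        committed = sorted(self._before)
        self._before.clear()
        return committed


# The same user with two workspaces (names and contents are made up).
ws1 = ToyWorkspace({"doc/s.txt": "v1"})
ws2 = ToyWorkspace({"doc/s.txt": "v1"})
ws1.pend_change("doc/s.txt", "v1 plus an edit")
ws1.pend_change("doc/t.txt", "new file")       # a pended add
seen_in_ws2 = ws2.files["doc/s.txt"]           # "v1": ws2 is unaffected
ws1.undo("doc/s.txt")
ws1.undo("doc/t.txt")                          # ws1 is back to {"doc/s.txt": "v1"}
```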
The status command displays a list of pending changes. Without any other arguments, it displays only the changes pending for the current user in the current workspace. (The "current workspace" is determined by the directory you run "h status" from, combined with your current workspace mappings. If that folder falls within one of the mappings, the client defaults to that workspace. If the current folder is unmapped, you will need to specify the workspace. For more on mappings, see this helpful post.) The /workspace flag can be used to name a workspace, or "*" for all workspaces; for example, "h status /workspace:AdamSiWS2" or "h status /workspace:*". In addition, the /user flag can be used to specify another user, or "*" for all users: "h status /workspace:* /user:CORPNET\JSmith" shows all pending changes for user CORPNET\JSmith, and "h status /workspace:* /user:*" shows all pending changes for all users. The status command also accepts a filespec, in which case it shows only pending changes on the indicated files. "h status . /recursive" shows any pending changes on folders and files within the current directory tree.
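The filtering rules above can be restated as a small Python function. This is a hypothetical sketch of the described behavior, not the client's code: PendingChange, status(), the user name CORPNET\AdamSi, and the workspace names other than AdamSiWS2 are invented for the example. It also assumes, without confirmation from the text, that a non-recursive filespec matches only the named item itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingChange:
    user: str
    workspace: str
    path: str    # "/"-separated path relative to the workspace root (simplification)
    change: str


def status(changes, current_user, current_workspace,
           workspace=None, user=None, root=None, recursive=False):
    """Filter pending changes the way the text above describes "h status".

    workspace and user default to the current ones, and "*" matches all.
    root restricts results to one item; recursive=True adds everything below it.
    """
    workspace = current_workspace if workspace is None else workspace
    user = current_user if user is None else user
    result = []
    for c in changes:
        if workspace != "*" and c.workspace != workspace:
            continue
        if user != "*" and c.user != user:
            continue
        if root is not None:
            below = c.path.startswith(root.rstrip("/") + "/")
            if c.path != root and not (recursive and below):
                continue
        result.append(c)
    return result


me, here = "CORPNET\\AdamSi", "AdamSiWS1"   # hypothetical current user and workspace
changes = [
    PendingChange(me, here, "src/a.cs", "edit"),
    PendingChange(me, "AdamSiWS2", "src/lib/b.cs", "add"),
    PendingChange("CORPNET\\JSmith", "JSmithWS", "src/c.cs", "delete"),
]
default_view = status(changes, me, here)    # like plain "h status"
jsmith_view = status(changes, me, here, workspace="*", user="CORPNET\\JSmith")
everything = status(changes, me, here, workspace="*", user="*")
subtree = status(changes, me, here, workspace="*", root="src", recursive=True)
```

Note how the defaults matter: subtree still hides JSmith's change because user falls back to the current user unless "*" or another name is given.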
In our diagram, the white boxes actually represent two things. At the point marked by the blue arrow, they represent the changes being pended to the workspace. At the point marked by the green arrow, those changes are being checked in to become a changeset. In between, the changes are reflected in the user's local workspace but are not yet available to anyone else: the server knows that these changes are pending but has no other information about them.
Which commands pend changes?
Changes are pended by the commands add, branch, checkout (also known as "edit"), delete, merge, rename (also known as "move"), and undelete. For branch and merge, the change is actually pended on the target rather than the source of the operation. While locks will also appear in a status query, a lock is actually applied as soon as the lock command is called and is released when the lock-holder checks in.
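As a quick reference, the rules in the paragraph above can be encoded in a few lines of Python. This is a hypothetical summary table, not a real API: classify() and the constant names are made up. The post does not say which path a rename's pending change is recorded against, so the sketch simply reports the source path for every command other than branch and merge; treat that part as an assumption.

```python
# Hypothetical summary of the rules in the paragraph above; not a real API.
PENDING_COMMANDS = {"add", "branch", "checkout", "delete", "merge", "rename", "undelete"}
ALIASES = {"edit": "checkout", "move": "rename"}


def classify(command, source, target=None):
    """Return (canonical command, item affected, whether it is a pended change)."""
    command = ALIASES.get(command, command)
    if command == "lock":
        # Applied on the server immediately; released at the lock-holder's checkin.
        return command, source, False
    if command not in PENDING_COMMANDS:
        raise ValueError(f"{command!r} does not pend a change")
    if command in ("branch", "merge"):
        # Pended on the target rather than the source of the operation.
        return command, target, True
    # Assumption: every other command is reported against the source path.
    return command, source, True


print(classify("branch", "$/a", "$/b"))   # ('branch', '$/b', True)
print(classify("edit", "$/doc/s.txt"))    # ('checkout', '$/doc/s.txt', True)
print(classify("lock", "$/doc/s.txt"))    # ('lock', '$/doc/s.txt', False)
```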
Back to our example, let's look at the second change. The user may have typed "h branch a b" to pend the branch, "h delete doc\r.txt" to pend the delete, "h edit doc\s.txt" followed by some text change for the edit, and "h add doc\t.txt" for the add. The user may also have had other pending changes that were not submitted as part of this checkin, such as "h add foo\bar.vb".
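Here is a minimal Python sketch of that second change, assuming a checkin simply takes a chosen subset of the workspace's pending changes and leaves the rest pended. The list layout and split_checkin() are hypothetical. Paths use "/" rather than the post's backslashes because in a Python string "doc\r.txt" would contain a carriage return.

```python
# Hypothetical pending list after the commands above, for the user's workspace.
pending = [
    ("branch", "b"),            # pended on the branch target
    ("delete", "doc/r.txt"),
    ("edit", "doc/s.txt"),
    ("add", "doc/t.txt"),
    ("add", "foo/bar.vb"),      # also pending, but not part of this checkin
]


def split_checkin(pending, selected_paths):
    """Return (changes checked in now, changes that stay pending)."""
    checked_in = [c for c in pending if c[1] in selected_paths]
    still_pending = [c for c in pending if c[1] not in selected_paths]
    return checked_in, still_pending


changeset, leftover = split_checkin(
    pending, {"b", "doc/r.txt", "doc/s.txt", "doc/t.txt"})
# leftover holds only the add of foo/bar.vb, which remains pended.
```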
Basically, changing files in the repository is a two-step process. First, you let the server know what you’re planning to do. Then, after making the appropriate changes, you tell the server to update the files. This lets you try out your changes locally before updating the repository. What happens if someone else beats you to the punch? Well, you get a conflict, but that’s a topic for another post.
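Conflicts get their own post, but the "beaten to the punch" situation itself can be sketched as an optimistic-concurrency check: a pended file conflicts if the server's latest version is newer than the version the workspace last got. This Python sketch is a hypothetical illustration only; find_conflicts() and the changeset numbers are invented, and real conflict detection and resolution involve more than this.

```python
def find_conflicts(base_versions, latest_versions, pended_paths):
    """Return pended paths whose latest server version is newer than the
    version the workspace last got (a simple optimistic-concurrency check)."""
    return sorted(
        path for path in pended_paths
        if latest_versions.get(path, 0) > base_versions.get(path, 0)
    )


# Hypothetical numbers: you got both files at changeset 2, then someone
# else checked in changeset 3, which touched doc/s.txt.
base = {"doc/s.txt": 2, "doc/r.txt": 2}
latest = {"doc/s.txt": 3, "doc/r.txt": 2}
conflicts = find_conflicts(base, latest, {"doc/s.txt", "doc/r.txt"})
# conflicts == ["doc/s.txt"]
```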
Note: At some point, the command 'h.exe' will be changing to 'tf.exe' or similar. However, for the December CTP and earlier versions, the version control command line executable is still named 'h.exe'.
Comments
- Anonymous
January 27, 2005
How does all this work when you're offline? Is it generally true that you can "unofficially" pend absolutely anything while offline (as long as it doesn't require fetching stuff from the repository that you don't have a local copy of, which might be the case for e.g. a branch operation if you don't need to have a working copy of the source in order to branch) but not check anything in? Obviously the server wouldn't be aware of the pended operations until you connect.
If not, then it should be...
By the way, in the same way that "move" and "rename" are synonymous, you might want to consider making "copy" synonymous with "branch". It provides a useful mental model for users, has no downsides I can think of (as long as branching is done in an efficient way in the repository), and discourages users from losing history by copying files outside of source control and just adding the copies.
- Anonymous
January 27, 2005
Note that this is V1 behavior and may change for V2.
If you've already pended an edit when online, you can work offline as much as you like. If, however, you haven't yet pended an edit and would like to modify a file, you can choose to "overwrite" when you save in Visual Studio, thus turning off the read-only bit. You could also clear the bit with attrib from a command shell. When you go back online, you will need to pend the edit (e.g. "h edit a.cs") to let version control know about it (we don't automatically detect these changes in V1). If the file has changed in the repository since the last time you connected, you will get a warning ("newer version exists in the repository"). When you then run "h get" or "h checkin", you will get a conflict that you will need to resolve. This won't work for other commands, such as delete, rename, and branch.
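Since V1 does not detect these offline edits automatically, one way to find files that still need an "h edit" is to look for writable files that have no pending change. The Python helper below is a hypothetical sketch built on an inference from this comment (files stay read-only until an edit is pended); unpended_writable_files() and the pended set are invented, and paths in pended must use the OS's native separators, relative to the workspace root.

```python
import os
import stat
import tempfile


def unpended_writable_files(root, pended):
    """List files under root that are writable but have no pending change.

    Hypothetical helper: assumes files stay read-only until an edit is
    pended, so a writable file with no pending change was probably edited
    offline and may need "h edit" once you are back online.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.stat(full).st_mode & stat.S_IWUSR:
                rel = os.path.relpath(full, root)
                if rel not in pended:
                    found.append(rel)
    return sorted(found)


# Self-contained demo in a temporary directory.
with tempfile.TemporaryDirectory() as workspace_root:
    for name in ("a.cs", "b.cs", "c.cs"):
        with open(os.path.join(workspace_root, name), "w") as f:
            f.write("// sample\n")
    read_only = os.path.join(workspace_root, "b.cs")
    os.chmod(read_only, stat.S_IREAD)                    # like a file never edited
    candidates = unpended_writable_files(workspace_root, pended={"c.cs"})
    os.chmod(read_only, stat.S_IREAD | stat.S_IWRITE)    # allow cleanup on Windows
```

Here a.cs is reported (writable, nothing pended), b.cs is skipped because it is still read-only, and c.cs is skipped because an edit is already pended on it.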
"Branch" creates a notion of a common ancestor between the files and folders involved. In our example, $/a/a.cs and $/b/a.cs have a common version ($/a/a.cs;c2) that can be used to later merge individual changes between the two. If we simply copied the folder and added new files, we wouldn't have a base to later merge the files together. Because of this, branch is actually more than "copy", and creating the alias could cause confusion.
- Anonymous
January 27, 2005
The comment has been removed
- Anonymous
January 28, 2005
Stuart,
Offline is a big deal, no doubt - we're certainly keen on a better story in v2 but (being a lowly tester) I can't make any promises about when or how we'll make it better.
We can get a suggestion filed to make copy an alias of branch. There are some aspects of "copy" that are not necessarily part of branch, in the sense that branch is a bit more flexible than "copy" implies.
Specifically, you don't necessarily branch the workspace version - you can branch the workspace version, the latest, or as of any date, changeset, or label. Copy sort of implies workspace (though it'd still support a version parameter, so maybe that's no big deal). But, the bigger one to me is that you don't actually have to copy anything locally. If you branch with "/noget" as an argument, it doesn't create the branch target in your workspace (if you're branching a very large tree, that saves you a lot of trouble, particularly if you won't need the branch target in your workspace).
Your point that the "common usage" cases of branch look and feel like copies is valid. But I suspect we'll still prefer branch being the default terminology, just because we are talking about a considerable superset of copy behavior.
Branch doesn't save much space over copy-and-add, because either way we only save one instance of any given content - file signature hashes are a beautiful thing. There's a lot more to it than that because of various things we do to improve storage performance in terms of space, transfer time, etc., but one side effect is that we (should) store each unique set of file contents only once.
- Anonymous
January 28, 2005
By the way, I know your "just a lowly tester" comment was tongue-in-cheek, but FWIW, speaking as a developer I have the utmost respect for testers. It's a job I simply am not capable of doing well, and I give great kudos to the people who are able to do it. My small company doesn't have a dedicated tester at all any more, and I feel that pain daily. Just thought you might like to know how much this developer, at least, appreciates you and others like you ;)
My point about copy/branch is more the other way round - not so much that the common usages of branch look like copies, but that all usages of copy are equivalent to a simple case of branch.
I agree about branch being the default terminology and with your reasons for it.
The file signature hashing feature sounds amazing, I love that idea. It may even apply to a problem I'm facing for a future version of the product I work on - it's not patented or anything is it? ;)
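The file signature hashing discussed in this thread is essentially content-addressed storage: file bodies are keyed by a hash of their contents, so identical contents are stored once no matter how many paths or versions refer to them. The thread does not say which hash or storage layout the product uses, so this Python sketch picks SHA-256 from hashlib and a pair of dictionaries purely for illustration; ContentStore and the changeset numbers are hypothetical.

```python
import hashlib


class ContentStore:
    """Hypothetical content-addressed store: each unique file body is kept once."""

    def __init__(self):
        self.blobs = {}     # digest -> file contents
        self.versions = {}  # (server path, changeset) -> digest

    def put(self, path, changeset, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)      # stored only the first time
        self.versions[(path, changeset)] = digest
        return digest

    def get(self, path, changeset):
        return self.blobs[self.versions[(path, changeset)]]


store = ContentStore()
body = b"class A { }\n"
store.put("$/a/a.cs", 2, body)
store.put("$/b/a.cs", 3, body)          # e.g. a branch, or a copy-and-add
store.put("$/doc/s.txt", 3, b"notes\n")
# Three versions are recorded, but only two distinct bodies are stored.
```

This is why branch and copy-and-add cost about the same space; what branch adds is the recorded ancestry used for later merges, which a hash-keyed store alone does not capture.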