Understanding the Syntax for URLs

A uniform resource locator (URL) is a standardized pointer to a resource. URLs not only tell you where information is located but also how to interact with that information. A cousin of the URL is the URN (the N stands for Name). URNs identify a resource by name without regard to where the resource is located. URLs are the opposite: they identify the location of a resource without regard to what that resource is actually called.

URLs don't have an overly rigid syntax, although most URL schemes follow the same general form.

scheme://username:password@host:port/path?query#fragment

Not all of these pieces are seen frequently in WCF applications. Usernames, passwords, and fragments are rare enough that I'm not going to talk about them here. They are primarily used in the HTTP and FTP schemes with an interactive client, rather than against a web service.
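As a rough illustration of the general form, here is a minimal Python sketch that takes a URL apart with the standard library's urllib.parse module. The address, the user name, and the helper name split_url are all made up for the example; nothing here is specific to WCF.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Return the pieces of the general URL form as a dictionary."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,          # an int, or None when no port is given
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }

# Hypothetical address, invented for this example.
example = "net.tcp://alice:secret@example.com:808/services/calc?mode=fast#top"
for name, value in split_url(example).items():
    print(f"{name:9} {value}")
```

Pieces that are absent come back as None or an empty string, which is a convenient way to see how many of them are optional.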

The most easily recognizable portion of a URL is the scheme. A scheme indicates the protocol that is needed to access a resource. Our standard transports come with the schemes "http", "https", "net.tcp", and "net.pipe". You'll sometimes see other schemes for custom transports, and you can pick your own scheme when you write a transport channel and its binding element.
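To make the role of the scheme concrete, the following small Python sketch checks whether an address uses one of the four schemes listed above. The function name and the "my.udp" scheme are hypothetical; a real custom transport would pick its own name.

```python
from urllib.parse import urlsplit

# The schemes that ship with the standard transports, as listed in the text.
STANDARD_SCHEMES = {"http", "https", "net.tcp", "net.pipe"}

def uses_standard_scheme(url):
    """True if the URL's scheme is one of the standard transport schemes."""
    # urlsplit lowercases the scheme, so the comparison ignores case.
    return urlsplit(url).scheme in STANDARD_SCHEMES

print(uses_standard_scheme("net.pipe://localhost/demo"))
print(uses_standard_scheme("my.udp://example.com:9000/demo"))  # made-up custom scheme
```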

The host and port specify where connections should be established to access a resource. If you're using a hosting environment such as IIS, or using discovery, then the location of the server is generally determined by configuration information outside the service. Someone then has to tell the client where the server is located. In some cases, particularly when you've got a transport that's purely on-machine, neither the client nor the server cares about the actual connection location.

The path has traditionally indicated where a resource resides on the machine, using a hierarchical notation very similar to a file path. There's no requirement to actually map paths to files. When you're using the net.tcp scheme, the path really is just the name of a service without regard to where the files for that service are located. Automatically generated client-side paths for net.tcp services are even less connected to the concept of a file because they contain a random GUID to make each conversation have a unique address.
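The idea of a path that names a conversation rather than a file can be sketched in a few lines of Python. This is only an illustration of the concept: the exact address format that WCF generates is not shown in this post, and the function name, host, and port below are assumptions.

```python
import uuid
from urllib.parse import urlsplit

def unique_client_address(host, port):
    """Build a net.tcp address whose path is a freshly generated random GUID."""
    # Illustrative layout only; not the format WCF actually produces.
    return f"net.tcp://{host}:{port}/{uuid.uuid4()}"

first = unique_client_address("localhost", 808)
second = unique_client_address("localhost", 808)
print(first)
print(second)
print(urlsplit(first).path != urlsplit(second).path)
```

Neither path corresponds to anything on disk. Its only job is to tell one conversation apart from another.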

The query string is used by some schemes to pass parameters to the service. There's no standard for formatting a query and there are very few restrictions on what can go in the query string. Servers that process a query string should make sure that the string does not contain tainted data before doing anything with it. For example, if the query is supposed to contain a local file name, the server should make sure that the name doesn't specify a network path or contain path symbols that give access to resources outside of those that the server is expected to provide.
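Here is one way the file name check might look, as a minimal Python sketch. It assumes the common key=value query convention and a hypothetical parameter called "file"; neither comes from a standard. It is deliberately strict, accepting only a bare file name.

```python
from urllib.parse import urlsplit, parse_qs

def safe_file_name(url, key="file"):
    """Return the file name from the query string, or None if it looks unsafe."""
    # parse_qs also undoes percent-encoding, so "%2F" is seen as "/".
    values = parse_qs(urlsplit(url).query).get(key, [])
    if len(values) != 1:
        return None  # parameter missing or given more than once
    name = values[0]
    if name in (".", ".."):
        return None  # refers to a directory, not a file
    # Reject path separators, drive letters, and embedded NUL characters.
    # This rules out network paths such as \\server\share and
    # traversal attempts such as ../../secret.
    if any(ch in name for ch in "/\\:\x00"):
        return None
    return name

print(safe_file_name("http://example.com/files?file=report.txt"))
print(safe_file_name("http://example.com/files?file=..%2F..%2Fetc%2Fpasswd"))
```

A real server would still combine the accepted name with a fixed base directory rather than trusting it on its own.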

Next time: HTTP Request and Response Messages
