The Windows command line is just a string...

Yesterday, Richard Gemmell left the following comment on my blog (I've trimmed to the critical part):

I was referring to the way that IE can be tricked into calling the Firefox command line with multiple parameters instead of the single parameter registered with the URL handler.

I saw this comment and was really confused for a second, until I realized the disconnect.  The problem is that *nix and Windows handle command line arguments totally differently.  On *nix, you launch a program using the execve API (or its cousins execv, execl, execlp, execle, and execvp).  The interesting thing about these APIs is that they allow the caller to specify each of the command line arguments individually - the signature for execve is:

int execve(const char *filename, char *const argv[], char *const envp[]);

In *nix, the shell is responsible for turning the string provided by the user into the argv parameter to the program[1].
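To make the execve model above concrete, here is a minimal sketch in C of a caller that builds argv itself, with no shell parsing the arguments it passes. The helper names run_with_argv and demo_argument_count are hypothetical, and the sketch assumes a POSIX system with /bin/sh installed. The child is used only as a convenient way to count the arguments it receives ("exit $#" makes the exit status equal to the number of positional parameters).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run `path` with exactly the argv supplied and
   return the child's exit status, or -1 on failure. */
int run_with_argv(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: argv is handed to the new program element by element. */
        execve(path, argv, environ);
        _exit(127); /* only reached if execve itself failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical demo: the child sees "two words" as ONE argument,
   because the caller, not a string parser, decided where it ends. */
int demo_argument_count(void)
{
    char *const argv[] = {
        "sh", "-c", "exit $#",   /* report the number of positional parameters */
        "sh",                    /* becomes $0 inside the -c script */
        "two words",             /* $1, embedded space preserved */
        "three",                 /* $2 */
        NULL
    };
    return run_with_argv("/bin/sh", argv);
}
```

Under those assumptions the demo should report two positional parameters, since the embedded space in "two words" never gets a chance to split the argument.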

 

On Windows, the command line doesn't work that way.  Instead, you launch a new program using the CreateProcess API, which takes the command line as a single string (the lpCommandLine parameter to CreateProcess).  It's considered the responsibility of the newly started application to call the GetCommandLine API to retrieve that command line and parse it (possibly using the CommandLineToArgvW helper function).
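To illustrate what "the application parses its own command line" means in practice, here is a deliberately simplified sketch in portable C. The function split_command_line is hypothetical: it treats spaces and tabs as separators and double quotes as grouping, and it does NOT reproduce the backslash and quote-escaping rules of CommandLineToArgvW or the C runtime. A real Windows program would normally hand the string from GetCommandLineW to CommandLineToArgvW rather than roll its own parser.

```c
#include <stddef.h>

/* Hypothetical, simplified splitter (not the real Windows rules):
   spaces and tabs separate arguments, double quotes group text and are
   dropped from the result. Argument text is copied into buf and argv[]
   receives pointers into buf. Returns the argument count, or -1 if
   buf or argv is too small. */
int split_command_line(const char *cmdline, char *buf, size_t buf_size,
                       char *argv[], int max_args)
{
    int argc = 0;
    size_t used = 0;
    const char *p = cmdline;

    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;                          /* skip separators */
        if (*p == '\0')
            break;
        if (argc >= max_args)
            return -1;
        argv[argc++] = buf + used;

        int in_quotes = 0;
        while (*p != '\0' && (in_quotes || (*p != ' ' && *p != '\t'))) {
            if (*p == '"') {
                in_quotes = !in_quotes;   /* toggle; the quote is not copied */
            } else {
                if (used + 1 >= buf_size) /* keep room for the terminator */
                    return -1;
                buf[used++] = *p;
            }
            p++;
        }
        if (used >= buf_size)
            return -1;
        buf[used++] = '\0';               /* terminate this argument */
    }
    return argc;
}
```

Given an invented command line such as firefox.exe -url "http://example.com/a b" -foo, this sketch yields four arguments, with the quoted URL kept as one. The point is that the split is entirely the receiving program's decision: a different parser could legitimately slice the same string differently.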

So when Richard talked about IE "tricking" Firefox by calling it with multiple parameters, he was apparently thinking about the *nix model, where an application launches a new application with multiple command line arguments.  But that model isn't the Windows model - instead, in the Windows model, the application is responsible for parsing its own command line arguments, and thus IE can't "trick" anything - it's just asking the shell to pass a string to the application, and it's the application's job to figure out how to handle that string.

We can discuss the relative merits of that decision, but it was a decision made over 25 years ago (in MS-DOS 2.0).

 

[1] Yes, I know that the execl() API takes its arguments as a variable-length list rather than as an array, but execl() simply gathers that list into an argv array before calling execve.

Comments

  • Anonymous
    October 03, 2007
    Dave, that's entirely possible.  OTOH, for OS versions before 2.0, launching a new program was actually a function of command.com - there was no OS API for launching a new process.

  • Anonymous
    October 03, 2007
    It is of course worth noting that if you link your C program with mainCRTStartup or wmainCRTStartup, the C runtime decodes into argc/argv and calls your main or wmain function respectively. It's unusual, but not forbidden, for a Windows application (i.e. an application that registers and uses its own window classes, rather than a console) to do this. The bit governing whether or not a console is created for the application is an independent setting, set in the PE header by the linker (/SUBSYSTEM:CONSOLE vs /SUBSYSTEM:WINDOWS). Visual Studio sets its defaults so console applications use (w)main, and Windows applications use (w)WinMain, but it's not required. I don't know what Firefox does but I'd take a guess that they might be using (w)main for portability.

  • Anonymous
    October 03, 2007
    Mike: Absolutely.  I actually had a paragraph in the post describing that but edited it out (because I thought it rendered the narrative flow awkward).

  • Anonymous
    October 03, 2007
    And even if you use WinMain, you can still make use of the C runtime's argument decoding by accessing __argc and __argv. In other words, the following are all completely orthogonal to each other:

      • Whether you are /SUBSYSTEM:CONSOLE or /SUBSYSTEM:WINDOWS
      • Whether your entry point is mainCRTStartup (calls main) or WinMainCRTStartup (calls WinMain)
      • Whether you access arguments via __argc/__argv or as a raw string from GetCommandLine
      • Whether your program creates a GUI or calls console APIs (or both)

  • Anonymous
    October 03, 2007
    This manual provides info on how programs were loaded in early versions of DOS. Be warned that most of the numbers are in decimal, NOT hex: http://www.patersontech.com/Dos/Docs/86_dos_prog.pdf

  • Anonymous
    October 03, 2007
    Alun Jones expanded on this on his blog back when the fires were still raging: http://msmvps.com/blogs/alunj/archive/2007/07/23/firefoxurl-part-ii.aspx

  • Anonymous
    October 03, 2007
    You have a bit of an odd phrasing here which threw me for a loop.  ("In *nix, the shell is responsible for turning the string provided by the user into the argv parameter to the program.") I'd say the caller is responsible, rather than the shell.  A shell is only involved if you're in a shell, or if your code calls system(), or popen(), or some other hugely dangerous system call, like pwnme().

  • Anonymous
    October 05, 2007
    Rosyna, I'm not sure that I understand the difference between the two paradigms, or why one is better than the other. In one paradigm, an application (the shell) parses a string and converts it to arguments.  In the other paradigm, an application (the application being called) parses a string and converts it to arguments. The only significant difference is that in the *nix paradigm, the called application doesn't have to interpret the intent of the parent - but there's also an opportunity for mischief there, because the parent can produce argument strings that are impossible for the shell to create (and thus may not have been tested by the application).

  • Anonymous
    October 07, 2007
    Thank you Harry, that's essentially what I was going to say (but you said it better). Ultimately, someone's going to have to determine intent from a string, whether it's the shell or the application.  In the Windows model, the app determines the parsing all the time.  In the *nix model, the app is at the mercy of the shell - different shells could very easily have different parsing algorithms, which means that depending on your choice of shell, your application might behave differently, and that's never a good thing.

  • Anonymous
    October 09, 2007
    Oh neat.  Not only can Win32 create folders that Win32 has trouble accessing (depending on how Win32 tries to go about accessing the folders that it created), and not only can Windows Services for Unix create folders that Win32 can't access, but I've just seen Windows Services for Unix have trouble accessing files that Windows Services for Unix created. If keyboard handling had to be destroyed in order to improve security, I think I'd have some amount of grudging understanding.  But when keyboard handling is destroyed solely for the purpose of destroying keyboard handling, and security looks like it's going to get worse instead of better, Microsoft still gives a big impression of not "getting it".

  • Anonymous
    October 10, 2007
    > Ultimately, someone's going to have to determine intent from a string, whether it's the shell or the application.
    Larry, that isn't true.  One thing you keep forgetting is that under *nix, you don't have to use the shell.  When you are writing system code, you can directly invoke an exec system call and thus the intent can be determined by the engineer. So, if a program includes the code:
        execl("/path/to/exe", "arg1", "arg2", get_arg3(), NULL);
    the intent of each argument is built into the logic of the code and no string processing needs to happen.
