How to await a command-line process, and capture its output

[This post is part of a series: How to await a storyboard, and other things]

I want to invoke an external executable, await until it’s finished, and get back its output. (In my case, the external executable is “tidy.exe” – an excellent open-source utility for turning HTML into clean, parseable XML.) To cut a long story short, here’s my final code. The rest of the article explains how it works and why.

Imports System.Diagnostics
Imports System.Net.Http
Imports System.Threading
Imports System.Threading.Tasks
Imports System.Xml.Linq

Async Function Attempt4Async() As Task
    Dim client As New HttpClient
    Dim html = Await client.GetStringAsync("https://blogs.msdn.com/lucian")
    Dim tidy = Await RunCommandLineAsync("tidy.exe",
                                         "-asxml -numeric -quiet --doctype omit",
                                         html)
    Dim xml = XDocument.Parse(tidy)
End Function

Async Function RunCommandLineAsync(cmd As String, args As String, stdin As String,
                                   Optional cancel As CancellationToken = Nothing
                                  ) As Task(Of String)
    Using p As New Process
        p.StartInfo.FileName = cmd
        p.StartInfo.Arguments = args
        p.StartInfo.UseShellExecute = False
        p.StartInfo.RedirectStandardInput = True
        p.StartInfo.RedirectStandardOutput = True
        p.StartInfo.RedirectStandardError = True
        If Not p.Start() Then Throw New InvalidOperationException("Could not start " & cmd)

        ' On cancellation, kill the process. Kill() throws InvalidOperationException
        ' if the process has already exited, which is harmless here.
        Using cancelReg = cancel.Register(Sub()
                                              Try
                                                  p.Kill()
                                              Catch ex As InvalidOperationException
                                              End Try
                                          End Sub)
            ' Pump stdin, stdout and stderr concurrently.
            Dim tin = Async Function()
                          Await p.StandardInput.WriteAsync(stdin)
                          p.StandardInput.Close()
                      End Function()
            Dim tout = p.StandardOutput.ReadToEndAsync()
            Dim terr = p.StandardError.ReadToEndAsync()
            Await Task.WhenAll(tin, tout, terr)

            p.StandardOutput.Close()
            p.StandardError.Close()
            ' WaitForExit() blocks, so run it on a thread-pool thread.
            Await Task.Run(AddressOf p.WaitForExit)

            Dim stdout = Await tout
            Dim stderr = Await terr
            If cancel.IsCancellationRequested Then
                Throw New OperationCanceledException(stderr, cancel)
            ElseIf String.IsNullOrEmpty(stdout) Then
                Throw New Exception(stderr)
            Else
                Return stdout
            End If
        End Using
    End Using
End Function

Process.Exited event

Stephen Toub wrote a very useful article, “Await Anything”, which I always use as my starting point. It includes an example of awaiting a process. That example used the Process.Exited event, and led me to this code:

Async Function Attempt1Async() As Task
    Dim p As New Process
    p.StartInfo.FileName = "notepad.exe"
    p.Start()

    Dim tcs As New TaskCompletionSource(Of Integer)
    p.EnableRaisingEvents = True
    AddHandler p.Exited, Sub() tcs.TrySetResult(p.ExitCode)
    If p.HasExited Then tcs.TrySetResult(p.ExitCode)
    Await tcs.Task
End Function

Note the check for p.HasExited comes after adding the handler. If we didn’t do that, and if the process exited before adding the handler, then we’d never hear about it. 

The code works great as is. But once I added code to capture the output, I uncovered a race condition inside the Process.Exited event: it seems that the combination of all three of ReadToEnd(), Dispose() and HasExited can cause the Exited event to fire more than once!

Repro: the following code non-deterministically throws an exception. The process finishes and fires the Exited event, then “Await tcs.Task” completes and the Using block disposes p, then the event fires a second time, and its attempt to read p.ExitCode is now invalid. The exception seems to happen even if I call RemoveHandler before End Using.

Async Function Attempt2Async() As Task
    For i = 0 To 1000
        Using p As New Process
            p.StartInfo.FileName = "cmd.exe"
            p.StartInfo.Arguments = "/c dir c:\windows /b"
            p.StartInfo.UseShellExecute = False
            p.StartInfo.RedirectStandardOutput = True
            p.Start()
            Dim stdout = Await p.StandardOutput.ReadToEndAsync()

            Dim tcs As New TaskCompletionSource(Of Integer)
            p.EnableRaisingEvents = True
            AddHandler p.Exited, Sub() tcs.TrySetResult(p.ExitCode)
            If p.HasExited Then tcs.TrySetResult(p.ExitCode)
            Await tcs.Task
        End Using
    Next i
End Function
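For comparison, here is a minimal sketch of a more defensive Exited-based version, under assumptions I haven't stress-tested. Following the approach discussed in the comments, it sets EnableRaisingEvents and adds the handler before Start(), so no HasExited check is needed. The handler never touches p: if a stray second Exited event arrives after Dispose, it merely calls TrySetResult on an already-completed TaskCompletionSource, which is harmless. ExitCode is read after the Await, while p is still alive. The name RunAndWaitForExitAsync is hypothetical.

```vb
Imports System
Imports System.Diagnostics
Imports System.Threading.Tasks

Module ExitedEventSketch
    ' Hypothetical helper: waits for the Exited event without ever touching
    ' the Process object from inside the event handler.
    Async Function RunAndWaitForExitAsync(fileName As String, args As String) As Task(Of Integer)
        Using p As New Process
            p.StartInfo.FileName = fileName
            p.StartInfo.Arguments = args
            p.StartInfo.UseShellExecute = False

            ' RunContinuationsAsynchronously keeps our continuation from running
            ' inline on the thread that raises Exited.
            Dim tcs As New TaskCompletionSource(Of Boolean)(TaskCreationOptions.RunContinuationsAsynchronously)
            ' Enable events and subscribe before Start(), so the exit can't be missed.
            p.EnableRaisingEvents = True
            Dim onExited As EventHandler = Sub(sender, e) tcs.TrySetResult(True)
            AddHandler p.Exited, onExited
            Try
                p.Start()
                Await tcs.Task
                ' Read ExitCode here, while p has definitely not been disposed yet.
                Return p.ExitCode
            Finally
                RemoveHandler p.Exited, onExited
            End Try
        End Using
    End Function
End Module
```

The design choice is that the handler only signals; everything that reads from p happens on the awaiting side, inside the Using block.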

If the mechanism of communication is a wait handle, then you need to block some thread

There might be clever ways to make Process.Exited work, but I didn’t care to risk them. So here’s my next attempt at awaiting until the process has finished. I tested it inside a while loop, and it doesn’t suffer from the same race condition as before. 

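Async Function Attempt3Async() As Task
    Dim p As New Process
    p.StartInfo.FileName = "notepad.exe"
    p.Start()

    Await p
End Function

' Extension methods must be declared inside a Module; this one needs
' Imports System.Runtime.CompilerServices and System.Threading.Tasks.
<Extension> Function GetAwaiter(p As Process) As TaskAwaiter
    Dim t As Task = Task.Run(AddressOf p.WaitForExit)
    Return t.GetAwaiter()
End Function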

Here’s why GetAwaiter() is written the way it is. Process has a method p.WaitForExit(), which blocks a thread until the process has finished. It also exposes an IntPtr property “Handle”, the raw Win32 process handle, which is signaled once the process has completed. I want to await until the process has finished, or equivalently until the handle is signaled. If the mechanism of communication is a wait handle, then I need to block some thread. Since I have to block a thread anyway, I might as well use WaitForExit(), which is slightly easier than working with the raw handle.
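The same reasoning applies to any wait handle, not just a process. Here is a minimal sketch, under the same assumption that tying up one thread-pool thread per wait is acceptable, of a hypothetical GetAwaiter extension that makes any WaitHandle (such as a ManualResetEvent) awaitable:

```vb
Imports System.Runtime.CompilerServices
Imports System.Threading
Imports System.Threading.Tasks

Module WaitHandleAwaiterSketch
    ' Hypothetical extension: lets you write "Await someWaitHandle".
    ' Like GetAwaiter(p As Process), it blocks a thread-pool thread until
    ' the handle is signaled.
    <Extension>
    Function GetAwaiter(h As WaitHandle) As TaskAwaiter
        ' WaitOne() returns a Boolean; wrapping it in a Sub gives a plain Task.
        Dim t As Task = Task.Run(Sub() h.WaitOne())
        Return t.GetAwaiter()
    End Function
End Module
```

With this in scope, “Await ev” inside an Async method would resume once some other code calls ev.Set().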

RunCommandLineAsync

I already started this article with the finished code, “RunCommandLineAsync”. Its code is subtle. Here’s an explanation.

Cancellation. If cancellation is requested, then I need to be able to terminate the process. I do that with “cancel.Register(Sub() p.Kill())”. As soon as cancellation is requested, this invokes the lambda, which kills the process. The process terminates abruptly, but its redirected handles are closed cleanly (i.e. without an exception), so the pending reads of stdout and stderr simply complete. In this state I have no guarantee that stdout is complete, so I throw an OperationCanceledException. Note that even if cancellation is requested and the process gets killed, I still wait until the process has completely finished before returning (rather than returning immediately and letting it clean up asynchronously in the background). I figure this is a more hygienic design.
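The behavior this relies on is that Register runs the callback synchronously when the token is canceled (or immediately, if the token is already canceled when Register is called), and that disposing the registration, which the Using block does, stops the callback from running afterwards. Here is a small self-contained sketch that checks those three cases; CountCallbacks is a hypothetical helper, not part of the post's code:

```vb
Imports System
Imports System.Threading

Module CancelRegistrationSketch
    ' Hypothetical helper: counts how many times a registered callback runs,
    ' depending on when the token is canceled relative to the registration.
    Function CountCallbacks(cancelBefore As Boolean, cancelInside As Boolean,
                            cancelAfter As Boolean) As Integer
        Dim calls As Integer = 0
        Using cts As New CancellationTokenSource
            ' Registering on an already-canceled token runs the callback immediately.
            If cancelBefore Then cts.Cancel()
            Using reg As CancellationTokenRegistration = cts.Token.Register(Sub() calls += 1)
                ' Cancel() runs registered callbacks synchronously on this thread.
                If cancelInside Then cts.Cancel()
            End Using
            ' After the registration is disposed, canceling no longer runs the callback.
            If cancelAfter Then cts.Cancel()
        End Using
        Return calls
    End Function
End Module
```

One consequence: if the token is already canceled when RunCommandLineAsync reaches Register, p.Kill() runs right away, so the registration has to come after p.Start().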

Tin, Tout, Terr. The precise way that “RedirectStandardInput/Output/Error” will work depends on the internal details of the process we’re launching. All we can say with any certainty is that (1) the process might not finish until we close its StandardInput; (2) we might have to read data from StandardOutput/StandardError before we can write any more data into StandardInput; (3) we might have to write more data into StandardInput before we can read more data from StandardOutput/StandardError. Those constraints imply that the three tasks “tin/tout/terr” must all be launched concurrently, and then awaited using Task.WhenAll(). 
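To see the shape of that pattern in isolation, here is a minimal sketch that factors tin/tout/terr into a hypothetical PumpAsync helper over plain TextWriter/TextReader objects (p.StandardInput is a StreamWriter, and p.StandardOutput/p.StandardError are StreamReaders), so it can be exercised without launching a real process. The essential point is that all three operations are started before any of them is awaited:

```vb
Imports System
Imports System.IO
Imports System.Threading.Tasks

Module PumpSketch
    ' Hypothetical helper: the tin/tout/terr pattern over plain readers and writers.
    ' Returns (stdout, stderr).
    Async Function PumpAsync(stdinWriter As TextWriter, stdin As String,
                             stdoutReader As TextReader, stderrReader As TextReader
                            ) As Task(Of Tuple(Of String, String))
        ' Start all three operations before awaiting any of them, so that no
        ' pipe can fill up while we are waiting on another one.
        Dim tin As Task = WriteAndCloseAsync(stdinWriter, stdin)
        Dim tout As Task(Of String) = stdoutReader.ReadToEndAsync()
        Dim terr As Task(Of String) = stderrReader.ReadToEndAsync()
        Await Task.WhenAll(tin, tout, terr)
        Return Tuple.Create(Await tout, Await terr)
    End Function

    ' Write everything, then close the writer so the other end sees end-of-input.
    Async Function WriteAndCloseAsync(writer As TextWriter, text As String) As Task
        Await writer.WriteAsync(text)
        writer.Close()
    End Function
End Module
```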

Await tout. The code does “Dim stdout = Await tout” even after we know that tout has finished (because we awaited Task.WhenAll). That’s fine. It merely means that the call to “Await tout” will complete immediately.
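A possible alternative, sketched here on the assumption that only the string-producing tasks need collecting: when several tasks share a result type, Task.WhenAll returns all of their results as an array, in argument order. (tin can't join this particular call because it produces no result, so it would still need awaiting separately.) ReadBothAsync is a hypothetical name:

```vb
Imports System
Imports System.Threading.Tasks

Module WhenAllResultsSketch
    ' Hypothetical helper: awaits stdout and stderr together and returns both strings.
    Async Function ReadBothAsync(tout As Task(Of String), terr As Task(Of String)
                                ) As Task(Of Tuple(Of String, String))
        ' The array holds the results in the same order as the arguments.
        Dim results As String() = Await Task.WhenAll(tout, terr)
        Return Tuple.Create(results(0), results(1))
    End Function
End Module
```

Both styles work; “Await tout” is arguably clearer because it names each result instead of indexing into an array.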

Async vs threads. A few years ago, I wrote similar code using threads instead of async. I'm happy that the async version is cleaner and more readable.

Wrapping it up

My ultimate goal behind this project was to scrape web pages, and I wanted to use VB’s XML literals to make that easy. Here’s how I wrapped it up, to list the src of every img tag on a page:

Async Function Attempt5Async(Optional cancel As CancellationToken = Nothing) As Task
    Dim client As New HttpClient
    Dim xml = Await client.GetXmlAsync("https://blogs.msdn.com/lucian", cancel)

    For Each img In (From i In xml...<img> Select i.@src Distinct)
        Console.WriteLine(img)
    Next
End Function

' As an extension method, this must be declared inside a Module.
<Extension>
Async Function GetXmlAsync(client As HttpClient, uri As String,
                           Optional cancel As CancellationToken = Nothing
                          ) As Task(Of XDocument)
    Using response = Await client.GetAsync(uri, cancel)
        Dim html = Await response.Content.ReadAsStringAsync()
        Dim tidy = Await RunCommandLineAsync("tidy.exe",
                                             "-asxml -numeric -quiet --doctype omit",
                                             html, cancel)
        Return XDocument.Parse(tidy)
    End Using
End Function
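To try the XML-literal query without the network or tidy.exe, here is a small sketch that runs the same query over an XDocument parsed from a hard-coded string; GetImageSources is a hypothetical name. One assumption worth flagging: tidy’s XHTML output may declare the default namespace http://www.w3.org/1999/xhtml, and in that case xml...<img> only matches if the file also has an Imports <xmlns="http://www.w3.org/1999/xhtml"> declaration. The sketch sidesteps this by using un-namespaced XML.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module ImgScrapeSketch
    ' Hypothetical helper: the same query as Attempt5Async, returned as a list.
    ' xml...<img> finds every <img> descendant; i.@src reads its src attribute.
    Function GetImageSources(xml As XDocument) As List(Of String)
        Return (From i In xml...<img> Select i.@src Distinct).ToList()
    End Function
End Module
```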

Comments

  • Anonymous
    December 12, 2012
    The comment has been removed
  • Anonymous
    December 13, 2012
    @Daniel, you're relying on the fact that if you set p.EnableRaisingEvents = True before calling Start(), and also register the event handler before you call Start(), then there's no need to check "p.HasExited". And you're relying on the hope that if you avoid the check of p.HasExited, then the Exited event no longer gets raised twice. From my experiments, the hope seems to be justified (i.e. if I omit the check of p.HasExited, then I only ever observed the Exited event being fired once). But without knowing the root cause of the double Exited event, I'm loath to trust that hope.