
Regex 101 Exercise S4 - Extract load average from a string - Discussion

Exercise S4 - Extract load average from a string

The shop that you work with has a server that writes a log entry every hour in the following format:

8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13

You need to write a utility that lets you track the load average on an hourly basis. Write a regex that extracts the time and the load average from the string.

****

This is pretty close to the first thing I ever did with regular expressions. I had some logfile information I needed to process. I started writing in C++, and if you've ever tried to do lots of character manipulation in C++, you know how much fun that can be.

For this sort of thing, I like to look for good delimiters. To get the time, I'll use "up" as the delimiter, which means I can match with:

.+\s*up
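To see what this first piece matches on its own, here is a small C# sketch using Regex.Match from System.Text.RegularExpressions. It assumes C# 9 or later (top-level statements), and the variable names are just for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample log line from the exercise.
string line = "8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13";

// ".+" first grabs the whole line, then gives characters back until
// "\s*up" can match, so the match runs from the start of the line through "up".
Match timePart = Regex.Match(line, @".+\s*up");

Console.WriteLine(timePart.Value);   // 8:00 am up
```

Note that the match includes the "up" delimiter itself; pulling out just the time needs a capture.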

The \s is something new: it means "any whitespace character". Next, I need to pull out the load average. I'll use "load average:" as the delimiter, so the regex to pull that out is:

load average:\s*[0-9.]+
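The same kind of quick check works for the load-average piece. This is again a minimal C# sketch (C# 9 or later assumed) run against the sample line from the exercise.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13";

// "load average:" is the delimiter; \s* skips the space after the colon,
// and [0-9.]+ takes the run of digits and decimal points that follows.
Match loadPart = Regex.Match(line, @"load average:\s*[0-9.]+");

Console.WriteLine(loadPart.Value);   // load average: 0.13
```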

and I can string them together to get:

.+\s*up # match time
.+? # skip middle section
load\ average:\s*[0-9.]+ # match load average

I added the middle clause to skip the characters in the middle that I don't care about. I also split the regex across several lines with a comment on each, which means that I need to use RegexOptions.IgnorePatternWhitespace. That option makes the engine ignore the space in "load average", so I had to change it to "load\ average" to keep the space in the pattern (after I stared at it for a minute, wondering why it wasn't working...).
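Here is a small C# sketch of that pitfall, assuming C# 9 or later. It builds the commented pattern in a verbatim string, then runs it once as written and once with the space left unescaped; the brokenPattern variable is purely illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13";

// A verbatim string lets the pattern span several lines. With
// IgnorePatternWhitespace, the line breaks and spaces are ignored
// and "#" starts a comment that runs to the end of the line.
string pattern = @"
    .+\s*up                    # match time
    .+?                        # skip middle section
    load\ average:\s*[0-9.]+   # match load average
";

// Same pattern with an unescaped space: the engine now looks for "loadaverage:".
string brokenPattern = pattern.Replace(@"load\ average", "load average");

bool works = Regex.IsMatch(line, pattern, RegexOptions.IgnorePatternWhitespace);
bool broken = Regex.IsMatch(line, brokenPattern, RegexOptions.IgnorePatternWhitespace);

Console.WriteLine($"escaped space: {works}, unescaped space: {broken}");
// escaped space: True, unescaped space: False
```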

If I run this in Regex Workbench, it will report:

    0 => 8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13

That tells me that the match worked, but not much else. What I need is a way to extract certain parts of the string, which is done with a "capture" in the regex language. The simplest form of a capture is made by enclosing part of the regex in parentheses:

(.+)\s*up # match time
.+? # skip middle section
load\ average:\s*([0-9.]+) # match load average

Executing that gives:

    0 => 8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13
    1 => 8:00 am
    2 => 0.13

The first capture (index 0) is always the entire match, and the subsequent captures correspond, in order, to the portions of the regex enclosed in parentheses. In code, if I wanted to pull the time out, I would write something like:

string time = match.Groups[1].Value;
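A self-contained C# version of that lookup might look like the sketch below (C# 9 or later assumed; variable names are illustrative). The brackets in the output make one detail visible: because the greedy (.+) runs ahead of \s*, the time capture keeps the trailing space, as the first commenter below points out.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13";

string pattern = @"
    (.+)\s*up                      # match time
    .+?                            # skip middle section
    load\ average:\s*([0-9.]+)     # match load average
";

Match match = Regex.Match(line, pattern, RegexOptions.IgnorePatternWhitespace);

// Groups[0] is the whole match; Groups[1] and Groups[2] follow the
// order of the opening parentheses in the pattern.
string time = match.Groups[1].Value;
string load = match.Groups[2].Value;

Console.WriteLine($"[{time}] [{load}]");   // [8:00 am ] [0.13]
```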

That works fine. I could declare victory, but I don't really like the "Groups[1]" part - it doesn't tell me much. Fortunately, the .NET regex flavor (like some others) provides a way to name captures. That allows me to write:

(?<Time>.+)\s*up # match time
.+? # skip middle section
load\ average:\s*(?<LoadAverage>[0-9.]+) # match load average

Running that gives me:

    0 => 8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13
    Time => 8:00 am
    LoadAverage => 0.13

and I could now write code that looks like:

string time = match.Groups["Time"].Value;

which is very clear - clear enough that I often will not bother with the local variable.
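Putting the named captures to work, here is a minimal C# sketch (C# 9 or later assumed). It differs from the pattern above in one place: the time group uses the lazy .+? suggested in the first comment below, so \s* gets to consume the space before "up". Converting the load average with double.Parse and CultureInfo.InvariantCulture is my own addition, not part of the original post.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

string line = "8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13";

var regex = new Regex(@"
    (?<Time>.+?)\s*up                          # match time (lazy, no trailing space)
    .+?                                        # skip middle section
    load\ average:\s*(?<LoadAverage>[0-9.]+)   # match load average
", RegexOptions.IgnorePatternWhitespace);

Match match = regex.Match(line);

// Groups can be looked up by name instead of by position.
string time = match.Groups["Time"].Value;
double load = double.Parse(match.Groups["LoadAverage"].Value, CultureInfo.InvariantCulture);

Console.WriteLine($"{time} -> {load.ToString(CultureInfo.InvariantCulture)}");   // 8:00 am -> 0.13
```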

That gets us to where I wanted to get. You may have noticed that I didn't try to validate the time, nor did I use anchors for the beginning and end of the string. In this example, I'm dealing with well-formed text - the server log is always going to look the way that it does - and it's not worth the effort or complexity to do more than what I did.
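To close the loop on the exercise's goal of tracking the load average hour by hour, here is one possible sketch in C# (C# 9 or later assumed). ParseLog is a hypothetical helper, not something from the post; it keeps the post's greedy time group and simply trims the trailing space. Only the first sample line comes from the exercise; the second is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// The pattern from the post, with named captures.
var regex = new Regex(@"
    (?<Time>.+)\s*up                           # match time
    .+?                                        # skip middle section
    load\ average:\s*(?<LoadAverage>[0-9.]+)   # match load average
", RegexOptions.IgnorePatternWhitespace);

// Hypothetical helper: turns hourly log lines into (time, load) pairs,
// skipping any line that doesn't fit the expected format.
List<(string Time, double Load)> ParseLog(IEnumerable<string> lines)
{
    var samples = new List<(string Time, double Load)>();
    foreach (string entry in lines)
    {
        Match m = regex.Match(entry);
        if (!m.Success) continue;
        samples.Add((m.Groups["Time"].Value.Trim(),
                     double.Parse(m.Groups["LoadAverage"].Value, CultureInfo.InvariantCulture)));
    }
    return samples;
}

// First line from the exercise; the second is invented sample data.
var log = new[]
{
    "8:00 am up 23 day(s), 21:34, 7 users, load average: 0.13",
    "9:00 am up 23 day(s), 22:34, 5 users, load average: 0.42",
};

foreach (var (time, load) in ParseLog(log))
    Console.WriteLine($"{time}\t{load.ToString(CultureInfo.InvariantCulture)}");
```

Skipping non-matching lines rather than throwing is a judgment call; since the log format is fixed, either choice is reasonable.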

Comments

  • Anonymous
    November 18, 2005
    Doesn't that make Time become "8:00 am " instead of "8:00 am"?

    One fix:
    (?<Time>.+)\s*up
    becomes
    (?<Time>.+?)\s*up

    The problem being that in a battle like (.+)\s*, the \s* part will always lose.
  • Anonymous
    November 19, 2005
    One way of doing this is to greedy-match the entire middle of the string; since we know time is at the beginning and the load average is at the very end it's just as fast if not faster (I should think? Maybe I'm wrong) to use greedy matching for the "useless" bits:

    ^(?<time>\d+:\d+\s+(am|pm)) # match time
    .+ # greedy-match
    (?<load>\d+.\d+)$ # match load