WideFinder--Naive F# Implementation

Jomo Fisher--Here's an interesting problem that some people are having fun with. Don Box posted a naive implementation in C# so I thought I'd post the equivalent in F#: 

#light

open System.Text.RegularExpressions
open System.IO
open System.Text

let regex = new Regex(@"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)", RegexOptions.Compiled)

let seqRead fileName =
    seq { use reader = new StreamReader(File.OpenRead(fileName))
          while not reader.EndOfStream do
              yield reader.ReadLine() }

let query fileName =
    seqRead fileName
    |> Seq.map (fun line -> regex.Match(line))
    |> Seq.filter (fun regMatch -> regMatch.Success)
    |> Seq.map (fun regMatch -> regMatch.Value)
    |> Seq.countBy (fun url -> url)

And here's the code to call it:   

for result in query @"file.txt" do
    let url, count = result
    // print each matched request and its count
    printfn "%s: %d" url count
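To see what the pipeline produces without needing a log file, here is a minimal sketch that runs the same steps over a few in-memory lines. The sample lines and the `countMatches` name are made up for illustration; only the regex and the pipeline steps come from the code above.

```fsharp
open System.Text.RegularExpressions

// Same pattern as in the post.
let regex = new Regex(@"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)", RegexOptions.Compiled)

// Hypothetical sample lines, invented for illustration; not taken from a real log.
let sampleLines =
    [ "host1 - - [01/Oct/2007] \"GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1\" 200"
      "host2 - - [01/Oct/2007] \"GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1\" 200"
      "host3 - - [01/Oct/2007] \"GET /ongoing/When/200x/2006/01/02/Other HTTP/1.1\" 200"
      "host4 - - [01/Oct/2007] \"GET /favicon.ico HTTP/1.1\" 404" ]

// Same steps as `query`, but over an in-memory sequence instead of a file.
let countMatches (lines: seq<string>) =
    lines
    |> Seq.map (fun line -> regex.Match(line))
    |> Seq.filter (fun regMatch -> regMatch.Success)
    |> Seq.map (fun regMatch -> regMatch.Value)
    |> Seq.countBy (fun url -> url)

let counts = countMatches sampleLines |> List.ofSeq

for (url, count) in counts do
    printfn "%s: %d" url count
```

Note that `regMatch.Value` is the whole matched text, including the leading `GET /ongoing/When/...` part. If only the captured path is wanted, `regMatch.Groups.[1].Value` should return just the parenthesized group.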

One nice thing is that F#'s interactive window has a #time;; option which shows you wall-clock time and CPU time. Here is the result from running the code above on a 256 MB file I concatenated together (I couldn't find the one Don was using):

Real: 00:00:06.899, CPU: 00:00:04.165, GC gen0: 416, gen1: 1, gen2: 0

It looks like the majority of the time (roughly 4.2 of the 6.9 seconds) is CPU, so there should be ample opportunity to parallelize. One thing to note: I think the interactive window is unoptimized. When I just compile and run the code, I get times under 5 seconds. My machine is a 4-way 2.4 GHz Core Duo.
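One possible way to spread that CPU work across cores is sketched below. This is not something measured in this post, only an illustration: it splits the lines into chunks, counts each chunk with `Array.Parallel.map`, and then merges the per-chunk counts. The names `countChunk` and `parallelQuery`, the `chunkSize` parameter, and the sample lines are all assumptions made for the example.

```fsharp
open System.Text.RegularExpressions

// Same pattern as in the post; .NET Regex instances can be shared across threads.
let regex = new Regex(@"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)", RegexOptions.Compiled)

// Count matching requests within one chunk of lines (same steps as `query`).
let countChunk (lines: string[]) =
    lines
    |> Seq.map (fun line -> regex.Match(line))
    |> Seq.filter (fun regMatch -> regMatch.Success)
    |> Seq.countBy (fun regMatch -> regMatch.Value)
    |> Array.ofSeq

// Hypothetical parallel variant; chunkSize is an untuned, illustrative parameter.
let parallelQuery (chunkSize: int) (lines: seq<string>) =
    lines
    |> Seq.chunkBySize chunkSize          // split the input into arrays of lines
    |> Array.ofSeq                        // materialize all chunks (holds the input in memory)
    |> Array.Parallel.map countChunk      // count the chunks in parallel
    |> Seq.concat                         // flatten the per-chunk (url, count) pairs
    |> Seq.groupBy fst                    // regroup by url across chunks
    |> Seq.map (fun (url, parts) -> url, Seq.sumBy snd parts)
    |> List.ofSeq

// Made-up sample lines for illustration only.
let sample =
    [ "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1"
      "GET /ongoing/When/200x/2006/01/02/Other HTTP/1.1"
      "GET /favicon.ico HTTP/1.1"
      "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1"
      "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1" ]

let merged = parallelQuery 2 sample

for (url, count) in merged do
    printfn "%s: %d" url count
```

The trade-off in this sketch is memory: it gives up the streaming behavior of `seqRead` by holding every chunk at once, and reading the file itself stays sequential. Whether it actually beats the naive version would have to be measured.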

This posting is provided "AS IS" with no warranties, and confers no rights.

 
