Fun with Perl on my blogroll

Well, I decided it was time to export my blogroll to my blog.  Fortunately, NewsGator
exports in OPML format, the same format BlogX expects in its blogroll file. 
Unfortunately, the text that NewsGator spits out is too big (!) to fit in the textbox
that the WinBlogX client provides.  Hmm… looking at the file, I see there
is an unnecessary “description” attribute, so let’s get rid of it. 
For work-related reasons I’ve had Perl on my mind recently, so with some trial and error,
and some help from a few web sites, I come up with a series of prototypes, eventually culminating
in this successful attempt at regular expression replacement:

perl -pe "s/ description=\".*?\"//g;" blogroll.opml > blogroll2.opml

Ok, that works, and strips the description.  I also notice both xmlurl and htmlurl;
surely one of them is superfluous.  Let’s strip off the htmlurl and see
how it goes:

perl -pe "s/ htmlurl=\".*?\"//g;" blogroll2.opml > blogroll3.opml

I vaguely recall reading somewhere that BlogX expects xmlUrl, not xmlurl, so I do
a quick replacement on that, too:

perl -pe "s/xmlurl/xmlUrl/g;" blogroll3.opml > blogroll4.opml

Ok, looks good!  Let’s copy it into my blog… done.

Ahh… pretty text… but… uh-oh.  Can’t click on the
links.  Looks like I guessed wrong between xmlurl and htmlurl.  Let’s
go back to blogroll2.opml, which still has both attributes, and try again:

perl -pe "s/ xmlurl=\".*?\"//g;" blogroll2.opml > blogroll3.opml

perl -pe "s/htmlurl/htmlUrl/g;" blogroll3.opml > blogroll4.opml

Copy it over, and voilà!  One fine-looking blogroll on my home page.  (With
clickable links.)  I notice a lot of duplicate entries for “WebLogs @ ASP.NET”,
but I think that will be a problem for another day.

I’m sure my fellow geeks out there can show me dozens of better ways to do this
– surely there is an awk version that will wash the dishes, too.  Go ahead,
give it your best shot!
