Udostępnij za pośrednictwem


LPS: Custom Field Extraction

I may have mentioned this before but I use it so often, I think it deserves it's own post. I am often faced (as in very often) with a single field of data that has lots of disparate bits of data embedded within it. A perfect example is the cs-uri-query field contained in IIS logs, especially where Exchange is concerned. Here is an example:

ae=Folder&t=IPF.Appointment&id=LgAAAACZfhLxcD%2bnRZZG0OyEvOwcAQC3gm%2fVeZsgSIJcyGKb0MppAEwcvx5xAAAC&yr=2015&mn=3&dy=15&Initial+Budget>>Conn:1,HangingConn:0,AD:60000/60000/0%,CAS:90000/85862/7%,AB:60000/60000/0%,RPC:90000/87538/5%,FC:$null/0,Policy:DefaultThrottlingPolicy_67df4d2e-9f19-4270-9b8e-dec43773104b,Norm&v=14.3.174.1&mbx=MBX1.CONTOSO.COM&sessionId=891d047b451141b0a6406ca854de41b9&prfltncy=134&prfrpccnt=15&prfrpcltncy=78&prfldpcnt=1&prfldpltncy=15&prfavlcnt=0&prfavlltncy=0&End+Budget>>Conn:1,HangingConn:0,AD:60000/60000/0%,CAS:90000/85738/8%,AB:60000/60000/0%,RPC:90000/87463/5%,FC:$null/0,Policy:DefaultThrottlingPolicy_67df4d2e-9f19-4270-9b8e-dec43773104b,Norm[Resources:(Mdb)ABC-DAG09-DB32(Health:-1%,HistLoad:0),]

^That is a lot of info for a single field, not to mention not very review-friendly, and for the sake of my theoretical needs I really just wanted the MBX being used, the session ID and which throttling policy is being applied. I have no real need for this data right now, I'm just randomly grabbing these for the example hence the term "theoretical need".

It is easy to extract each bit of information we want into it's own field. I can for example use EXTRACT_PREFIX() and EXTRACT_SUFFIX() to do this. We could probably use SUBSTR() but that would potentially be much messier and we'd be dealing with string lengths which is going to break the moment the string we are dealing with changes.

With EXTRACT, we can use strings and grab exactly what we want. There is a shortcut in LPS which auto-creates a near ready to go combo of the EXTRACT functions, that shortcut is SHIFT+F12 and inserts the following into the LPS query window (yes, this shortcut exists because I grew extremely tired of typing it out by hand!):

EXTRACT_PREFIX(EXTRACT_SUFFIX(FieldName, 0, 'StartChar'), 0, 'EndChar') as FieldAlias

We only need to change the tokens to fit our data. Here are those tokens and their meaning:

FieldName - The name of the field we want to extract data from (in this case cs-uri-query)

StartChar - The starting character(s) we want to use as a marker to tell LPS where our data begins.

EndChar - The ending character(s) we want to use as a marker to tell LPS where our data ends.

FieldAlias - What we want our custom column to be named.

Since I want three items, I need three of these lines, one for each bit of data I want to extract (E.g.: Mailbox Server, SessionID and Throttling Policy). I'll go ahead and fill these in and you can compare to the example querystring above to see why I chose what I chose:

SELECT TOP 1000

EXTRACT_PREFIX(EXTRACT_SUFFIX(cs-uri-query, 0, '&mbx='), 0, '&') as MBXServer,

EXTRACT_PREFIX(EXTRACT_SUFFIX(cs-uri-query, 0, '&sessionId='), 0, '&') as [Session ID],

EXTRACT_PREFIX(EXTRACT_SUFFIX(cs-uri-query, 0, 'Policy:'), 0, ',') as Policy

FROM '[LOGFILEPATH]'

WHERE cs-uri-stem LIKE '%ev.owa%'

AND cs-username IS NOT NULL

*The WHERE statement above exists to make sure only records that contain the querystring data I expect are returned. If I did not do this, we'd get back some empty fields because the data we want to extract doesn't exist in every request in the IIS log. Here is an example result of the above:

  Mailbox Session ID Policy
  MBX2.CONTOSO.COM 8667ec4f36ao4660a3d8382de9dfbafe DefaultThrottlingPolicy_45df4d2e-9f19-4270-9b8e-deu22273104b
  MBX1.CONTOSO.COM 8667ec4f36ao4660a3d8382de9dfbafe DefaultThrottlingPolicy_54df4d2e-9f19-4270-9b8e-deu22273104b
  MBX3.CONTOSO.COM 86e44db93ec245e0b64f3feb2a68feeb DefaultThrottlingPolicy_54df4d2e-9f19-4270-9b8e-deu22273104b
  MBX1.CONTOSO.COM 86e44db93ec245e0b64f3feb2a68feeb DefaultThrottlingPolicy_54df4d2e-9f19-4270-9b8e-deu22273104b

Neat! Now this is something I can work with.

Being an Exchange Escalation Enginner I used Exchange data from my lab but this can be used with any type of data in LPS that has text, not just IIS logs. We could have just as easily used this on a CSV log or even text. Consider the following line of text and also consider it is in some text file somewhere:

foo=bar+bar=001,sometext||hello

We could easily use what we have learned to turn that into useful columns of data. In this example I'm using the TEXTLINELOG type to search my example text file:

SELECT TOP 1000

EXTRACT_PREFIX(EXTRACT_SUFFIX(Text, 0, 'foo='), 0, '+') as Foo,

EXTRACT_PREFIX(EXTRACT_SUFFIX(Text, 0, 'bar='), 0, ',') as Bar,

EXTRACT_PREFIX(EXTRACT_SUFFIX(Text, 0, '||'), 0, '+') as SomeText

FROM '[LOGFILEPATH]'

Result:

  Foo Bar SomeText
  bar 001 hello

When data mining all types of data with LPS, I find myself using this technique much of the time. It's just too handy to be able to extract data and view it as I need to see it which is the entire point.... Get to the data and represent it in a meaningful way. Just remember to use SHIFT+F12 to insert the template.

Comments

  • Anonymous
    March 04, 2015
    Thanks
    very interesting
  • Anonymous
    March 17, 2015
    Thanks Kerry for such a great tool. Is there a way to connect this tool with SQL server as source/input data?
  • Anonymous
    September 30, 2015
    Instead of using EXTRACT_PREFIX and EXTRACT_SUFFIX, wouldn't it be better to just use EXTRACT_VALUE? It seems to work better, especially when not all your data have the same order of arguments.
  • Anonymous
    October 30, 2015
    Hi George, it just depends on the data you wish to get at and how it is laid out. So you are correct, both of them can be quite handy.
  • Anonymous
    December 11, 2015
    At first I thought this article was going to be about how to handle querying iis log files that have custom fields in them from the new IIS 8.5 enhanced logging feature. I'm currently struggling with an issue where I cannot query an IISW3C log file due to a custom field existing in the data. Was curious how log parser studio is handling that? I posted a question on serverfault about that linked here:http://serverfault.com/questions/742524/logparser-error-when-using-iis-log-files-that-have-custom-fields
  • Anonymous
    March 24, 2016
    Hi Kary,I installed LPS2.2. I can run the existing query in Library for Message Tracking. But when I copy the existing query to a new query tab and run it, I got unknown field error. For example, Error parsing query: WHERE clause: Syntax Error: unknow field 'message-subject' [Record field does not exit].how can I run my own query?ThanksLynn
    • Anonymous
      July 15, 2016
      Hi Lylin,If you load up the Message Tracking query, it is preset to the right log type. If you create a new query tab, it is not so you'll just need to select the same log type. This is the "Log Type" drop down 'about' in the middle of the screen above the query and below the grid. That list will have the selections you need to change the query to the correct log type.Lastly, if you just want to make a copy of an existing query, no need to copy/pasted, just right-click the query tab and choosing "Duplicate Tab" and it will make a exact copy for you which you. :)