Jaa


IIS Log File Data Mining with Log Parser

IIS Log File Data Mining

Log Parser, now in version 2.2. LogParser can query log files in a multitude of ways. It can output the results to text files, SQL Server tables, or charts. It is indeed, a great way to analyze IIS logs! Even on busy web sites with multiple servers being hit by thousands of users, simple queries can produce results in a minute or less.

Step 1: Download the latest version from the Microsoft download center: https://www.microsoft.com/downloads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&displaylang=en. Follow the installation instructions.

If you have the authority to do so, under the IIS Logging options, ensure the following fields are selected in addition to the standard fields so you can get richer information:

Time Taken (time-taken)

Bytes Sent (sc-bytes)

Step 2: Cleanse the raw IIS log files. Most often, it is page requests that you are most interested in analyzing. Of course the log files contain all requests including scripts and images; therefore, I frequently pre-process the raw IIS logs to strip out the uninteresting requests. This will give me better results in much less execution time. Stripping the raw files can reduce the number of entries by 90%. I call this process cleansing the file.

Here is a sample query that will remove the “back ground noise” so the result is just page requests:

-- LogParser -i:IISW3C -O:TSV file:..\Queries\Cleanse.sql

SELECT

   date,

   time,

   cs-method,

   cs-uri-stem,

   cs-uri-query,

   sc-status,

   sc-substatus,

   sc-bytes,

   time-taken

INTO Clean.TSV

FROM *.log

WHERE

   EXTRACT_TOKEN([cs-uri-stem], -2, '.') NOT IN ('css'; 'gif'; 'swf'; 'js'; 'ico'; 'png';'jpg';'JPG';'asmx';'bmp')

   AND

   EXTRACT_TOKEN([cs-uri-stem], -1, '.') NOT IN ('css'; 'gif'; 'swf'; 'js'; 'ico'; 'png';'jpg';'JPG';'asmx';'bmp')

Step 3: Create the query file. LogParser is a command line tool. It uses a SQL-like query language. Although you can embed the SQL query in the command line, I recommend keeping the SQL in separate text files; and then reference the SQL files from the command line. This makes it very easy to tweak the SQL and reuse it on many machines.

Here is a simple query to get summary totals by status code. We read from the cleansed file created in the prior step (FROM Clean.TSV), and write to another tab separate file (INTO ".\CountByStatus.TSV"). We can then open the output file with Excel for further analysis and graphing.

-- LogParser -i:TSV -O:TSV

-- file:..\Queries\CountByStatus.sql

SELECT STRCAT(TO_STRING(sc-status), STRCAT('.', TO_STRING(sc-substatus))) AS Status,

   COUNT(*) AS Total

INTO ".\CountByStatus.TSV"

FROM Clean.TSV

WHERE sc-status > 99

GROUP BY Status

ORDER BY Total DESC

 

This syntax should be easy to read since it is very close to database SQL syntax. It calculates the COUNT, grouping by status code.

Step 4: Execute the command and review the results. Here we see the impact of SharePoint NTLM authentication on traffic, by the large number of 401 status codes.

Status Total

200.0 855348

401.1 106159

401.2 86056

302.0 14874

304.0 10040

403.0 197

401.5 90

404.0 80

Step 5: A tab separated file is automatically recognized and transformed into a worksheet by Excel. Open the output file with Excel gives this result.

Area

Category

Level

Count

Search Server Common

PHSts

Monitorable

1,321,494

Windows SharePoint Services

Timer

Monitorable

110,492

Search Server Common

FTEFiles

High

5,437

Windows SharePoint Services

Topology

High

5,343

Search Server Common

MS Search Administration

High

4,376

Search Server Common

GatherStatus

Monitorable

3,299

SharePoint Portal Server

Business Data

Unexpected

1,719

Office Server

Setup and Upgrade

High

1,105

Search Server Common

Common

Monitorable

1,086

ULS Logging

Unified Logging Service

Monitorable

1,077

 

Here are some other query examples to get your creative thinking going.

· Summarize the page queries over time.

-- logparser -i:TSV -iTsFormat:"hh:mm:ss" -o:TSV

-- file:..\Queries\HitsOverTime.sql

SELECT

   QUANTIZE(time,900) AS [15 Min Bucket],

   COUNT(*) AS Hits,

   DIV(COUNT(*), 900.0) AS RPS

INTO .\HitsOverTime.TSV

FROM clean.TSV

/*

WHERE

   STRLEN(cs-uri-query) > 0

   */

GROUP BY QUANTIZE(time,900)

ORDER BY QUANTIZE(time,900)

 

15 Min Bucket Hits RPS

00:00:00 14344 15

00:15:00 27606 30

00:30:00 13840 15

00:45:00 15162 16

01:00:00 17056 18

01:15:00 15871 17

01:30:00 15941 17

01:45:00 15894 17

02:00:00 15072 16

02:15:00 14768 16

02:30:00 17775 19

02:45:00 18564 20

03:00:00 14971 16

03:15:00 16062 17

03:30:00 17299 19

03:45:00 17522 19

04:00:00 16966 18

04:15:00 18602 20

04:30:00 16571 18

· Get the 200 most popular pages along with the average time taken and byte size.

SELECT TOP 200

   cs-uri-stem AS [Request URI],

   COUNT(*) AS Hits,

   Avg(time-taken) AS [Avg Time (ms)],

   Avg(sc-bytes) AS [Avg Bytes]

Into ".\Top200Pages.TSV"

FROM clean.tsv

GROUP BY cs-uri-stem

ORDER BY Hits DESC