IIS Log File Data Mining with Log Parser
Log Parser, now in version 2.2, can query log files in a multitude of ways and output the results to text files, SQL Server tables, or charts. It is, indeed, a great way to analyze IIS logs! Even on busy web sites with multiple servers being hit by thousands of users, simple queries can produce results in a minute or less.
Step 1: Download the latest version from the Microsoft download center: https://www.microsoft.com/downloads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&displaylang=en. Follow the installation instructions.
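Once installed, LogParser.exe runs from the command line. As a quick sanity check, a sketch like this (assuming IIS W3C logs named ex*.log in the current directory; adjust the path to your log folder) pops the first ten entries into Log Parser's built-in grid viewer:
LogParser -i:IISW3C -o:DATAGRID "SELECT TOP 10 date, time, cs-uri-stem, sc-status FROM ex*.log"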
If you have the authority to do so, ensure the following fields are selected under the IIS logging options, in addition to the standard fields, so you can capture richer information:
Time Taken (time-taken)
Bytes Sent (sc-bytes)
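With those fields enabled, the #Fields directive at the top of each W3C log file will list the extra columns. For example, a header covering just the fields used in the queries below might look like this:
#Fields: date time cs-method cs-uri-stem cs-uri-query sc-status sc-substatus sc-bytes time-taken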
Step 2: Cleanse the raw IIS log files. Most often, it is page requests that you are interested in analyzing. Of course, the log files contain all requests, including those for scripts and images; therefore, I frequently pre-process the raw IIS logs to strip out the uninteresting requests. This gives better results in much less execution time, since stripping the raw files can reduce the number of entries by 90%. I call this process cleansing the file.
Here is a sample query that removes the “background noise” so the result is just page requests:
-- LogParser -i:IISW3C -o:TSV file:..\Queries\Cleanse.sql
SELECT
date,
time,
cs-method,
cs-uri-stem,
cs-uri-query,
sc-status,
sc-substatus,
sc-bytes,
time-taken
INTO Clean.TSV
FROM *.log
WHERE
EXTRACT_TOKEN([cs-uri-stem], -2, '.') NOT IN ('css'; 'gif'; 'swf'; 'js'; 'ico'; 'png'; 'jpg'; 'JPG'; 'asmx'; 'bmp')
AND
EXTRACT_TOKEN([cs-uri-stem], -1, '.') NOT IN ('css'; 'gif'; 'swf'; 'js'; 'ico'; 'png'; 'jpg'; 'JPG'; 'asmx'; 'bmp')
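As a side note, Log Parser also provides EXTRACT_EXTENSION and TO_LOWERCASE functions. A sketch of a more compact filter that tests only the final extension (so 'jpg' and 'JPG' need a single, lower-case entry):
WHERE
TO_LOWERCASE(EXTRACT_EXTENSION(cs-uri-stem)) NOT IN ('css'; 'gif'; 'swf'; 'js'; 'ico'; 'png'; 'jpg'; 'asmx'; 'bmp')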
Step 3: Create the query file. LogParser is a command-line tool that uses a SQL-like query language. Although you can embed the SQL query in the command line, I recommend keeping the SQL in separate text files and then referencing the SQL files from the command line. This makes it very easy to tweak the SQL and reuse it on many machines.
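Query files can also take parameters, which makes reuse even easier. A sketch, where the parameter name source is hypothetical and %source% appears in the query file's FROM clause:
-- In Cleanse.sql, replace the FROM clause with: FROM %source%
LogParser -i:IISW3C -o:TSV file:..\Queries\Cleanse.sql?source=ex*.log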
Here is a simple query to get summary totals by status code. We read from the cleansed file created in the prior step (FROM Clean.TSV) and write to another tab-separated file (INTO ".\CountByStatus.TSV"). We can then open the output file with Excel for further analysis and graphing.
-- LogParser -i:TSV -o:TSV
-- file:..\Queries\CountByStatus.sql
SELECT STRCAT(TO_STRING(sc-status), STRCAT('.', TO_STRING(sc-substatus))) AS Status,
COUNT(*) AS Total
INTO ".\CountByStatus.TSV"
FROM Clean.TSV
WHERE sc-status > 99
GROUP BY Status
ORDER BY Total DESC
This syntax should be easy to read since it is very close to database SQL syntax. It calculates the COUNT, grouping by status code.
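If the Office Web Components are installed, Log Parser can also render this summary directly as a chart. A sketch, with an illustrative chart type and file names, where INTO now targets an image file:
-- LogParser -i:TSV -o:CHART -chartType:ColumnClustered
-- file:..\Queries\CountByStatusChart.sql
SELECT STRCAT(TO_STRING(sc-status), STRCAT('.', TO_STRING(sc-substatus))) AS Status,
COUNT(*) AS Total
INTO .\CountByStatus.gif
FROM Clean.TSV
WHERE sc-status > 99
GROUP BY Status
ORDER BY Total DESC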
Step 4: Execute the command and review the results. Here we see the impact of SharePoint NTLM authentication on traffic: the challenge/response handshake shows up as the large number of 401 status codes.
Status Total
200.0 855348
401.1 106159
401.2 86056
302.0 14874
304.0 10040
403.0 197
401.5 90
404.0 80
Step 5: A tab-separated file is automatically recognized and transformed into a worksheet by Excel. Opening the output file with Excel gives this result.
Area                         Category                  Level        Count
Search Server Common         PHSts                     Monitorable  1,321,494
Windows SharePoint Services  Timer                     Monitorable  110,492
Search Server Common         FTEFiles                  High         5,437
Windows SharePoint Services  Topology                  High         5,343
Search Server Common         MS Search Administration  High         4,376
Search Server Common         GatherStatus              Monitorable  3,299
SharePoint Portal Server     Business Data             Unexpected   1,719
Office Server                Setup and Upgrade         High         1,105
Search Server Common         Common                    Monitorable  1,086
ULS Logging                  Unified Logging Service   Monitorable  1,077
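Excel is not the only destination. As mentioned at the start, Log Parser can also write results straight into a SQL Server table for further querying; a minimal sketch, where the server and database names (MYSERVER, LogAnalysis) and table name (CleanLogs) are hypothetical:
LogParser -i:TSV -o:SQL -server:MYSERVER -database:LogAnalysis -createTable:ON "SELECT * INTO CleanLogs FROM Clean.TSV"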
Here are some other query examples to get your creative thinking going.
· Summarize the page requests over time.
-- logparser -i:TSV -iTsFormat:"hh:mm:ss" -o:TSV
-- file:..\Queries\HitsOverTime.sql
SELECT
QUANTIZE(time,900) AS [15 Min Bucket],
COUNT(*) AS Hits,
DIV(COUNT(*), 900.0) AS RPS
INTO .\HitsOverTime.TSV
FROM clean.TSV
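-- Uncomment the block below to count only requests that include a query string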
/*
WHERE
STRLEN(cs-uri-query) > 0
*/
GROUP BY QUANTIZE(time,900)
ORDER BY QUANTIZE(time,900)
15 Min Bucket Hits RPS
00:00:00 14344 15
00:15:00 27606 30
00:30:00 13840 15
00:45:00 15162 16
01:00:00 17056 18
01:15:00 15871 17
01:30:00 15941 17
01:45:00 15894 17
02:00:00 15072 16
02:15:00 14768 16
02:30:00 17775 19
02:45:00 18564 20
03:00:00 14971 16
03:15:00 16062 17
03:30:00 17299 19
03:45:00 17522 19
04:00:00 16966 18
04:15:00 18602 20
04:30:00 16571 18
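QUANTIZE(time, 900) rounds each timestamp down to a 900-second (15-minute) bucket, and DIV(COUNT(*), 900.0) turns each bucket's hit count into requests per second. To bucket by hour instead, a sketch (the query and output file names are illustrative):
-- logparser -i:TSV -iTsFormat:"hh:mm:ss" -o:TSV
-- file:..\Queries\HitsByHour.sql
SELECT
QUANTIZE(time, 3600) AS [Hour],
COUNT(*) AS Hits,
DIV(COUNT(*), 3600.0) AS RPS
INTO .\HitsByHour.TSV
FROM Clean.TSV
GROUP BY QUANTIZE(time, 3600)
ORDER BY QUANTIZE(time, 3600)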
· Get the 200 most popular pages along with the average time taken and byte size.
SELECT TOP 200
cs-uri-stem AS [Request URI],
COUNT(*) AS Hits,
AVG(time-taken) AS [Avg Time (ms)],
AVG(sc-bytes) AS [Avg Bytes]
INTO ".\Top200Pages.TSV"
FROM Clean.TSV
GROUP BY cs-uri-stem
ORDER BY Hits DESC
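As with the earlier queries, save the SQL to a file and reference it from the command line; a sketch, assuming the query is saved as Top200Pages.sql:
LogParser -i:TSV -o:TSV file:..\Queries\Top200Pages.sql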