Detecting FTP Leeches with LogParser

Someone asked me an interesting question the other day: "How do I detect if any users are leeching my FTP site?" That's a great question, and it warrants some explanation and a little LogParser code.

First of all, I should explain the term leeching as it applies to FTP. If you host a public FTP site with a collection of files for downloading, a leech is someone who connects to your site and downloads everything, or almost everything. The term leech is most often used on peer-to-peer (P2P) sites when someone downloads and never uploads, or as Wikipedia appropriately summarizes it, "leeching is taking without giving."

This leads me back to the original question, which was how to detect if someone is leeching your FTP site. The basic pattern for leeching is usually pretty easy to detect – you'll see a large volume of change directory (CWD), directory listing (LIST), and file retrieval (RETR) requests; the pattern will usually be something like the following flow of events:

CWD / LIST / RETR1 / RETR2 / RETRn / CWD / LIST / RETR1 / RETR2 / RETRn / etc.

An excerpt from an IIS W3C log file might look something like the following, and you can easily see the client's activity as it traverses the FTP site and grabs everything. (The file download requests are the RETR entries.)

date time c-ip s-ip s-port cs-method cs-uri-stem sc-status x-session
2010-09-02 21:12:02 192.168.0.1 192.168.0.20 21 HOST example.com 220 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:04 192.168.0.1 192.168.0.20 21 USER anonymous 331 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:08 192.168.0.1 192.168.0.20 21 PASS *** 230 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:08 192.168.0.1 192.168.0.20 21 SYST - 215 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:08 192.168.0.1 192.168.0.20 21 PWD - 257 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:08 192.168.0.1 192.168.0.20 21 PASV - 227 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:08 192.168.0.1 192.168.0.20 64441 DataChannelOpened - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:08 192.168.0.1 192.168.0.20 64441 DataChannelClosed - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:08 192.168.0.1 192.168.0.20 21 LIST - 226 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:19 192.168.0.1 192.168.0.20 21 TYPE A 200 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:19 192.168.0.1 192.168.0.20 21 PASV - 227 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:19 192.168.0.1 192.168.0.20 64443 DataChannelOpened - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:19 192.168.0.1 192.168.0.20 64443 DataChannelClosed - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:19 192.168.0.1 192.168.0.20 21 RETR file1.txt 226 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:22 192.168.0.1 192.168.0.20 21 TYPE A 200 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:22 192.168.0.1 192.168.0.20 21 PASV - 227 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:22 192.168.0.1 192.168.0.20 64445 DataChannelOpened - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:22 192.168.0.1 192.168.0.20 64445 DataChannelClosed - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:22 192.168.0.1 192.168.0.20 21 RETR file2.txt 226 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:24 192.168.0.1 192.168.0.20 21 CWD /folder1 250 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:24 192.168.0.1 192.168.0.20 21 PASV - 227 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:24 192.168.0.1 192.168.0.20 64447 DataChannelOpened - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:24 192.168.0.1 192.168.0.20 64447 DataChannelClosed - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:24 192.168.0.1 192.168.0.20 21 LIST - 226 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:26 192.168.0.1 192.168.0.20 21 TYPE A 200 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:26 192.168.0.1 192.168.0.20 21 PASV - 227 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:26 192.168.0.1 192.168.0.20 64449 DataChannelOpened - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:26 192.168.0.1 192.168.0.20 64449 DataChannelClosed - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:26 192.168.0.1 192.168.0.20 21 RETR file3.txt 226 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:30 192.168.0.1 192.168.0.20 21 CWD /folder2 250 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:30 192.168.0.1 192.168.0.20 21 PASV - 227 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:30 192.168.0.1 192.168.0.20 64451 DataChannelOpened - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:30 192.168.0.1 192.168.0.20 64451 DataChannelClosed - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:30 192.168.0.1 192.168.0.20 21 LIST - 226 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:31 192.168.0.1 192.168.0.20 21 TYPE I 200 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:31 192.168.0.1 192.168.0.20 21 PASV - 227 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:31 192.168.0.1 192.168.0.20 64453 DataChannelOpened - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:31 192.168.0.1 192.168.0.20 64453 DataChannelClosed - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:31 192.168.0.1 192.168.0.20 21 RETR file4.txt 226 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:33 192.168.0.1 192.168.0.20 21 TYPE I 200 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:33 192.168.0.1 192.168.0.20 21 PASV - 227 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:33 192.168.0.1 192.168.0.20 64455 DataChannelOpened - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:33 192.168.0.1 192.168.0.20 64455 DataChannelClosed - - 4e616d65-6f6e-6d65-6973-526f62657274
2010-09-02 21:12:33 192.168.0.1 192.168.0.20 21 RETR file5.txt 226 4e616d65-6f6e-6d65-6973-526f62657274
etc. etc. etc. etc. etc. etc. etc. etc. etc.
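
To make the pattern easier to spot, the excerpt above can be reduced to just its CWD, LIST, and RETR entries. The following Python sketch is only an illustration and not part of LogParser; the names `parse_line` and `command_flow` are hypothetical, and it assumes that every log line is space-delimited with exactly one value per field, so lines that don't fit that shape are skipped.

```python
# Field names copied from the header row of the log excerpt above.
FIELDS = ["date", "time", "c-ip", "s-ip", "s-port",
          "cs-method", "cs-uri-stem", "sc-status", "x-session"]

# The three commands that make up the leeching pattern.
PATTERN_COMMANDS = {"CWD", "LIST", "RETR"}


def parse_line(line):
    """Split one space-delimited log line into a dict keyed by field name.

    Returns None for directive lines (starting with '#'), blank lines,
    and rows that do not have exactly one value per field.
    """
    if line.startswith("#"):
        return None
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def command_flow(lines):
    """Return the CWD/LIST/RETR commands in the order they were logged."""
    flow = []
    for line in lines:
        record = parse_line(line)
        if record is not None and record["cs-method"] in PATTERN_COMMANDS:
            flow.append(record["cs-method"])
    return flow


# A few lines taken from the excerpt above.
SESSION = "4e616d65-6f6e-6d65-6973-526f62657274"
sample = [
    "2010-09-02 21:12:08 192.168.0.1 192.168.0.20 21 LIST - 226 " + SESSION,
    "2010-09-02 21:12:19 192.168.0.1 192.168.0.20 21 TYPE A 200 " + SESSION,
    "2010-09-02 21:12:19 192.168.0.1 192.168.0.20 21 RETR file1.txt 226 " + SESSION,
    "2010-09-02 21:12:22 192.168.0.1 192.168.0.20 21 RETR file2.txt 226 " + SESSION,
    "2010-09-02 21:12:24 192.168.0.1 192.168.0.20 21 CWD /folder1 250 " + SESSION,
    "2010-09-02 21:12:24 192.168.0.1 192.168.0.20 21 LIST - 226 " + SESSION,
    "2010-09-02 21:12:26 192.168.0.1 192.168.0.20 21 RETR file3.txt 226 " + SESSION,
]

print(" / ".join(command_flow(sample)))
# prints: LIST / RETR / RETR / CWD / LIST / RETR
```

The TYPE entry drops out, and what remains is the same LIST / RETR / RETR / CWD / LIST / RETR flow described earlier.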

Unless the client is scripting their leeching to hide their activity, the client requests will generally be in the same session, which makes them easier to track. Otherwise, you have to track activity by IP address, but it’s still doable.
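
As a rough sketch of the IP-based fallback, the following Python counts RETR requests per date and client IP address regardless of session, and also reports how many sessions those downloads were spread across. The function name `retr_counts_by_ip` and the session IDs in the sample data are hypothetical, and the records are assumed to have already been parsed into dictionaries keyed by the log's field names.

```python
from collections import defaultdict


def retr_counts_by_ip(records):
    """Count RETR requests per (date, client IP), ignoring session boundaries.

    Each record is a dict with the keys 'date', 'c-ip', 'cs-method',
    and 'x-session'. Returns {(date, ip): (download_count, session_count)}.
    """
    downloads = defaultdict(int)
    sessions = defaultdict(set)
    for record in records:
        if record["cs-method"] != "RETR":
            continue
        key = (record["date"], record["c-ip"])
        downloads[key] += 1
        sessions[key].add(record["x-session"])
    return {key: (downloads[key], len(sessions[key])) for key in downloads}


# Hypothetical data: one client spreads its downloads across three sessions.
records = [
    {"date": "2010-09-02", "c-ip": "192.168.0.1", "cs-method": "RETR", "x-session": "session-a"},
    {"date": "2010-09-02", "c-ip": "192.168.0.1", "cs-method": "LIST", "x-session": "session-a"},
    {"date": "2010-09-02", "c-ip": "192.168.0.1", "cs-method": "RETR", "x-session": "session-b"},
    {"date": "2010-09-02", "c-ip": "192.168.0.1", "cs-method": "RETR", "x-session": "session-c"},
    {"date": "2010-09-02", "c-ip": "192.168.0.2", "cs-method": "RETR", "x-session": "session-d"},
]

for (date, ip), (count, session_count) in sorted(retr_counts_by_ip(records).items()):
    print(date, ip, count, "downloads in", session_count, "session(s)")
```

Keep in mind that several users can share one IP address behind a NAT or proxy, so a high per-IP count is a hint rather than proof.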

I've written a lot of content about Microsoft's LogParser utility in the past, and so it should seem logical that I'd find a way to use it for this situation as well. If I were to write a LogParser script to detect leeches, I would probably start with something like the following example:

Logparser.exe "SELECT date,COUNT(*) AS downloads,c-ip,x-session FROM *.log WHERE cs-method='RETR' GROUP BY date,c-ip,x-session HAVING COUNT(*) > 100" -i:W3C

An easier view of just the SQL syntax for that LogParser query looks like the following:

SELECT date,COUNT(*) AS downloads,c-ip,x-session
FROM *.log
WHERE cs-method='RETR'
GROUP BY date,c-ip,x-session
HAVING COUNT(*) > 100

In this example, the script is asking LogParser to query all of the FTP logs in a folder and return the date, download count, client IP address, and session ID for every session where the client downloaded more than 100 files on a given day, and it's grouping records together based on the client IP address, session, and date. By way of explanation, 100 is a number that I chose somewhat arbitrarily, but I would think anyone with more than 100 downloads in a day would probably be a leech for most FTP sites. If you are writing your own LogParser script, you want to make sure that you group the query by date like I did so people who download 100 files over a couple of years don’t show up as leeches. ;-]
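
If it helps to see the WHERE, GROUP BY, and HAVING steps spelled out, here is a minimal Python sketch of the same logic. It is not how LogParser works internally; the function `sessions_over_threshold`, the helper `fake_downloads`, and the session IDs are all hypothetical, and the records are assumed to be dictionaries keyed by the log's field names.

```python
from collections import Counter


def sessions_over_threshold(records, threshold=100):
    """Return {(date, ip, session): downloads} for groups above the threshold."""
    counts = Counter(
        (r["date"], r["c-ip"], r["x-session"])   # GROUP BY date, c-ip, x-session
        for r in records
        if r["cs-method"] == "RETR"              # WHERE cs-method='RETR'
    )
    # HAVING COUNT(*) > threshold (strictly greater than)
    return {key: n for key, n in counts.items() if n > threshold}


def fake_downloads(ip, session, count):
    """Build `count` hypothetical RETR records for one client session."""
    return [
        {"date": "2010-03-10", "c-ip": ip, "x-session": session, "cs-method": "RETR"}
        for _ in range(count)
    ]


records = (fake_downloads("192.168.0.1", "session-a", 100)
           + fake_downloads("192.168.0.2", "session-b", 101))

print(sessions_over_threshold(records))
# prints: {('2010-03-10', '192.168.0.2', 'session-b'): 101}
```

Note that the comparison is strict, so a session with exactly 100 downloads is not reported; only the session with 101 downloads shows up.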

When you run the above query, LogParser will give you output that looks like the following:

date downloads c-ip x-session
---------- --------- -------------- ------------------------------------
2010-03-10 313 192.168.0.1 d9ad5121-434a-4eae-a3ee-186733fd44f4
2010-06-17 2112 192.168.0.2 8e14bccd-0403-4a33-b33f-e0996c77c1a9

This enables you to see pretty quickly who the top leeches for your FTP site are, and then you can act accordingly. At the very least, you might want to think about adding the IP addresses of any leeches to your FTP IP Address and Domain Restrictions.

I hope this helps. ;-]

Comments

  • Anonymous
    September 05, 2010
    Nice post, always good to see more content about LogParser. Do you have any idea what the development situation at Microsoft is for LogParser? There hasn't been any word for a long time as far as I can tell. Are we ever getting a LogParser 3?

  • Anonymous
    September 07, 2010
    I would personally LOVE to see Microsoft release LogParser 3, but unfortunately there are no plans at this time for an update to LogParser in the near future. :-(