Detecting FTP Leeches with LogParser
Someone asked me an interesting question the other day: "How do I detect if any users are leeching my FTP site?" That's a great question, and it warrants some explanation and a little LogParser code.
First of all, I should explain the term leeching as it applies to FTP. If you host a public FTP site with a collection of files for downloading, a leech is someone who connects to your site and downloads everything, or almost everything. The term leech is most often used on peer-to-peer (P2P) sites when someone downloads and never uploads, or as Wikipedia appropriately summarizes it, "leeching is taking without giving."
This leads me back to the original question, which was how to detect if someone is leeching your FTP site. The basic pattern for leeching is usually pretty easy to detect – you'll see a large volume of change directory (CWD), directory listing (LIST), and file retrieval (RETR) requests; the pattern will usually be something like the following flow of events:
CWD / LIST / RETR1 / RETR2 / RETRn / CWD / LIST / RETR1 / RETR2 / RETRn / etc.
An excerpt from an IIS W3C log file might look something like the following, and you can easily see the client's activity as it traverses the FTP site and grabs everything. (The file download requests are the RETR entries.)
date | time | c-ip | s-ip | s-port | cs-method | cs-uri-stem | sc-status | x-session |
---|---|---|---|---|---|---|---|---|
2010-09-02 | 21:12:02 | 192.168.0.1 | 192.168.0.20 | 21 | HOST | example.com | 220 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:04 | 192.168.0.1 | 192.168.0.20 | 21 | USER | anonymous | 331 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:08 | 192.168.0.1 | 192.168.0.20 | 21 | PASS | *** | 230 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:08 | 192.168.0.1 | 192.168.0.20 | 21 | SYST | - | 215 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:08 | 192.168.0.1 | 192.168.0.20 | 21 | PWD | - | 257 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:08 | 192.168.0.1 | 192.168.0.20 | 21 | PASV | - | 227 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:08 | 192.168.0.1 | 192.168.0.20 | 64441 | DataChannelOpened | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:08 | 192.168.0.1 | 192.168.0.20 | 64441 | DataChannelClosed | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:08 | 192.168.0.1 | 192.168.0.20 | 21 | LIST | - | 226 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:19 | 192.168.0.1 | 192.168.0.20 | 21 | TYPE | A | 200 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:19 | 192.168.0.1 | 192.168.0.20 | 21 | PASV | - | 227 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:19 | 192.168.0.1 | 192.168.0.20 | 64443 | DataChannelOpened | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:19 | 192.168.0.1 | 192.168.0.20 | 64443 | DataChannelClosed | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:19 | 192.168.0.1 | 192.168.0.20 | 21 | RETR | file1.txt | 226 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:22 | 192.168.0.1 | 192.168.0.20 | 21 | TYPE | A | 200 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:22 | 192.168.0.1 | 192.168.0.20 | 21 | PASV | - | 227 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:22 | 192.168.0.1 | 192.168.0.20 | 64445 | DataChannelOpened | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:22 | 192.168.0.1 | 192.168.0.20 | 64445 | DataChannelClosed | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:22 | 192.168.0.1 | 192.168.0.20 | 21 | RETR | file2.txt | 226 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:24 | 192.168.0.1 | 192.168.0.20 | 21 | CWD | /folder1 | 250 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:24 | 192.168.0.1 | 192.168.0.20 | 21 | PASV | - | 227 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:24 | 192.168.0.1 | 192.168.0.20 | 64447 | DataChannelOpened | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:24 | 192.168.0.1 | 192.168.0.20 | 64447 | DataChannelClosed | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:24 | 192.168.0.1 | 192.168.0.20 | 21 | LIST | - | 226 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:26 | 192.168.0.1 | 192.168.0.20 | 21 | TYPE | A | 200 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:26 | 192.168.0.1 | 192.168.0.20 | 21 | PASV | - | 227 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:26 | 192.168.0.1 | 192.168.0.20 | 64449 | DataChannelOpened | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:26 | 192.168.0.1 | 192.168.0.20 | 64449 | DataChannelClosed | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:26 | 192.168.0.1 | 192.168.0.20 | 21 | RETR | file3.txt | 226 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:30 | 192.168.0.1 | 192.168.0.20 | 21 | CWD | /folder2 | 250 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:30 | 192.168.0.1 | 192.168.0.20 | 21 | PASV | - | 227 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:30 | 192.168.0.1 | 192.168.0.20 | 64451 | DataChannelOpened | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:30 | 192.168.0.1 | 192.168.0.20 | 64451 | DataChannelClosed | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:30 | 192.168.0.1 | 192.168.0.20 | 21 | LIST | - | 226 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:31 | 192.168.0.1 | 192.168.0.20 | 21 | TYPE | I | 200 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:31 | 192.168.0.1 | 192.168.0.20 | 21 | PASV | - | 227 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:31 | 192.168.0.1 | 192.168.0.20 | 64453 | DataChannelOpened | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:31 | 192.168.0.1 | 192.168.0.20 | 64453 | DataChannelClosed | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:31 | 192.168.0.1 | 192.168.0.20 | 21 | RETR | file4.txt | 226 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:33 | 192.168.0.1 | 192.168.0.20 | 21 | TYPE | I | 200 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:33 | 192.168.0.1 | 192.168.0.20 | 21 | PASV | - | 227 | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:33 | 192.168.0.1 | 192.168.0.20 | 64455 | DataChannelOpened | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:33 | 192.168.0.1 | 192.168.0.20 | 64455 | DataChannelClosed | - | - | 4e616d65-6f6e-6d65-6973-526f62657274 |
2010-09-02 | 21:12:33 | 192.168.0.1 | 192.168.0.20 | 21 | RETR | file5.txt | 226 | 4e616d65-6f6e-6d65-6973-526f62657274 |
etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. | etc. |
Unless the client is scripting their leeching to hide their activity, the client requests will generally be in the same session, which makes it easier to track. Otherwise, you have to track activity through IP address, but it’s still doable.
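As a rough illustration of tracking by IP address, here is a minimal Python sketch. Python is only used for illustration; nothing here is part of LogParser or the FTP service. It assumes the raw log files are space-delimited W3C text with a `#Fields:` header line, and the session IDs in the sample (`aaaa`, `bbbb`, `cccc`) are made-up placeholders. It tallies FTP commands per client IP address, so a client that spreads its requests across several sessions still shows up as one entry.

```python
from collections import Counter, defaultdict


def parse_w3c(lines):
    """Yield one dict per log entry, using the '#Fields:' directive for column names."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line.startswith("#"):
            continue  # other directives such as #Software or #Date
        elif fields:
            yield dict(zip(fields, line.split()))


def commands_by_client(rows):
    """Count FTP commands per client IP address, ignoring session IDs."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["c-ip"]][row["cs-method"]] += 1
    return counts


# Hypothetical sample: one client, three different sessions.
sample = [
    "#Fields: date time c-ip s-ip s-port cs-method cs-uri-stem sc-status x-session",
    "2010-09-02 21:12:24 192.168.0.1 192.168.0.20 21 CWD /folder1 250 aaaa",
    "2010-09-02 21:12:24 192.168.0.1 192.168.0.20 21 LIST - 226 aaaa",
    "2010-09-02 21:12:26 192.168.0.1 192.168.0.20 21 RETR file3.txt 226 bbbb",
    "2010-09-02 21:12:31 192.168.0.1 192.168.0.20 21 RETR file4.txt 226 cccc",
]

summary = commands_by_client(parse_w3c(sample))
print(dict(summary["192.168.0.1"]))
```

A client whose counts are dominated by CWD, LIST, and RETR in roughly that proportion matches the flow of events described earlier, regardless of how many sessions it used.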
I've written a lot of content about Microsoft's LogParser utility in the past, and so it should seem logical that I'd find a way to use it for this situation as well. If I were to write a LogParser script to detect leeches, I would probably start with something like the following example:
LogParser.exe "SELECT date,COUNT(*) AS downloads,c-ip,x-session FROM *.log WHERE cs-method='RETR' GROUP BY date,c-ip,x-session HAVING COUNT(*) > 100" -i:W3C
An easier view of just the SQL syntax for that LogParser query looks like the following:
SELECT date,COUNT(*) AS downloads,c-ip,x-session
FROM *.log
WHERE cs-method='RETR'
GROUP BY date,c-ip,x-session
HAVING COUNT(*) > 100
In this example, the script is asking LogParser to query all of the FTP logs in a folder and return the date, download count, client IP address, and session ID for every session where the client downloaded more than 100 files in a single day; it groups records by date, client IP address, and session. By way of explanation, 100 is a number that I chose somewhat arbitrarily, but I would think anyone with more than 100 downloads in a day would probably be a leech for most FTP sites. If you are writing your own LogParser script, make sure that you include the date in the grouping like I did, so that people who download 100 files over a couple of years don't show up as leeches. ;-]
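To make the grouping logic concrete, here is a small Python sketch of the same idea. It is an illustration rather than a replacement for the LogParser query: the function name `find_leeches`, the helper `make_rows`, and the sample data (including the session IDs `s1`, `s2`, `s3`) are hypothetical, and the rows are assumed to be already parsed into dictionaries keyed by the W3C field names.

```python
from collections import Counter


def find_leeches(rows, threshold=100):
    """Return (date, c-ip, x-session, downloads) for groups with more than `threshold` RETRs."""
    downloads = Counter(
        (row["date"], row["c-ip"], row["x-session"])
        for row in rows
        if row["cs-method"] == "RETR"
    )
    return [key + (count,) for key, count in downloads.items() if count > threshold]


def make_rows(date, ip, session, n):
    """Build n hypothetical RETR log entries for one client on one day."""
    return [
        {"date": date, "c-ip": ip, "x-session": session, "cs-method": "RETR"}
        for _ in range(n)
    ]


rows = (
    make_rows("2010-03-10", "192.168.0.1", "s1", 150)   # 150 downloads in one day
    + make_rows("2010-03-11", "192.168.0.2", "s2", 60)  # 60 downloads on day one
    + make_rows("2010-03-12", "192.168.0.2", "s3", 60)  # 60 more on day two
)

print(find_leeches(rows))
```

Only the first client is reported. The second client downloaded 120 files in total, but never more than 60 in a single day, so keeping the date in the grouping key leaves it below the threshold.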
When you run the above query, LogParser will give you output that looks like the following:
date | downloads | c-ip | x-session |
---|---|---|---|
2010-03-10 | 313 | 192.168.0.1 | d9ad5121-434a-4eae-a3ee-186733fd44f4 |
2010-06-17 | 2112 | 192.168.0.2 | 8e14bccd-0403-4a33-b33f-e0996c77c1a9 |
This enables you to see pretty quickly who the top leeches are for your FTP site, and then you can act accordingly. At the very least, you might want to think about adding the IP addresses of any leeches to your FTP IP Address and Domain Restrictions.
I hope this helps. ;-]
Comments
Anonymous
September 05, 2010
Nice post, always good to see more content about LogParser. Do you have any idea what the development situation at Microsoft is for LogParser? There hasn't been any word for a long time as far as I can tell. Are we ever getting a LogParser 3?
Anonymous
September 07, 2010
I would personally LOVE to see Microsoft release LogParser 3, but unfortunately there are no plans at this time for an update to LogParser in the near future. :-(