PDF iFilter Battle, second round
If you remember the last round of our PDF iFilter battle, Foxit won it. In this round we bring in another challenger: TET PDF iFilter. It is also available for x86 and x64. It is free for non-commercial desktop use, but a server installation requires a license.
Here are the new results for file set II:
iFilter | File Number | Total File Size (MB) | Avg File Size (MB) | Crawl Time (m:s) | Crawl Time (s) | Files Per Second | Success | Error
Foxit | 2676 | 2406 | 0.90 | 7:46 | 466 | 5.74 | 2759 | 0
Adobe | 2676 | 2406 | 0.90 | 40:58 | 2458 | 1.09 | 2757 | 2
TET | 2676 | 2406 | 0.90 | 13:48 | 828 | 3.23 | 2752 | 0
I also obtained an archive of People's Daily covering 2001 to 2006: about 20,000 PDF files, 13.4 GB in total. This set was tested on an 8-core Xeon box.
iFilter | File Number | Total File Size (MB) | Avg File Size (MB) | Crawl Time (h:m:s) | Crawl Time (s) | Files Per Second | Success | Error
Foxit | 19890 | 13793 | 0.69 | 00:30:53 | 1853 | 10.73 | 19884 | 7
Adobe | 19890 | 13793 | 0.69 | 05:19:04 | 19144 | 1.03 | 19887 | 4
TET | 19890 | 13793 | 0.69 | 01:40:09 | 6009 | 3.31 | 19879 | 12
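For anyone who wants to re-derive the throughput column, here is a minimal Python sketch. It assumes that "Files Per Second" is simply the file count divided by the crawl time in seconds; the helper names `to_seconds` and `files_per_second` are made up for this illustration and are not part of any iFilter tooling.

```python
def to_seconds(clock: str) -> int:
    """Convert a crawl time written as 'm:s' or 'h:m:s' into seconds."""
    seconds = 0
    for part in clock.split(":"):
        # Each field to the right is worth 1/60 of the one before it.
        seconds = seconds * 60 + int(part)
    return seconds


def files_per_second(file_count: int, clock: str) -> float:
    """Throughput, assuming it is file count divided by crawl seconds."""
    return file_count / to_seconds(clock)


# Crawl times for the People's Daily archive (19890 files), from the table.
crawl_times = {"Foxit": "00:30:53", "Adobe": "05:19:04", "TET": "01:40:09"}

for name, clock in crawl_times.items():
    rate = files_per_second(19890, clock)
    print(f"{name}: {to_seconds(clock)} s, {rate:.2f} files/s")
```

The same two helpers work for file set II, because `to_seconds` accepts the shorter `m:s` form as well.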
And here is the licensing comparison for production use (USD):
iFilter | Desktop | Server | 1-2 Cores Per Server | 4 Cores Per Server | 8+ Cores Per Server
Adobe | Free | Free | Free | Free | Free
Foxit | Free | Not Free | 329.99 | 589.97 | 1109.93
TET | $119 for commercial usage | Not Free | 595 | 595 | 595
Summary
It is good to see another vendor join this market. TET showed good performance, although it is still behind Foxit. However, TET is licensed per server rather than per core, so it would cost less than Foxit on a typical 2-way quad-core (8-core) box: 595 versus 1109.93.
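The per-server versus per-core trade-off can be sketched in a few lines of Python. The prices come from the licensing table above. The function names are hypothetical, and the table does not say how Foxit prices servers with 3 or 5 to 7 cores, so rounding up to the next listed tier is an assumption of this sketch, not vendor policy.

```python
# Foxit server prices in USD by core count, from the table: (max cores, price).
FOXIT_TIERS = [(2, 329.99), (4, 589.97)]
FOXIT_8_PLUS = 1109.93
# TET is a flat price per server regardless of core count.
TET_PER_SERVER = 595.00


def foxit_cost(cores: int) -> float:
    """Foxit price for one server with the given number of cores."""
    for max_cores, price in FOXIT_TIERS:
        if cores <= max_cores:
            return price
    # Assumption: anything above 4 cores falls into the "8+ cores" tier.
    return FOXIT_8_PLUS


def cheaper_vendor(cores: int) -> str:
    """Name the cheaper of the two paid options for one server."""
    return "TET" if TET_PER_SERVER < foxit_cost(cores) else "Foxit"


for cores in (2, 4, 8):
    print(f"{cores} cores: Foxit {foxit_cost(cores):.2f}, "
          f"TET {TET_PER_SERVER:.2f}, cheaper: {cheaper_vendor(cores)}")
```

On these numbers Foxit is the cheaper paid option up to 4 cores, and TET only pulls ahead on the 8-core tier.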
Comments
Anonymous
July 31, 2009
Great post. What are the errors that were encountered? Foxit shows 7, Adobe shows 4, and TET shows 12. Are they true errors, or are they notices for items that are correctly not crawled, such as expired items, items marked as not to be crawled, password-protected items, etc.? I think this would be a large factor when considering which iFilter to use. One may consider a slower rate of indexing to be acceptable if a larger percentage of the corpus will be properly indexed.
Anonymous
December 01, 2010
cy21 raises a good point. I'm currently looking at which is best. I think Adobe fell off the short list pretty quickly, but TET might be worth considering if there is a trade-off between quality and speed. The numbers in this post seem to show that Foxit is slightly more reliable in addition to being much faster than TET. Is this a logical conclusion, or are there other factors involved in the reliability of iFilters? (You seem to be following this technology very closely, and I had not known iFilters existed until today.)