

PDF iFilter Battle, second round

If you remember the last round of our PDF iFilter battle, Foxit won it. In this round we bring in another challenger: the TET PDF iFilter. It is also available for x86 and x64, is free for non-commercial desktop use, and needs a license for server installation.

Here are the new results for file set II:

 

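| iFilter | File Number | Total File Size (MB) | Avg File Size (MB) | Crawl Time (m:s) | Crawl Time (s) | Files Per Second | Success | Error |
|---------|-------------|----------------------|--------------------|------------------|----------------|------------------|---------|-------|
| Foxit   | 2676        | 2406                 | 0.90               | 7:46             | 466            | 5.74             | 2759    | 0     |
| Adobe   | 2676        | 2406                 | 0.90               | 40:58            | 2458           | 1.09             | 2757    | 2     |
| TET     | 2676        | 2406                 | 0.90               | 13:48            | 828            | 3.23             | 2752    | 0     |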
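
For anyone who wants to re-derive the "Crawl Time (s)" and "Files Per Second" columns, here is a minimal sketch in Python. The function names `to_seconds` and `files_per_second` are my own for illustration; the file count and crawl times are the file set II figures from the table.

```python
def to_seconds(clock: str) -> int:
    """Convert a 'm:s' or 'h:m:s' crawl time into total seconds."""
    total = 0
    for part in clock.split(":"):
        total = total * 60 + int(part)
    return total


def files_per_second(file_count: int, clock: str) -> float:
    """Throughput: files crawled divided by elapsed seconds."""
    return file_count / to_seconds(clock)


# File set II: 2676 files, crawl times as reported in the table.
FILE_COUNT = 2676
CRAWL_TIMES = {"Foxit": "7:46", "Adobe": "40:58", "TET": "13:48"}

for name, clock in CRAWL_TIMES.items():
    rate = files_per_second(FILE_COUNT, clock)
    print(f"{name}: {to_seconds(clock)} s, {rate:.2f} files/s")
```

The same helper accepts the h:m:s format, so it can be pointed at any other crawl log that reports elapsed time that way.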

 

I also obtained an archive of People's Daily covering 2001 to 2006: about 20,000 PDF files, 13.4 GB in total. This set was tested on an 8-core Xeon box.

 

 

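| iFilter | File Number | Total File Size (MB) | Avg File Size (MB) | Crawl Time (h:m:s) | Crawl Time (s) | Files Per Second | Success | Error |
|---------|-------------|----------------------|--------------------|--------------------|----------------|------------------|---------|-------|
| Foxit   | 19890       | 13793                | 0.69               | 00:30:53           | 1853           | 10.73            | 19884   | 7     |
| Adobe   | 19890       | 13793                | 0.69               | 05:19:04           | 19144          | 1.03             | 19887   | 4     |
| TET     | 19890       | 13793                | 0.69               | 01:40:09           | 6009           | 3.31             | 19879   | 12    |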

 

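And the licensing comparison for production use (USD):

| iFilter | Desktop                   | Server   | 1-2 Cores Per Server | 4 Cores Per Server | 8+ Cores Per Server |
|---------|---------------------------|----------|----------------------|--------------------|---------------------|
| Adobe   | Free                      | Free     | Free                 | Free               | Free                |
| Foxit   | Free                      | Not Free | 329.99               | 589.97             | 1109.93             |
| TET     | $119 for commercial usage | Not Free | 595                  | 595                | 595                 |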

 

Summary

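It is good to see another vendor join this market. TET showed good performance, although it is still behind Foxit: about 1.8 times slower on file set II and about 3.2 times slower on the People's Daily archive. However, it is licensed per server rather than per core, so on a typical two-way quad-core box it costs less than Foxit ($595 versus $1,109.93).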
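
To make the per-server versus per-core point concrete, here is a rough sketch of the server licensing cost in Python. The prices are the ones from the licensing table. The function names are hypothetical, and so is the tier rule: I am assuming a server is billed at the smallest Foxit tier that covers its core count, which the table does not spell out for core counts such as 3 or 6.

```python
# Per-server list prices in USD, copied from the licensing table.
FOXIT_TIERS = [(2, 329.99), (4, 589.97)]  # (max cores in tier, price)
FOXIT_TOP_TIER = 1109.93                  # the "8+ cores" column
TET_PER_SERVER = 595.00                   # flat, regardless of cores


def foxit_cost(cores_per_server: int, servers: int = 1) -> float:
    """Foxit cost. Assumption: smallest tier covering the core count applies."""
    for max_cores, price in FOXIT_TIERS:
        if cores_per_server <= max_cores:
            return round(price * servers, 2)
    return round(FOXIT_TOP_TIER * servers, 2)


def tet_cost(cores_per_server: int, servers: int = 1) -> float:
    """TET cost: one flat fee per server; core count is ignored."""
    return round(TET_PER_SERVER * servers, 2)


# A two-way quad-core box has 8 cores.
for cores in (2, 4, 8):
    print(f"{cores} cores: Foxit ${foxit_cost(cores):.2f}, TET ${tet_cost(cores):.2f}")
```

Under these assumptions TET only comes out cheaper at the 8+ core tier; at 4 cores or fewer the Foxit list price is still slightly lower.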

Comments

  • Anonymous
    July 31, 2009
    Great post. What are the errors that were encountered? FoxIT shows 7, Adobe shows 4, and TET shows 12. Are they true errors or are they notices for items that are correctly not crawled, such as expired items, items marked as not to be crawled, password protected, etc...? I think this would be a large factor when considering which iFilter to use.  One may consider a slower rate of indexing to be acceptable if a larger percentage of the corpus will be properly indexed.

  • Anonymous
    December 01, 2010
    cy21 raises a good point. I'm currently looking at which is best. I think Adobe fell off the short list pretty quickly, but TET might be worth considering if there is a trade-off between quality and speed. The numbers in this post seem to show that Foxit is slightly more reliable in addition to being much faster than TET. Is this a logical conclusion, or are there other factors involved in the reliability of iFilters? (You seem to be following this technology very closely, and I had not known iFilters existed until today.)