次の方法で共有


Index creation performance issues

In this post: Index creation performance question  I asked: Is it faster to fill a table with values, then index it or create the index on an empty table, then fill it? Why?

After a record in a cursor is modified, any associated index must be updated. If there is already an index, then after each record added, the index needs to be checked.

If the index is created on the filled table, then a single, specialized optimized index creation routine can be invoked within VFP to create the index.

For example, I profiled the code for creating 10000 records. If the special routine is invoked to create the index on these records,

If the index is created beforehand, then the routine to add a node (let’s call it AddNode) to the index is called 10001 times (the extra 1 time is for the index itself: multiple indices can be stored in an index: see What is an index anyway?)

If the index is created after, then the special routine is called, which reduces the number of calls to AddNode to 312!

On my machines, the index after scenario is about 3-5 times faster than the index before! What results do you get?

Next question: what effect will there be if you use random data rather than a constant string? You may be surprised by the results!

(Another question: Why does this code not use SYS(2015) - Unique Procedure Name to get random values?)

SET EXCLUSIVE ON

SET SAFETY OFF

num=100000

?"Create Index before = ",TestIt(.t.,num)

?"Create Index after = ",TestIt(.f.,num)

USE

n=ADIR(aa,"test.cdx")

?"Index size = ",aa[1,2]

PROCEDURE TestIt(fIndexBefore as Boolean, num as Integer)

          ns=SECONDS()

          CREATE table test (name c(100))

          IF fIndexBefore

                   INDEX on name TAG name

          ENDIF

          FOR i = 1 TO num

                   IF .t.

          INSERT INTO test VALUES (MakeRandString(7))

                   ELSE

                             INSERT INTO test VALUES ("testing")

                   ENDIF

          ENDFOR

          IF !fIndexBefore

                   INDEX on name TAG name

          ENDIF

          RETURN SECONDS()-ns

PROCEDURE MakeRandString(nLen as Integer)

          LOCAL i,cStr

          cStr=""

          FOR i = 1 TO nLen

                   cStr=cStr+CHR(RAND()*26+65)

          ENDFOR

          RETURN cStr

Comments

  • Anonymous
    February 01, 2006
    Interesting. Thanks for the explanation.
    As for the SYS(2015) question: each call to SYS(2015) produces a result that is distinct in the right-most character(s), the first character is always the same, and the generated values are all consecutive. In a quick test where SYS(2015) was called 10000 times, the first 7 characters of all 10000 values I got back were identical. I think that alone makes SYS(2015) a poor choice for a real-world test. However, there were a couple other differences. The SYS(2015) values can be indexed more efficiently than a more random set of values, making the timing results less applicable to a real-world scenario. I also noticed that the CDX size of the table stuffed with SYS(2015) values was considerably smaller than the CDX for a more random set of values of the same length.
  • Anonymous
    March 09, 2006
    On machine - P4 2.4 with 512Mb Ram
    i got 2.957 and 1.996 - 1.48 times faster
  • Anonymous
    December 01, 2006
    Does this apply to views as well?  I can't seem to see a performance difference.