Partager via


What about index key distribution?

Mike Potjer wins the prize! His explanation of the non-random nature of SYS(2015) is why it’s not a good source for a random string.

His observations about index size are also germane. The more random the index keys, the less it can be compressed, the more disk and memory is required, and the slower the performance.

We can vary the index file size by using this line:

          INSERT INTO test VALUES (REPLICATE("a",90)+MakeRandString(7))

This will create data that’s the same for the first 90 characters. Observe the index length and how it changes with the index key.

Run the code below. Copy the code to a PRG, then paste the code from Excel's new gradient Data Bar at the bottom. Because the resulting file sizes have such a large range, change the maxsize of the databar rectangle by replacing the “25” to “55” in the line

                             this.oRect.w = CAST(nWidth * VAL(oCell.accValue) / nMaxVal - 25 as integer)

The code creates 3 graphs, using 3 different index keys. The databar shows the index file size on disk. The nLen variable is not used in the first graph, so it shows all the same values.

This line of code defines the various index keys used:

cIndexKeys=["testing"|MakeRandString(nLen)|REPLICATE("a",nLen-7)+MakeRandString(7)|MakeRandString(7)+REPLICATE("a",nLen-7)]

Note that VFP allows single quotes, double quotes, and square brackets as string delimiters, making it easy to quote executable code.

Using GetWordNum with a custom delimiter makes it easy to extract values from a string.

See also Using the Databar feature with real data

CLEAR ALL

CLEAR

SET EXCLUSIVE ON

SET SAFETY OFF

NumRecs=10000

cIndexKeys=["testing"|MakeRandString(nLen)|REPLICATE("a",nLen-7)+MakeRandString(7)]

nIndexKeyTypes = OCCURS("|",cIndexKeys)+1

PUBLIC oForm[nIndexKeyTypes]

LOCAL i,cKey,nMode

FOR nMode=1 TO nIndexKeyTypes

          CREATE TABLE ("results"+trans(nMode)) (nLen i, IndexSize i)

          cKey=GETWORDNUM(cIndexKeys,nMode,'|')

          FOR i = 10 TO 100 STEP 10

                   INSERT INTO ("results"+trans(nMode)) VALUES (i,TestIt(i,NumRecs,cKey))

                   ?i,Indexsize

          ENDFOR

          SELECT ("results"+trans(nMode))

          LOCATE

          oForm[nMode]=CREATEOBJECT("myform",PROPER(FIELD(2)))

          WITH oForm[nMode] as Form

                   .Height=240

                   .Top = (nMode-1) * 240

                   .Caption = cKey

                   .Show()

          ENDWITH

ENDFOR

PROCEDURE TestIt(nLen as Integer, NumRecs as Integer,cKey as String)

          LOCAL j,aa[1]

          CREATE table test (name c(100))

          FOR j = 1 TO NumRecs

                   INSERT INTO test VALUES (EVALUATE(cKey))

          ENDFOR

          INDEX on name TAG name

          USE && close file so it's written to disk

          ADIR(aa,"test.cdx")

          RETURN aa[1,2]

PROCEDURE MakeRandString(nLen as Integer)

          LOCAL i,cStr

          cStr=""

          FOR i = 1 TO nLen

                   cStr=cStr+CHR(RAND()*26+65)

          ENDFOR

          RETURN cStr

*1289 1774

Comments

  • Anonymous
    February 05, 2006
    Hi Calvin,
    Great application of both the index size and the gradient bar code.
    Btw: I think that the rect.width calculation should be changed to
    "this.oRect.w = CAST((nWidth - 55) * VAL(oCell.accValue) / nMaxVal as integer)". Reserve space to show the numbers before multiplying it with the relative bar width for current bar.
  • Anonymous
    January 22, 2008
    PingBack from http://randomquote.247blogging.info/calvin-hsias-weblog-what-about-index-key-distribution/