What about index key distribution?
Mike Potjer wins the prize! His explanation of the non-random nature of SYS(2015) is why it’s not a good source for a random string.
His observations about index size are also germane. The more random the index keys, the less it can be compressed, the more disk and memory is required, and the slower the performance.
We can vary the index file size by using this line:
INSERT INTO test VALUES (REPLICATE("a",90)+MakeRandString(7))
This will create data that’s the same for the first 90 characters. Observe the index length and how it changes with the index key.
Run the code below. Copy the code to a PRG, then paste the code from Excel's new gradient Data Bar at the bottom. Because the resulting file sizes have such a large range, change the maxsize of the databar rectangle by replacing the “25” to “55” in the line
this.oRect.w = CAST(nWidth * VAL(oCell.accValue) / nMaxVal - 25 as integer)
The code creates 3 graphs, using 3 different index keys. The databar shows the index file size on disk. The nLen variable is not used in the first graph, so it shows all the same values.
This line of code defines the various index keys used:
cIndexKeys=["testing"|MakeRandString(nLen)|REPLICATE("a",nLen-7)+MakeRandString(7)|MakeRandString(7)+REPLICATE("a",nLen-7)]
Note that VFP allows single quotes, double quotes, and square brackets as string delimiters, making it easy to quote executable code.
Using GetWordNum with a custom delimiter makes it easy to extract values from a string.
See also Using the Databar feature with real data
CLEAR ALL
CLEAR
SET EXCLUSIVE ON
SET SAFETY OFF
NumRecs=10000
cIndexKeys=["testing"|MakeRandString(nLen)|REPLICATE("a",nLen-7)+MakeRandString(7)]
nIndexKeyTypes = OCCURS("|",cIndexKeys)+1
PUBLIC oForm[nIndexKeyTypes]
LOCAL i,cKey,nMode
FOR nMode=1 TO nIndexKeyTypes
CREATE TABLE ("results"+trans(nMode)) (nLen i, IndexSize i)
cKey=GETWORDNUM(cIndexKeys,nMode,'|')
FOR i = 10 TO 100 STEP 10
INSERT INTO ("results"+trans(nMode)) VALUES (i,TestIt(i,NumRecs,cKey))
?i,Indexsize
ENDFOR
SELECT ("results"+trans(nMode))
LOCATE
oForm[nMode]=CREATEOBJECT("myform",PROPER(FIELD(2)))
WITH oForm[nMode] as Form
.Height=240
.Top = (nMode-1) * 240
.Caption = cKey
.Show()
ENDWITH
ENDFOR
PROCEDURE TestIt(nLen as Integer, NumRecs as Integer,cKey as String)
LOCAL j,aa[1]
CREATE table test (name c(100))
FOR j = 1 TO NumRecs
INSERT INTO test VALUES (EVALUATE(cKey))
ENDFOR
INDEX on name TAG name
USE && close file so it's written to disk
ADIR(aa,"test.cdx")
RETURN aa[1,2]
PROCEDURE MakeRandString(nLen as Integer)
LOCAL i,cStr
cStr=""
FOR i = 1 TO nLen
cStr=cStr+CHR(RAND()*26+65)
ENDFOR
RETURN cStr
*1289 1774
Comments
- Anonymous
February 05, 2006
Hi Calvin,
Great application of both the index size and the gradient bar code.
Btw: I think that the rect.width calculation should be changed to
"this.oRect.w = CAST((nWidth - 55) * VAL(oCell.accValue) / nMaxVal as integer)". Reserve space to show the numbers before multiplying it with the relative bar width for current bar. - Anonymous
January 22, 2008
PingBack from http://randomquote.247blogging.info/calvin-hsias-weblog-what-about-index-key-distribution/