zero or one based collection?
moo asks:
Zero based collections or 1 based?
Since programming languages are a bridge between the human concept of a solution and we naturally think the first element is in position 1, why was this not so on the actual language? Why are we made to think like a machine when infact we are not? The infamous "Off by one bug" is there because of the inherant design. So popular its even got a name yet we do nothing to prevent this happening at the language level, as for those who say its easier for the computer, thats why we have smart compilers. Im not a compiler dammit.
****
One of my hobbies is rewriting lines in TV and films, and I am therefore impelled to comment that moo's last line should be:
I'm a programmer, not a compiler, Jim
To cover this subject, I'm going to have to set the way-back-machine for the 1980s or earlier, back when real men wrote in assembly and our bytes only had 7 bits. (aside - How many of you - and be honest here - have ever heard of machines where the word size wasn't a multiple of 8? They really did exist).
Anyway, in those ancient times, processors were fairly glacial in their arithmetic speed, though much faster than the early calculators. Saving cycles was very important, so when arrays were first considered, the implementers looked at the code they wrote:
a[x]
translates to
address = base_address + (x - 1) * sizeof(x)
They actually didn't write it that way because they didn't have multiplication in those days, but that gives you the idea. Then somebody noticed that if your array starts with zero, you could write it as:
address = base_address + x * sizeof(x)
Thereby saving you a single decrement operation, which was important in those day.
Therefore early programmers got used to zero-based arrays, and the path was set, and it has stayed that way for many years for the majority of languages.
But why? Isn't it simple enough to change?
It's obviously trivially easy to change, and Moore's law has made the efficiency inconsequential in the majority of scenarios. The issue isn't around technological limitations, but rather human ones.
Understanding how zero-based indexing works is the secret handshake of the programming world. We all started not knowing the secret handshake, but over time we learned and even began to like the secret handshake, and now we don't know any other way to shake hands.
We're not going to try to change our brain wiring just because some young whippersnapper is having trouble remembering that the first index is zero.
Or, to put it another way, developers have a huge investment in hardwired things like this, and changing them will not make your customer happy.
[Update: Jack wrote:
Why cant we either have 1 based or user definable array bounds?
The CLR does support this kind of construct (I had a hard time not using the term “travesty“ here...), but there aren't, to my knowledge, any languages that have built-in syntax to do that.
Which is a very good thing. If you go down that route, rather than having to remember a single rule (C# arrays are zero-based), you have to remember that every array could be either zero-based, so all your loops become:
for (int index = arr.GetLowerBound(); index <= arr.GetUpperBound(); index++)
{
}
If you get this wrong, you get code that works fine for your test cases, but breaks for the people who like 3-based arrays.
Yuck. Many times, “make it an option“ is the worst choice.
]
<Update>
Ferris adds:
Why bother creating a new language if you are still hugging your legacy pillow at night. Waste of time.
One of our main goals was to create a language that was comfortable to C/C++ programmers. We could have taken a different tack, and designed a new language from scratch, and perhaps done some new and exciting things.
But if you look at all the languages out there, you'll find that having a comfortable syntax is well correlated with language success. Even if you've never written C# code before, if you have experience with a C-style language, you'll be able to read C# code.
</Update>
You can find another discussion of this issue here.
Comments
- Anonymous
March 16, 2004
<i>How many of you - and be honest here - have ever heard of machines where the word size wasn't a multiple of 8?</i>
Many of the machines in DECs PDP family had 36 bit words. Is there a prize for this? ;) - Anonymous
March 16, 2004
It should come as no surprise to anyone that Perl lets you define your own array base on a dynamic basis by setting the special $[ variable. It's, um, not recommended, though. - Anonymous
March 16, 2004
The comment has been removed - Anonymous
March 16, 2004
The first machine I ever did significant programming on was a Decsystem-20 - 36 bit words, segmented into 2 18 bit halfwords.
Stop by my office sometime and I'll show you the reference manual :) - Anonymous
March 16, 2004
The processor used in the Chromatic Mpact of a few years ago used 9-bit bytes. It was great for the days when others had 16-bit color (565) and we had 18 bit color (666). Writing to it from the PC host was painful but hey... - Anonymous
March 16, 2004
The comment has been removed - Anonymous
March 16, 2004
Maybe we need a new designer on the C# team with NEW ideas and concepts and one that isnt afraid of change. - Anonymous
March 16, 2004
The comment has been removed - Anonymous
March 16, 2004
Forcing base 1 is a good move. - Anonymous
March 16, 2004
The comment has been removed - Anonymous
March 16, 2004
I have a feeling lots of code is going to be ported from C/C++ to C#. If arrays were 1 based in C#, it would lead to tons of errors.
Also, I think the 0 based array thing made a lot of sense with the tight relationship between pointers and arrays in C. It would be really weird if p[n] = *(p+n-1). - Anonymous
March 16, 2004
The comment has been removed - Anonymous
March 16, 2004
VB used to support the dynamic lower bounds. But they removed it in VB.NET. There was also support for a 0 based or 1 based array using the Option Base statement. That did give you a lot of flexibility... - Anonymous
March 16, 2004
The comment has been removed - Anonymous
March 16, 2004
Eric: The CLR does support this kind of construct (I had a hard time not using the term “travesty“ here...), but there aren't, to my knowledge, any languages that have built-in syntax to do that.
Ada has that built-in..and more...
type SomeTypeName is array(4..10) of Integer;
or, for truth tables
type TruthTable is array(Boolean, Boolean, Boolean) of SomeResultType;
(I have a love-hate relationship with Ada - it does have some nice features, like named associations, but I'd rather use C++ :-)
Also on starting numbering at zero - Dijkstra's argumnet for numbering from zero (see http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF) is of some interest. - Anonymous
March 16, 2004
The comment has been removed - Anonymous
March 16, 2004
The comment has been removed - Anonymous
March 16, 2004
I have a feeling lots of code is going to be ported from C/C++ to C#. If arrays were 1 based in C#, it would lead to tons of errors.
Also, I think the 0 based array thing made a lot of sense with the tight relationship between pointers and arrays in C. It would be really weird if p[n] = *(p+n-1).
well, if you are copy n paste porting, I have only one thing to say; ROFLMAO. - Anonymous
March 16, 2004
There already was a heated discussion on the subj here: http://discuss.fogcreek.com/newyork/default.asp?cmd=show&ixPost=1822&ixReplies=8
I personally find it extremely painful to use APIs that mix 1 and 0-based collections/arrays in VB. - Anonymous
March 16, 2004
That code for looping through an array looks like all of the VB code I use for arrays that I don't have control over. I never know what the bounds are, and have to code around it.
Travesty is right. - Anonymous
March 16, 2004
Your argument doesn't hold water !
The expression:
address = base_address + (x - 1) * sizeof(x)
can be re-written as:
address = (base_address - sizeof(x)) + x * sizeof(x)
The expression (base_address - sizeof(x)) is a constant that can be determined at compile time, so there is no need for a run time decrement !
The real reason for zero based collections goes back to the original design of the "C" programming language, in which the designers wanted array operations and pointer arithmetic to be equivalent, so that a[x] would be the same as writing *(a + x). Since then, all languages that followed in the tradition of "C" have used zero based indicies: C++, Java, and now C#. - Anonymous
March 17, 2004
One we lost. - Anonymous
March 17, 2004
I was a systems programmer on the Control Data 6000 series. 60 bit words (12 bits in Peripheral Processors) - Anonymous
March 17, 2004
<RE>Your argument doesn't hold water !
The expression:
address = base_address + (x - 1) * sizeof(x)
can be re-written as:
address = (base_address - sizeof(x)) + x * sizeof(x)
The expression (base_address - sizeof(x)) is a constant that can be determined at compile time, so there is no need for a run time decrement !</RE>
Eric was talking about a time before C, a time before optimizing compilers. If you re-read what he wrote, he was actually talking about a time before COMPILERS! Gee I feel old!
Sorry. Zero rules. - Anonymous
March 18, 2004
No one else is claiming programming on the 12 bit word PDP-8. Now there was a system for real progammers. :-) None of those frills like multiply, divide and subtract.
I have to say though that I miss the ability to set different lower bounds for arrays in VB. It let you write things that were intuitive like having arrays that went from 1900 to 2000 for a set of years. Sure you can get around it but it adds complexity that many would just as soon let the compiler worry about. - Anonymous
March 19, 2004
Fortran.NET has built in non-zero based arrays... - Anonymous
March 19, 2004
Back in the <em>old</em> old days, everything was 1-based, because zero was not yet considered a number. So you had <small>AD</small> 1 immediately following 1 <small>BC</small>, and most years in the second century began with a 1 rather than a 2. To the Romans, December was the third month after October, and a newborn was in his first year; so we can consider ourselves more sophisticated now that we say, more sensibly, that December is two months later than October and a newborn is of age zero. Unlike centuries, we now start decades (like "the 60's") on years that end with 0 instead of 1, though we still have no good name for our current decade. The day starts not at 1 <small>AM</small> but at 12! It even sounds downright strange to say that we can't vote until our nineteenth year. So now that mankind has finally gotten used to the idea that zero is a number in its own right, numbering things from zero is seeming ever more sensible. - Anonymous
March 19, 2004
Oops, no HTML in comments, huh? - Anonymous
April 16, 2004
Why can't you say - Anonymous
April 20, 2004
The comment has been removed - Anonymous
December 18, 2004
Helpful For MBA Fans. - Anonymous
October 05, 2006
One of the changes from VB6 to VB.NET was the removal of non-zero lower bounded arrays... a concept discussed by Eric Gunnerson