64-bit Windows Part 4: The x64 Standard
The first x64 processor was the AMD Opteron. The Opteron x64 processor has at least three very important features.
First, it is a 64-bit processor. Remember that we said that instructions to a 64-bit processor can refer to memory addresses using 64-bit binary numbers? Well, those addresses are held in storage locations on the processor that are called registers. The Opteron processor has all of the same registers that an x86 processor has, but on the Opteron processor, those registers are 64 bits in size, rather than 32 bits, so that the processor can function as a 64-bit processor.
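To put the difference in register size into numbers, here is a minimal sketch of how many distinct byte addresses fit in a 32-bit versus a 64-bit register. The function name `addressable_bytes` is made up for illustration, and the figures are the theoretical range of the number itself; real processors and operating systems expose less of it than that.

```python
def addressable_bytes(address_bits: int) -> int:
    """Count the distinct byte addresses that fit in a register of this width."""
    return 2 ** address_bits

GIB = 2 ** 30  # bytes in one gibibyte

x86_bytes = addressable_bytes(32)  # 32-bit register
x64_bytes = addressable_bytes(64)  # 64-bit register

print(f"32-bit addresses: {x86_bytes // GIB} GiB")
print(f"64-bit addresses: {x64_bytes // GIB} GiB")
print(f"ratio: {x64_bytes // x86_bytes} times larger")
```

The 32-bit case works out to 4 GiB, which is the familiar ceiling on x86 systems, while the 64-bit range is 2 to the 32nd power times larger.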
The second important feature of the Opteron has to do with how the processor accesses memory and other resources.
The typical x86 processor prior to the Opteron accessed memory via a memory controller located on a separate chip called the North Bridge, which also served as the connection to the Level 2 cache, the AGP slot, and some of the PCI devices. So, a lot of traffic was traveling over the connection between the processor and the North Bridge, a connection called the Front Side Bus. That connection became a bottleneck, slowing down access to memory. In multi-processor systems, the processors shared the Front Side Bus, so the bottleneck became even more severe.
An Opteron processor has a memory controller built right in, so data in memory does not have to travel to the processor via the North Bridge across the Front Side Bus. Furthermore, in multi-processor systems, the processors do not compete for access across a shared Front Side Bus. Instead, when they need to access memory via one another’s memory controllers, they do so across a dedicated connection channel called HyperTransport, which is controlled by logic built into the processors rather than into external chips. In theory, not only is performance improved, but scalability is improved as well, while costs decrease because fewer chips are required.
The third important feature of the Opteron processor is its compatibility with x86 software. To begin with, the instruction set for the Opteron is a superset of the x86 instruction set. In fact, only two instructions were added! Second, the processor has a switch to be flipped by the operating system that determines whether it is executing instructions in 32-bit or 64-bit mode. If the operating system is itself 32-bit, then the processor stays in 32-bit mode for as long as it is running that operating system. If the operating system is 64-bit, but a 32-bit x86 application is executed, then the processor goes into 32-bit mode while it executes that application’s instructions. By virtue of this architecture, when the Opteron executes 32-bit x86 applications, it runs them as fast as one would expect given the speed of the processor.
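The mode-switching rule described above can be written down as a small decision function. This is only a simplified model of the description in the text, not a picture of how the hardware is actually programmed; the function name `execution_mode` and its parameters are hypothetical, and the rejection of a 64-bit application on a 32-bit operating system is an added assumption to cover the one case the text does not mention.

```python
def execution_mode(os_bits: int, app_bits: int) -> str:
    """Return the mode the processor runs in for a given OS and application."""
    if os_bits not in (32, 64) or app_bits not in (32, 64):
        raise ValueError("bit widths must be 32 or 64")
    if os_bits == 32:
        # Assumption: a 32-bit operating system cannot host 64-bit code.
        if app_bits == 64:
            raise ValueError("a 32-bit operating system cannot run 64-bit applications")
        # A 32-bit operating system keeps the processor in 32-bit mode throughout.
        return "32-bit"
    # A 64-bit operating system switches mode to match each application.
    return "64-bit" if app_bits == 64 else "32-bit"

for os_bits, app_bits in [(32, 32), (64, 32), (64, 64)]:
    mode = execution_mode(os_bits, app_bits)
    print(f"{os_bits}-bit OS, {app_bits}-bit app -> {mode} mode")
```

The middle case is the interesting one: the same 32-bit application gets 32-bit mode whether the operating system underneath it is 32-bit or 64-bit.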
Opterons have proven to be tremendously efficient processors. At the end of 2003, AnandTech published the results of some benchmark tests pitting Opterons against 32-bit Xeon processors. Serving a Web application, the Opteron outpaced the Xeon by a whopping 45%. Serving a database, the Xeon and the Opteron were comparable in performance in a single-processor scenario, but the Opteron was 8.5% faster in a four-way multi-processor configuration. That result demonstrates the superior scalability of AMD’s design, which eliminates the Front Side Bus as a scarce resource that the processors must contend for to access memory and to do I/O.
At Microsoft, I’m told that we’ve switched to using x64 machines to compile our Windows operating systems, and the effects have been stunning. Windows Server 2003 Service Pack 1 took 9 hours to compile on x86 machines, but just 3 hours on x64s. Longhorn took 18 hours to build on x86 machines, but just 6 hours on x64s!
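For what it is worth, the arithmetic behind those build times is easy to check. The sketch below uses only the hours quoted above; the `speedup` helper and the dictionary layout are made up for illustration.

```python
def speedup(hours_before: float, hours_after: float) -> float:
    """How many times faster the build became."""
    return hours_before / hours_after

# Build times in hours as quoted in the text: (x86, x64).
builds = {
    "Windows Server 2003 Service Pack 1": (9, 3),
    "Longhorn": (18, 6),
}

for name, (x86_hours, x64_hours) in builds.items():
    factor = speedup(x86_hours, x64_hours)
    saved = x86_hours - x64_hours
    print(f"{name}: {factor:.1f}x faster, {saved} hours saved per build")
```

Both builds come out at the same factor of three, even though Longhorn takes twice as long in absolute terms.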