I'm (in)famous

It's been a while - I'm now the proud father of 3 children, up from a mere 2, and I'm finally back at work and resuming normal developer duties.  In the meantime, check out my glamorous appearance on "the .Net Show" on MSDN:

 

https://msdn.microsoft.com/theshow/episode.aspx?xml=theshow/en/Episode051/manifest.xml

 

I'm the 22 minutes listed under 'Enter the Programmer'.  I seem to have uncovered a hidden talent during this taping:  I can blather on about Win64 until I'm blue in the face!

 

-Kev

Comments

  • Anonymous
    July 13, 2005
    Congratulations re: the new baby - now that you are back may I ask a question?

    I have a query about how important it really is to maintain RSP on 16-byte alignment. It seems to be a "given" that this will aid performance and the X64 documentation is very insistent about this. However the AMD documentation says:-
    "Stack Alignment. Control-transfer performance can degrade significantly when the stack pointer is not aligned properly. Stack pointers should be word aligned in 16-bit segments, doubleword aligned in 32-bit segments, and quadword aligned in 64-bit mode."
    Section 3.73 Chapter 3 "General Purpose Programming" AMD64.

    The reason I ask is that I intend to convert my assembler (GoAsm) to work with 64-bit source code for applications running under Windows XP64. To make things easier for the user and to allow the same source code to be used both for Win32 and Win64 (AMD64), my assembler will itself have to do a fair amount of tweaking of the code to establish the correct stack frames, parameter passing and exception handling.

    My assembler can easily achieve this when it makes a stack frame for a "frame function". But in assembler source code there tend to be a lot of "leaf" functions. Often a leaf function will call another leaf function. The first call of this type would alter RSP by only 8 bytes, leaving it unaligned until the next call. Also, assembler programmers like to save registers on the stack using the PUSH instruction, and maybe later take them off the stack and put them in another place. This would also change the stack by 8 bytes at a time.

    My assembler could add code which for each call would automatically align RSP at run-time, but this would add bloat. So I am tempted to let RSP be misaligned by 8 bytes within a leaf function. I am also tempted to permit one leaf function to call another leaf function, and to call a frame function (maybe in this case aligning the stack first). I understand that on an AMD64 this would not cause an exception.

    What I cannot understand from any of the material I have seen so far is how badly performance would be affected by a stack misaligned by 8 bytes. So I wonder if the requirement for RSP to be on a 16-byte boundary in the X64 documentation is actually a hangover from some previous thinking. Is it something which may be reconsidered and eventually dropped? I will of course do my own speed trials, but at my early planning stage any insight into this would be very useful.

    Many thanks for any help you can give me. I posted this question on the AMD64 developers forum but have had no reply.

    Jeremy Gordon
    Author of the "Go" tools
  • Anonymous
    July 15, 2005
    You must align RSP within all functions that do anything with RSP. If you're writing an assembler to target Win64 on x64, you ought to read my ABI posting - lots of good details in there, along with links to appropriate official documentation.
  • Anonymous
    July 16, 2005
    Yes I did read that posting, in detail, it is extremely helpful, thanks. I accept of course that RSP must always be aligned to 8 bytes, that makes sense and the processor manufacturers require it. What I do not understand, nor at present accept, is why RSP must always be aligned to 16 bytes. Is this really necessary? If so, why?
  • Anonymous
    October 30, 2005
    RSP must always be aligned to 16 bytes for 2 reasons: 1) XMM registers are saved when a function uses them, and they're saved using MOVDQA, not MOVDQU, and 2) the unwind information only allows the frame pointer to be described as an offset in multiples of 16 bytes, and things will act very badly if this is not followed (though they will only go badly during EH unwind).
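
The two reasons in the last comment, and the leaf-calls-leaf case from the question, can be checked with a little arithmetic. The following is a minimal Python sketch, not anything taken from the Win64 tools or documentation. The function names (`rsp_after`, `movdqa_ok`, `frame_offset_encodable`), the starting RSP value, and the 0x28-byte prolog adjustment are hypothetical. It only models the stack-pointer bookkeeping, assuming the Win64 convention that RSP is 16-byte aligned at the point of a CALL.

```python
# Hypothetical model of RSP bookkeeping on x64; names and values are illustrative only.

SLOT = 8  # CALL pushes an 8-byte return address; PUSH stores an 8-byte register


def rsp_after(rsp, ops):
    """Apply stack operations to rsp: "call", "push", or ("sub", n)."""
    for op in ops:
        if op in ("call", "push"):
            rsp -= SLOT
        else:
            _, n = op
            rsp -= n
    return rsp


def movdqa_ok(address):
    """MOVDQA faults unless its memory operand is 16-byte aligned."""
    return address % 16 == 0


def frame_offset_encodable(offset):
    """Assumption: unwind info stores the frame-pointer offset in units of 16 bytes (0..240)."""
    return offset % 16 == 0 and 0 <= offset <= 240


caller_rsp = 0x1000  # assumed 16-byte aligned just before the CALL

# Frame function: after CALL, RSP is 8 mod 16; subtracting 0x28 realigns it.
framed = rsp_after(caller_rsp, ["call", ("sub", 0x28)])

# Leaf calling a second function without realigning: the callee starts at
# 0 mod 16, so the same prolog leaves RSP 8 bytes off.
nested = rsp_after(caller_rsp, ["call", "call", ("sub", 0x28)])

print(hex(framed), movdqa_ok(framed))
print(hex(nested), movdqa_ok(nested))
```

In the second case, an XMM save slot placed at a 16-byte multiple from RSP would sit on a misaligned address, which is the situation where a MOVDQA store would fault rather than merely run slowly.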