RyuJIT CTP5: Getting closer to shipping, and with better SIMD support

Hi Folks!  Yes, we understand it’s been a while since we shipped the last RyuJIT CTP.  We have been working hard on improving our SIMD support and getting RyuJIT to ship quality for the next version of the .NET Framework.  So without further ado, here’s a quick description of what you can expect from RyuJIT CTP5.

 

Correctness

We have spent a lot of time finding and fixing those last few pesky corner-case functional issues in RyuJIT.  Fortunately, we have the luxury of having many internal partners with a significant managed codebase, making it easy to throw as much managed code as we can find at RyuJIT.  While some of the issues we have found are legitimate bugs, others are not so clear cut.  For example, we have found that JIT64 accommodates some illegal IL disallowed by the ECMA spec.  Since backward compatibility is a major concern for us, we evaluate these issues on a case-by-case basis to decide if we should quirk RyuJIT to accommodate the same illegal IL.

 

Real-World Throughput Wins

In case you missed the original blog post announcing the first CTP, RyuJIT beats JIT64 handily in terms of throughput while staying very competitive in terms of code quality (CQ).  The Bing team recently tried RyuJIT on top of .NET Framework 4.5.1 in some of their processing, and they saw a 25% reduction in startup time in their scenario.  This is the most significant real-world throughput win we have witnessed with RyuJIT thus far.  :)

 

Code Quality

We didn’t publish any benchmark results with RyuJIT CTP4, so here are some graphs to show that we haven’t regressed CQ in RyuJIT CTP5.  However, since CQ hasn’t been the focus for this CTP, we haven’t made any significant improvements either.

These graphs follow the same basic format as previous ones.  The higher the bar, the better RyuJIT CTP5 is at that benchmark.  The grey area is the standard deviation, so any benchmark falling in the grey area is just noise. 


What’s New in JIT Support for SIMD types?

RyuJIT CTP5 supports acceleration of the latest version of the Vector APIs available via NuGet here.  This version contains a number of changes that were requested by developers.

One of the most popular requests was to publicly expose the fields of the fixed-size vector types (e.g. Vector2.X).  Why wasn’t this done originally?  The short answer is that it was for performance, but really it was to make it easier for the JIT to handle all the references to these types as intrinsics, and to transform them into the appropriate target instructions.  It’s a tricky business, however, to determine where to allocate a local Vector instance for best efficiency:

  • If the instance will be primarily used in Vector intrinsics, putting it in an xmm/ymm register is the best option.
  • If the instance will primarily be referenced via its fields, then either putting it in memory, or separately allocating its fields to registers, is the best option.
  • If the instance is larger than 8 bytes (i.e. not a Vector2), and it is primarily passed as a method argument, then putting it in memory is the best option.

With CTP5 we have made the JIT a bit smarter about identifying these field accesses, analyzing the usage of the vector instance, and selecting among these options, but there is still room for improvement, so you may find that some SIMD code runs more slowly with this new release.
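
To make the three usage patterns above concrete, here is a minimal sketch.  The class and method names are hypothetical and made up for illustration; only Vector2, Vector3, their public X field, the + operator, and Vector3.Dot come from the System.Numerics APIs.  The comments describe which allocation choice each pattern would favor according to the list above, not what the JIT is guaranteed to do for this exact code.

```csharp
using System;
using System.Numerics;

// Hypothetical sample class; names are for illustration only.
public static class VectorUsageSketch
{
    // Pattern 1: the accumulator is only touched through Vector2 operators,
    // so keeping it in a vector (xmm) register would be the best fit.
    public static Vector2 SumAll(Vector2[] points)
    {
        Vector2 acc = Vector2.Zero;
        foreach (Vector2 p in points)
        {
            acc += p;
        }
        return acc;
    }

    // Pattern 2: each element is only referenced via its X field, so keeping
    // it in memory (or allocating its fields separately) would be the best fit.
    public static float SumOfX(Vector2[] points)
    {
        float total = 0f;
        foreach (Vector2 p in points)
        {
            total += p.X;
        }
        return total;
    }

    // Pattern 3: Vector3 is 12 bytes (larger than 8) and the locals are
    // primarily passed along as method arguments.
    public static float Dot(Vector3 a, Vector3 b)
    {
        return Vector3.Dot(a, b);
    }

    public static void Main()
    {
        var points = new[] { new Vector2(1f, 2f), new Vector2(3f, 4f) };
        Console.WriteLine(SumAll(points));
        Console.WriteLine(SumOfX(points));
        Console.WriteLine(Dot(new Vector3(1f, 2f, 3f), new Vector3(4f, 5f, 6f)));
    }
}
```

Real code usually mixes these patterns on the same local, which is exactly the case where the JIT has to pick one option and can pick the slower one.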

We’ve also improved register allocation for SIMD types, reducing a number of cases where we had unnecessary copies of vector registers.

Since we are talking about SIMD performance, it wouldn’t be fair to leave out SIMD benchmark results.  We are using the sample code here as our SIMD benchmarks.  (However, note that we are using an updated version of RayTracer which uses our latest Vector APIs.  We’ll update the sample shortly.)


Stay tuned – we are continuing to work on performance for SIMD types, including tuning of inlining heuristics for SIMD methods, and improved dead store elimination.  We’ll also be diving into the usage data from Bing and other internal partners to see how we can improve the performance of RyuJIT even more on both throughput and CQ.

In case you need them again, you can refer to this blog post for the instructions to turn on RyuJIT, and this blog post for instructions on using SIMD.  Note that if you are running on the 4.5.2 version of the .NET Framework, you can use RyuJIT CTP5 on Windows Vista, 7, 8, and 8.1 as well as Windows Server 2008, 2008 R2, 2012, and 2012 R2.  However, RyuJIT CTP5 currently doesn't work on Visual Studio "14" CTP4.  You don't need it anyway, since RyuJIT is enabled by default on Visual Studio "14" CTP4.  :)  (The version of RyuJIT in Visual Studio "14" CTP4 is slightly older than this CTP, but not by much.)

Comments

  • Anonymous
    November 04, 2014
    If you install RyuJIT on a machine where it's not enabled, is there a danger of problems when removing it and installing the released version on that machine?

  • Anonymous
    November 04, 2014
    Cool! Seeing nice boost. Around 12% on 32-bit, and almost 30% on 64-bit. Also thanks for fixing previously reported bugs :) IronScheme now runs without issue using RyuJIT o/

  • Anonymous
    November 10, 2014
    Any idea when we will see a CTP with some code quality improvements?  With our internal benchmarks, RyuJIT CTP5 still generates code that runs twice as slow.  This has been submitted to Microsoft, so it may already be fixed in some internal version, but it would be nice to be able to see the improvements in person.  We would gladly sacrifice compile time for better code quality, as our runtimes (simulations) are usually measured in hours and days, so startup time is never an issue.

  • Anonymous
    November 26, 2014
    As of 11/12 nobody cares about VS 14 CTP4.  Can you tell us what the status of RyuJIT is in the VS 2015 Preview?  From what I can tell, it doesn't seem to be enabled by default.

  • Anonymous
    May 13, 2015
    .Net 4.6 RC x64 is twice as slow as x86 (release version) stackoverflow.com/.../net-4-6-rc-x64-is-twice-as-slow-as-x86-release-version

  • Anonymous
    May 25, 2015
    The comment has been removed

  • Anonymous
    May 25, 2015
    Quick update on my previous comment... I downloaded and built the System.Numerics.Vectors source from CoreFX on GitHub, and that seems to work with .NET 4.6 RC.  I'm still not sure why the version that came with 4.6 RC is missing Vector<T>, nor why the prerelease on NuGet wouldn't work for me anymore.  If anyone else runs into the same problem, maybe this will help.

  • Anonymous
    May 28, 2015
    jclary - The packaging changed between the CTP and RC, and the "fixed" vector types (e.g. Vector2) were moved into System.Numerics.dll, which is included with the default install of the framework.  Vector<T> is still in the System.Numerics.Vectors that is available on NuGet (but you need the latest).