Changes in floating point calculation since Windows 24H2

Question

Hello,
Since the latest Windows 11 update (24H2), we observe slight differences between our continuous integration machines (Windows Server 2022 21H2) and our developers' computers in some of our double-precision calculations.

We tested successive Windows versions and confirmed that the change was introduced by the update to 24H2. The same binary yields different results, which makes our unit tests fail, since we expect deterministic results and bit-exact equality (e.g. dumping a value and re-reading it must give a binary-equal value).
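To make the "dump / re-read must be binary equal" expectation concrete, here is a minimal sketch of such a check. The helper names (to_bits, dump, reread, round_trips_exactly) are hypothetical and not taken from our code base; the sketch assumes IEEE 754 64-bit doubles and a C runtime whose printf/strtod conversions are correctly rounded.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Reinterpret the 64 bits of a double as an integer (no numeric conversion).
std::uint64_t to_bits(double x) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Dump with 17 significant digits, which is enough to identify any binary64 value.
std::string dump(double x) {
    char buf[64];
    const int n = std::snprintf(buf, sizeof buf, "%.17g", x);
    assert(n > 0 && n < static_cast<int>(sizeof buf));
    (void)n;  // silence "unused" warnings when assertions are disabled
    return buf;
}

// Parse the text form back into a double.
double reread(const std::string& s) {
    return std::strtod(s.c_str(), nullptr);
}

// True when the value survives a text round trip bit for bit.
bool round_trips_exactly(double x) {
    return to_bits(reread(dump(x))) == to_bits(x);
}
```

Comparing to_bits values rather than using == avoids treating +0.0 and -0.0 as equal. NaN payloads are not expected to survive this kind of text round trip.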

What is interesting is that it does not happen with all our tests, so it is probably due to some special floating-point handling.

Note that:

we compile with the /fp:strict compiler option
the results change at runtime
we checked on the same hardware with different Windows versions, and the results are not the same
the results are deterministic (identical between runs in the same environment)
the floating-point control settings (control_fp: rounding + exceptions) are the same in both environments
we don't mix float and double; we use double everywhere
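For reference, the kind of runtime check meant in the list above can be sketched with the portable <cfenv> interface. This is only an illustration under assumptions: the function names are hypothetical, it is not the MSVC-specific control-word API, and the two probes only detect subnormal flushing indirectly by doing arithmetic at runtime.

```cpp
#include <cassert>
#include <cfenv>
#include <cfloat>
#include <cstdio>
#include <limits>
#include <string>

// Name of the current rounding mode as reported by the C++ floating-point environment.
std::string rounding_mode_name() {
    switch (std::fegetround()) {
        case FE_TONEAREST:  return "to-nearest";
        case FE_UPWARD:     return "upward";
        case FE_DOWNWARD:   return "downward";
        case FE_TOWARDZERO: return "toward-zero";
        default:            return "unknown";
    }
}

// Probe: does halving the smallest normal double produce zero (results flushed to zero)?
bool subnormals_flushed_to_zero() {
    volatile double smallest_normal = DBL_MIN;    // volatile forces a runtime division
    volatile double half = smallest_normal / 2.0;
    return half == 0.0;
}

// Probe: is a subnormal input treated as zero when used as an operand?
bool subnormal_inputs_treated_as_zero() {
    volatile double tiny = std::numeric_limits<double>::denorm_min();
    volatile double doubled = tiny * 2.0;
    return doubled == 0.0;
}

// Print one line per setting so the output can be diffed between machines.
void print_fp_environment() {
    std::printf("rounding mode          : %s\n", rounding_mode_name().c_str());
    std::printf("results flushed to zero: %d\n", subnormals_flushed_to_zero() ? 1 : 0);
    std::printf("inputs treated as zero : %d\n", subnormal_inputs_treated_as_zero() ? 1 : 0);
}

// Abort early (in debug builds) if the process is not in the default environment.
void assert_default_fp_environment() {
    assert(std::fegetround() == FE_TONEAREST);
    assert(!subnormals_flushed_to_zero());
    assert(!subnormal_inputs_treated_as_zero());
}
```

Calling print_fp_environment() at the start of the failing test on both Windows versions gives output that can be compared line by line.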

I didn't find anything about this in the release notes or forums, except this post (Changes to SEH on Windows 11 24H2 causing problems), which is more or less related.

Also, I don't yet have a minimal reproducible example to share, as I haven't found the exact culprit behind this change. I'm still investigating and will update if I find something new.

Is this something that can happen between Windows versions, or was a bug fixed in 24H2 that led to this breaking change?

Do you have any clue how to investigate this further, or even a potential fix?

Thanks for your time.

Have a nice day.

Answer

Hi,

Thanks for your answer and support

Sorry if my statement was not clear enough. What I meant is that compilation is not involved at all and everything happens at runtime (after reading here and there, most floating-point operations are done at runtime, so that makes sense).

Running the binary twice on the exact same computer (at a given point in time) yields the same results. Running the binary on the exact same computer before and after the upgrade to 24H2 yields different results.

It's clear something changed between the two Windows versions (it might be a driver or, as you said, some optimizations).

Following your suggestion regarding the 1 ULP comparison, I checked the binary representation of the wrong result we're getting, and it seems to differ by way more than 1 ULP.

Windows 24H2 result: 0011111010100001001010101010100000111101100101111000111010011111

Windows before 24H2 result: 0011111010011011110001100110111100000110011010111101101011110000

I'm not yet familiar enough with the ULP concept, but am I correct in assuming that it shouldn't affect a sequence of floating-point operations (add, subtract, multiply, etc.) and that the result of the sequence should still be within 1 ULP?
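For what it's worth, the ULP distance between two doubles can be measured directly from bit patterns like the two above. The sketch below is an illustration with hypothetical helper names, assuming IEEE 754 binary64. Note that the rounding guarantee applies to each individual add, subtract, multiply, divide or square root (each is correctly rounded, i.e. within 0.5 ULP of the exact result); that bound does not carry over to a whole sequence, where rounding errors can accumulate or be amplified.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Parse a 64-character string of '0'/'1', most significant bit first.
std::uint64_t bits_from_string(const std::string& s) {
    assert(s.size() == 64);
    std::uint64_t u = 0;
    for (char c : s) {
        assert(c == '0' || c == '1');
        u = (u << 1) | static_cast<std::uint64_t>(c - '0');
    }
    return u;
}

// Map a binary64 bit pattern to an integer key that increases with the
// represented value, so that neighbouring doubles have keys differing by 1.
// (-0.0 and +0.0 end up 1 apart; NaN patterns are not meaningful here.)
std::uint64_t ordered_key(std::uint64_t bits) {
    const std::uint64_t sign = 0x8000000000000000ULL;
    return (bits & sign) ? ~bits : (bits | sign);
}

// Number of representable steps between the two bit patterns.
std::uint64_t ulp_distance(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t ka = ordered_key(a);
    const std::uint64_t kb = ordered_key(b);
    return ka > kb ? ka - kb : kb - ka;
}
```

Passing the two strings above through bits_from_string and then ulp_distance gives the distance between the two results. Their exponent fields already differ (1002 versus 1001), so the values sit in different binades, which is consistent with a divergence that built up over many operations rather than a single last-bit rounding difference.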

Note that the tests that succeed (see below) match exactly.

To give a bit of context on the cases in which we observe this behaviour:

We have 1 test suite with 4 different cases; only one of them is failing.

The 4 cases form the following matrix:

Mesh file	Algorithm	test result
File1	Algo1	failure
File1	Algo2	success
File2	Algo1	success
File2	Algo2	success

So, from my understanding, something not yet identified happens with the File1 / Algo1 combination.

During those tests, we want to perform a registration of a plane to another plane.

We open a mesh, read the triangles, select some of them and transform them into a point cloud that serves as input to the algorithm.

We build a distance map from the triangles and use it in the cost function.

From there, we iteratively try to register the points to the appropriate plane based on distance.

So there are some operations performed, but nothing too fancy: we're just using Eigen::Matrix4d (mostly inverse and multiplication) and the TriangleMesh class from Open3D, read from an STL file. Everything is done on the CPU.

I don't know the nature of the algorithms / optimisations that could have been included in Windows 24H2 or in a new Intel microcode shipped with it, but I think it's a good lead.
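One way to test this lead could be to log the bit patterns returned by a few math-library calls on both Windows versions and diff the output. This is only a sketch: the function names and probe inputs are arbitrary choices, and it rests on the unverified assumption that our binary resolves these functions from a runtime DLL on the machine rather than from code compiled into the executable.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

using MathFn = double (*)(double);

// Reinterpret the 64 bits of a double as an integer.
std::uint64_t to_bits(double x) {
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Call fn at runtime (the volatile input prevents compile-time folding)
// and return the exact bit pattern of the result.
std::uint64_t run_probe(MathFn fn, double input) {
    assert(fn != nullptr);
    volatile double runtime_input = input;
    return to_bits(fn(runtime_input));
}

// Print one line per probe; the function list and inputs are arbitrary examples.
void print_math_probes() {
    struct Probe { const char* name; MathFn fn; double input; };
    const Probe probes[] = {
        {"sqrt", [](double v) { return std::sqrt(v); }, 2.0},
        {"sin",  [](double v) { return std::sin(v); },  0.5},
        {"cos",  [](double v) { return std::cos(v); },  0.5},
        {"acos", [](double v) { return std::acos(v); }, 0.3},
        {"exp",  [](double v) { return std::exp(v); },  1.5},
        {"log",  [](double v) { return std::log(v); },  1.5},
    };
    for (const Probe& p : probes) {
        std::printf("%-5s(%a) = %016llx\n", p.name, p.input,
                    static_cast<unsigned long long>(run_probe(p.fn, p.input)));
    }
}
```

If every line is identical on both versions, this particular lead becomes less likely, although it only covers the functions and inputs listed.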

Do you know where I could find additional information on this specific topic? A detailed release note for Windows 24H2, a way to inspect driver / microcode versions, or something like that.

Otherwise, is there any way to roll back from 24H2 to 23H2? Our target product is stuck on 22H2 for now, and we need to ensure reproducibility as much as possible.

I'll continue investigating the tests step by step, for example by reducing the samples used, in case there are some "bad" triangles somewhere.
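A possible way to narrow this down step by step is to print a bit-exact fingerprint of the intermediate data after each stage (and each iteration) and diff the logs from the two Windows versions; the first line that differs points at the stage where the results diverge. The sketch below uses an FNV-1a hash over the raw bytes of the doubles. The names fingerprint and log_stage are hypothetical, and the sketch assumes both machines have the same byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// FNV-1a hash over the raw bytes of an array of doubles.
// Any single-bit change in any value changes the fingerprint.
std::uint64_t fingerprint(const double* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::uint64_t h = 14695981039346656037ULL;      // FNV-1a 64-bit offset basis
    for (std::size_t i = 0; i < n; ++i) {
        unsigned char bytes[sizeof(double)];
        std::memcpy(bytes, &data[i], sizeof bytes);
        for (unsigned char b : bytes) {
            h ^= b;
            h *= 1099511628211ULL;                  // FNV-1a 64-bit prime
        }
    }
    return h;
}

// Print one line per stage so that logs from two machines can be diffed.
void log_stage(const char* stage, const double* data, std::size_t n) {
    std::printf("%-24s n=%zu fp=%016llx\n", stage, n,
                static_cast<unsigned long long>(fingerprint(data, n)));
}
```

For example, for an Eigen::Matrix4d named pose (a hypothetical variable), log_stage("pose after iteration", pose.data(), 16) would log the 16 coefficients of the current transform.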

Thank you for your time, and have a nice day.

Additional info, if it helps:

CPU-Z output for the machine where this bug happens after upgrading:
[Image: CPU-Z screenshot]

Windows (Specs dump):

Windows 11 Pro Edition Version 24H2
OS Build 26100.2605
Windows Feature Experience Pack 1000.26100.36.0
