> our optimisation relies on having r < count, but if count is zero this condition cannot be met. [Since taking the remainder of a division by zero is undefined behaviour, the compiler would be well within its rights to assume that count != 0 and proceed with the transformation.]
Interestingly, even if you define division by zero to produce the remainder equal to the dividend, this optimization (replacing "(r + 1) % count" with "r + 1 == count ? 0 : r + 1") would still be legal.
zer00eyz•Mar 26, 2026
> r + 1 == count ? 0 : r + 1
It is legal!
But looks like a few more steps when compiled.
Findecanor•Mar 26, 2026
BTW. On ARM and RISC-V, integer remainder from division by zero does indeed result in the dividend (and no trap), unless the compiler is doing something unusual.
taliesinb•Mar 26, 2026
Interesting. It all seems very brittle, though. And it suggests that something has gone very wrong with our ecosystem of tools, languages, and processes when it becomes advisable to massage source until specific passes in a specific version of LLVM don't mess things up for other passes.
Not picking on the OA in the slightest; just thinking in terms of holistic system design. If you know what you want to happen, and you are smart enough to introspect the behavior of the tool and decide that it didn't happen, you are more than smart enough to just write it correctly in the first place.
Perhaps that is unrealistic, perhaps there is a hidden iceberg of necessary but convolutive optimizations no human could realistically or legibly write. But ok, where do you really need to engage in this kind of optimization golf? Inlined functions?
Ok, what about this targeted language feature for a future-day Zig:
1. Write an ordinary zig function
2. Write inline assembly version of that function
3. Write a "comptime assert" that the first compiles to the second, which only "runs" for the relevant arch.
4. What should that assert mean? That the compiler just uses your assembly version instead, but _also_ uses existing compiler machinery or an external theorem prover to verify they "behave the same up to X", for customizable values of X
That has the right feel, maybe. You are "pinning" specific, vetted, optimizations without compromising the intent, readability, or correctness of your code. And easy iteration is possible, because a failing comptime assert will just dump the assembly; you can even start with an empty manual impl.
adrianN•Mar 26, 2026
Having a way to assert that the compiler does what you expect can be helpful in large projects with many contributors of different skill level. Having something fail when a random change breaks autovectorization can save a lot of time profiling. When a compiler upgrade changes codegen I would also prefer an assertion telling me about it so that I can run relevant benchmarks to see whether it’s an improvement or not. Relying on whole-system benchmarks is difficult due to noise.
It feels like a leaky abstraction, and similarly implies there should be some way to drop down a layer and work directly with what’s beneath. Something in between C and non-portable assembly.
Asserting retroactively that compilers produce the correct assembly feels like just plain giving up on everything in between. Surely the best we can do isn’t a bunch of flaky weirdly interacting optimizations, UB footguns everywhere, things changing when updating the toolchain, etc?
Archit3ch•Mar 26, 2026
> Something in between C and non-portable assembly.
Writing your own LLVM passes?
mattnewport•Mar 26, 2026
I think a better approach might be automated performance regression tests. That's checking the property you probably actually care about directly (performance) and leaves the compiler (and other engineers) some leeway to do better without breaking the test.
Actually setting up a robust system for perf regression tests is tricky though...
villgax•Mar 26, 2026
Do not check OP's art on a work machine lol
ralferoo•Mar 26, 2026
> Here, the simplification analysis uses the conditional hidden in the assert() macro to figure out that we only execute the urem instruction if cur < count (see the isImpliedByDomCondition() method). This can have the unexpected consequence of making builds with assertions enabled faster in some cases; we can restore performance by replacing disabled assertions with assume attributes, which have no runtime impact.
It surprises me that the compiler doesn't still take the inference from the assert and just disable emitting the code to perform the check. Over the last 15 years I've worked on many codebases that are written with unnecessary asserts, partly as documentation, but maybe because people assumed it helped the compiler.
I've also worked on many codebases (and written code like this on my own projects) where the code looks like: assert(condition); if (condition) { ... } because I still want that safety check in release builds, and want the exception in debug builds but absolutely not ever in release builds.
jcalvinowens•Mar 26, 2026
> It surprises me that the compiler doesn't still take the inference from the assert and just disable emitting the code to perform the check.
The compiler isn't as clever as I think you're envisioning: assert() only works that way because it exits the control flow if the statement isn't true.
ralferoo•Mar 26, 2026
My point is that even though the assert() is optimised out, the compiler could still assume that the condition is valid.
jcalvinowens•Mar 26, 2026
How? The assert() has no special significance to the compiler at all, it's just code. Usually it's just an empty macro for non-debug builds.
I guess you could define your assert() to use [[assume]] in C++ for non-debug builds... but that seems like a very bad idea to me.
Just leave the asserts in prod. They almost certainly don't have measurable overhead. The few that do can be dealt with separately.
The Linux kernel has thousands of asserts which are always checked at runtime (BUG_ON).
dzaima•Mar 26, 2026
> It surprises me that the compiler doesn't still take the inference from the assert and just disable emitting the code to perform the check.
That's because the <assert.h> assert() must behave that way: it's specified to do and imply nothing when assertions are disabled. (The standard literally fully defines it as `#define assert(...) ((void)0)` when NDEBUG.)
Whereas `[[assume(...)]]` is a thing specifically for that "infer things from this without actually emitting any code".
ralferoo•Mar 26, 2026
Yeah, good point. Honestly it's been so long since I've added that to a project (it's normally hidden in some include that everything else includes) that I'd forgotten it wasn't a compiler level reserved keyword for C++ code.
derodero24•Mar 26, 2026
The assert-based optimization surprised me the first time I saw it in the wild. I had a hot path where I added an assert to document a precondition, and the generated code got measurably faster — the optimizer eliminated a downstream branch family it couldn't have known was dead otherwise.
The division UB case is thornier. In Rust you get explicit checked_div/wrapping_div, which makes the programmer's intent visible at the call site. The C approach of "trust that this is never zero" is the same mechanism but with none of the documentation.