Skip to content

Conversation

@AndyJGraham
Copy link
Contributor

Emit conditional negate instructions when compiling the following Java code fragments:

“(a < 0) ? -a : a” "(a <= 0) ? -a : a"
"(a > 0) ? a : -a" "(a >= 0) ? a : -a".

…wing Java code fragments:

“(a < 0) ? -a : a” "(a <= 0) ? -a : a"
"(a > 0) ?  a : -a" "(a >= 0) ? a : -a".

Change-Id: I7930bdef17e4ec3c1c3ebc5d4272de8d0973a0b5
@teshull
Copy link
Member

teshull commented Mar 21, 2022

I cleaned up this PR here: #4409

Some comments are inline

@teshull
Copy link
Member

teshull commented Mar 21, 2022

@AndyJGraham do you have examples of perf improvements from this optimization?

// this Boolean variable records whether the platform is AArch64 or not.
if (Services.getSavedProperties().get("os.arch").equals("aarch64")) {
isAArch64 = true;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now AMD64 & LLVM also support the AbsNode, so this can be applied to all architectures

The general way though to add architecture-specific traits is through LoweringProvider

* Detect (x < 0) ? -x : x (and similar variants) and replace with an
* "abs" node. The "absolute" node only applies to AArch64 machines.
*/
if (lt.getY().isJavaConstant() && (lt.getY().asJavaConstant().asLong() == 0 || lt.getY().asJavaConstant().asLong() == 1)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above check for lt.getY().isDefaultConstant is equivalent to lt.getY().isJavaConstant() && lt.getY().asJavaConstant().asLong() == 0 so that check isn't needed here.

Too I refactored the code so that lt.getY().isJavaConstant() && lt.getY().asJavaConstant().asLong() == 1 can be triggered

@AndyJGraham
Copy link
Contributor Author

@teshull Please find attached a document describing the performance testing for this change. Originally, the performance testing showed a 34% performance improvement. However, this took place on a Cortex A72 machine, with an architecture that is now looking rather old. For this reason, the testing was repeated on three machines with differing architectures, all running Linux.

Within the attached document, one can see that the results are very different.

One can clearly see that the code change gives an improvement of between 11 and 30 percent on the Raspberry Pi 4. This is not surprising, as the architecture is like that of the machine on which the original measurements were taken.

However, on the Altra machine, a more recent architecture, one sees a performance degradation of between 13 and 30 percent.

Finally, on the Odroid machine, the change makes no discernible performance difference. (It is worth noting that this is a relatively new, “in order” core.)

With such varying results, especially poor on the most recent platform, it is difficult to justify this change and the feeling within Arm is that this patch should be abandoned.

Perf Comparison.pdf

@teshull
Copy link
Member

teshull commented Apr 8, 2022

@AndyJGraham Thanks for this description. I'll read it next week and will think about it as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants