Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS #5222

Open
wants to merge 1 commit into
base: develop
from

martin-frbg
Collaborator

This is sufficient to enable the SME version of the "small matrix SGEMM" kernel on Apple M4.
Also added is commented-out code for recognizing the M4 as ARMV9SME. This is not yet useful except for testing, as none of the ARMV8SVE kernels that the V9SME target builds upon support streaming SVE.

@vaiskv
Contributor

vaiskv commented Apr 13, 2025

Hi @martin-frbg

For a non-Apple CPU, the check should enter this part of get_coretype() (verified on QEMU). Here, when TARGET is set to ARMV8, gotoblas_ARMV9SME is NULL, whereas when TARGET is set to ARMV9SME, gotoblas_ARMV9SME is not NULL and hence the architecture initialization succeeds.

Please note that for compilation I am using the following command:

make BINARY=64 CC=aarch64-linux-android35-clang ONLY_CBLAS=1 HOSTCC=gcc TARGET=ARMV8 DYNAMIC_ARCH=1

Also, although the test is on QEMU, the SME sgemmdirect kernel will eventually have to run on a Qualcomm device as well. So I think we need to add a support_sme1() check for the 0x51 implementer ID here, similar to the one you added for Apple M4.
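
To make the proposal concrete, here is a minimal sketch of what a 0x51 branch could look like. It is not the code in this PR: the `sketch_` names and the enum are hypothetical, and `has_sme` merely stands in for whatever support_sme1() returns. Only the MIDR_EL1 field positions (implementer in bits 31:24, part number in bits 15:4) are taken from the Arm architecture.

```c
#include <stdint.h>

/* MIDR_EL1 layout per the Arm architecture:
 * bits [31:24] = implementer, bits [15:4] = part number. */
static inline unsigned midr_implementer(uint64_t midr) {
    return (unsigned)((midr >> 24) & 0xffu);
}

static inline unsigned midr_part(uint64_t midr) {
    return (unsigned)((midr >> 4) & 0xfffu);
}

#define IMPLEMENTER_QUALCOMM 0x51u

/* Hypothetical core ids, used only by this sketch. */
enum sketch_core { SKETCH_CORE_UNKNOWN = 0, SKETCH_CORE_ARMV9SME };

/* has_sme is an assumption: it stands in for the result of support_sme1().
 * Returns the SME target only for implementer 0x51 with SME available;
 * everything else is left to the rest of the detection logic. */
static enum sketch_core sketch_qualcomm_branch(uint64_t midr, int has_sme) {
    if (midr_implementer(midr) == IMPLEMENTER_QUALCOMM && has_sme)
        return SKETCH_CORE_ARMV9SME;
    return SKETCH_CORE_UNKNOWN;
}
```

Passing the capability in as a parameter keeps the sketch independent of how SME is actually queried on a given OS.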

@martin-frbg
Collaborator Author

The way this is supposed to work is that for Linux, it checks a variety of implementer and cpu IDs, and if none of them matches, it runs support_sme1() to see if it should return ARMV9SME.
I think your future Qualcomm device should fit right into this code even if the 0x51 implementer id is not specifically catered for. (Unless it is Windows on Arm, for which there is currently no conditional code, like the sysctl for MacOS, to replace the Linux-specific hwcap or proc-based calls.)
I wonder if qemu does not set the capability flag for SME, so that support_sme1() is returning false?
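
The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual get_coretype(): the `sketch_` names, the one-entry table, and the enum are hypothetical, and the SME bit is assumed to be bit 23 of the HWCAP2 word, as HWCAP2_SME is defined in the Linux arm64 uapi headers.

```c
#include <stddef.h>

/* Assumption: HWCAP2_SME is bit 23 of AT_HWCAP2 on Linux arm64. */
#define SKETCH_HWCAP2_SME (1UL << 23)

/* Hypothetical target ids, used only by this sketch. */
enum sketch_target { SKETCH_ARMV8, SKETCH_NEOVERSEN1, SKETCH_ARMV9SME };

struct sketch_known_cpu {
    unsigned implementer;
    unsigned part;
    enum sketch_target target;
};

/* Illustrative table with a single entry (Arm Neoverse N1: 0x41 / 0xd0c). */
static const struct sketch_known_cpu sketch_known[] = {
    { 0x41, 0xd0c, SKETCH_NEOVERSEN1 },
};

static int sketch_has_sme(unsigned long hwcap2) {
    return (hwcap2 & SKETCH_HWCAP2_SME) != 0;
}

/* Known implementer/part pairs win; only unmatched CPUs reach the
 * SME capability check, and plain ARMV8 is the last resort. */
static enum sketch_target sketch_select(unsigned implementer, unsigned part,
                                        unsigned long hwcap2) {
    for (size_t i = 0; i < sizeof sketch_known / sizeof sketch_known[0]; i++) {
        if (sketch_known[i].implementer == implementer &&
            sketch_known[i].part == part)
            return sketch_known[i].target;
    }
    if (sketch_has_sme(hwcap2))
        return SKETCH_ARMV9SME;
    return SKETCH_ARMV8;
}
```

On Linux the `hwcap2` argument would typically come from `getauxval(AT_HWCAP2)` in `<sys/auxv.h>`. With this ordering, an unlisted 0x51 part that reports SME ends up as ARMV9SME without a dedicated branch.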

@vaiskv
Contributor

vaiskv commented Apr 13, 2025

On QEMU, support_sme1() returns true, which I verified using debug prints.

I think the issue is somewhere around gotoblas->init being NULL:


  if (gotoblas && gotoblas->init) {
    strncpy(coren, gotoblas_corename(), 20);
    sprintf(coremsg, "Core: %s\n", coren);
    openblas_warning(2, coremsg);
    gotoblas->init();
  } else {
    openblas_warning(0, "OpenBLAS : Architecture Initialization failed. No initialization function found.\n");
    exit(1);
  }

Moreover, the check for (gotoblas && gotoblas->init) is true when the library is compiled with TARGET=ARMV9SME DYNAMIC_ARCH=1. It fails when TARGET=ARMV8 or ARMV8SVE with DYNAMIC_ARCH=1.

I believe the init function maps to init_parameter() taken from the generated file setparam-ARMV9SME.c. This object (setparam-ARMV9SME.o) is getting generated in both cases (ARMV8 and ARMV9SME). Not sure if I am missing something here .. :(
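
For reference, the guard quoted above can be reproduced in isolation with a small stand-in. This is only a sketch for experimenting with the two failure conditions (no table, or a table without an init hook); `struct sketch_table` and the `sketch_` functions are hypothetical and are not OpenBLAS types, and it says nothing about why the real pointer ends up NULL in the ARMV8 build.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-architecture dispatch table. */
struct sketch_table {
    const char *name;
    void (*init)(void);
};

static int sketch_init_calls = 0;

/* Stand-in for an init hook such as init_parameter(). */
static void sketch_init(void) {
    sketch_init_calls++;
}

/* Mirrors the shape of the quoted check: returns 0 after calling init,
 * or -1 if the table or its init hook is missing. */
static int sketch_arch_init(const struct sketch_table *t) {
    char coremsg[64];

    if (t == NULL || t->init == NULL) {
        fputs("Architecture initialization failed: no init function\n", stderr);
        return -1;
    }
    /* snprintf bounds the write and always NUL-terminates. */
    snprintf(coremsg, sizeof coremsg, "Core: %s\n", t->name ? t->name : "unknown");
    fputs(coremsg, stderr);
    t->init();
    return 0;
}
```

Printing which of the two conditions fails (the table pointer or its init member) in the real code would narrow down whether the ARMV9SME table is missing entirely or present but unpopulated.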

@vaiskv
Contributor

vaiskv commented Apr 23, 2025

Hi @martin-frbg

Were you able to check on this issue? I tried to fix it but without any luck. Please let me know if you figure out a solution.

@martin-frbg
Collaborator Author

Unfortunately I'm still at the stage of building a kernel with SME support in a Debian VM under qemu (which is a lot slower than anticipated, even on a fast x86_64 host). I wanted to try Arm FVP instead but did not quite figure out how to make that work.


2 participants