
Compile bug: nvcc fatal : Unsupported gpu architecture 'compute_120' #13271


Open · jacekpoplawski opened this issue May 2, 2025 · 6 comments

@jacekpoplawski commented May 2, 2025

Git commit

commit 3f3769b (HEAD -> master, origin/master, origin/HEAD)

Operating systems

Windows

GGML backends

CUDA

Problem description & steps to reproduce

After replacing a 3090 with a 5070 I see a compilation error:
nvcc fatal : Unsupported gpu architecture 'compute_120'
I found this: ggml-org/whisper.cpp#3030
Adding -DCMAKE_CUDA_ARCHITECTURES="86" solved my llama.cpp compilation problem.
Should this fix also be merged into llama.cpp?
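
For reference, the configure step with the workaround applied looks roughly like this (a sketch based on the compile command below; "86" targets Ampere / RTX 3000):

# workaround: pin the CUDA target to compute capability 8.6 instead of autodetecting
cmake -DGGML_CUDA=ON -DLLAMA_CURL=OFF -DCMAKE_CUDA_ARCHITECTURES="86" ..
cmake --build . --config Release -j 30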

First Bad Commit

No response

Compile command

cmake -DGGML_CUDA=ON -DLLAMA_CURL=OFF .. 
cmake --build . --config Release -j 30

Relevant log output

nvcc fatal : Unsupported gpu architecture 'compute_120'
@JohannesGaessler (Collaborator)

Does deleting and re-creating the CMake build directory fix the issue?

@JohannesGaessler (Collaborator) commented May 2, 2025

Just to make it absolutely clear: when you replace the GPU you also have to recompile the project because the generated GPU code is specific to the GPU compute capability. Also, make sure that your CUDA version is recent enough to support RTX 5000.
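
As a sketch of what that clean rebuild could look like (build directory name assumed; on Windows cmd, rmdir /s /q build replaces rm -rf build):

# remove the old build directory so no stale GPU code or cached CMake settings survive
rm -rf build
# reconfigure and rebuild from scratch
cmake -B build -DGGML_CUDA=ON -DLLAMA_CURL=OFF
cmake --build build --config Release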

@jacekpoplawski (Author)

I always create a new build directory; that's the point.
I just tried again with the commands above and I still see:
nvcc fatal : Unsupported gpu architecture 'compute_120'

@JohannesGaessler (Collaborator)

Ah sorry, I think I didn't read your error message correctly. The llama.cpp default is to compile for the "native" architecture, i.e. for the GPUs connected to the system. With an RTX 5000 GPU connected it tries to build for compute capability 12.0, which your nvcc does not support (I suspect your CUDA version is too old). If you specify compute capability 8.6 the code is built for RTX 3000; because the code is forward-compatible it also runs on RTX 5000. But this should not be made the llama.cpp default because the resulting code would not work on older GPUs.
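
A minimal sketch of how to verify this, assuming a driver recent enough to support the compute_cap query field:

# print the compute capability of the installed GPU (12.0 for RTX 5000)
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
# build one binary covering both RTX 3000 (8.6) and RTX 5000 (12.0);
# compiling for compute_120 itself requires CUDA 12.8 or newer
cmake -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="86;120" ..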

@Panchovix commented May 3, 2025

What CUDA version are you using? nvcc comes with the CUDA toolkit itself, and you need at least 12.8 for Blackwell 2.0.
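
To check, something like this works (a sketch):

# the toolkit version nvcc itself reports
nvcc --version
# the header line of nvidia-smi shows the highest CUDA version the driver supports
nvidia-smi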

@jacekpoplawski (Author)

You were right: updating to cuda_12.9.0_576.02_windows.exe resolved the issue.
It was likely caused by nvcc still pointing to version 12.6.
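
For anyone hitting the same thing on Windows with several toolkits installed, a check along these lines (the paths are NVIDIA's default install locations, given here as an assumption) shows which nvcc the PATH resolves to first:

# list every nvcc.exe on PATH, in resolution order
where nvcc
# e.g. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc.exe
#  vs. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\bin\nvcc.exe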
