Replies: 2 comments 2 replies
-
What is the warp size and shared memory size for this GPU? These should be printed out on startup. The first value is the workgroup size. I'm surprised this broke things unless the workgroup size is smaller than the warp size. Which is currently faster, m_warptile or s_warptile?
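To make the workgroup-size point concrete, here is a minimal sketch of a sanity check for a tile configuration. Everything in it is an assumption for illustration: the `TileConfig` struct, its field names, and the rule that the workgroup must contain exactly as many warps as there are per-warp sub-tiles in the workgroup tile. None of this is ggml's actual API, and the real shader constraints may differ.

```cpp
#include <cstdint>

// Hypothetical description of a matmul tile configuration.
// Field names are illustrative and are not taken from ggml-vulkan.cpp.
struct TileConfig {
    uint32_t workgroup_size; // threads per workgroup (the first warptile value, per the reply above)
    uint32_t warp_size;      // subgroup (warp) size reported by the driver
    uint32_t bm, bn;         // output tile computed by one workgroup (assumed meaning)
    uint32_t wm, wn;         // sub-tile computed by one warp (assumed meaning)
};

// Number of whole warps in the workgroup; 0 if the workgroup is smaller than one warp.
inline uint32_t warps_available(const TileConfig& c) {
    return c.warp_size == 0 ? 0 : c.workgroup_size / c.warp_size;
}

// Number of per-warp sub-tiles needed to cover the workgroup tile.
inline uint32_t warps_required(const TileConfig& c) {
    if (c.wm == 0 || c.wn == 0) return 0;
    return (c.bm / c.wm) * (c.bn / c.wn);
}

// True when the workgroup is a whole number of warps and those warps
// exactly cover the tile. This rule is an assumption, not a documented contract.
inline bool tile_config_is_consistent(const TileConfig& c) {
    if (c.warp_size == 0 || c.wm == 0 || c.wn == 0) return false;
    if (c.workgroup_size % c.warp_size != 0) return false;
    if (c.bm % c.wm != 0 || c.bn % c.wn != 0) return false;
    const uint32_t required = warps_required(c);
    return required > 0 && warps_available(c) == required;
}
```

With made-up numbers (not the values from this GPU), `TileConfig{128, 32, 64, 64, 32, 32}` passes the check, while halving only the workgroup size to 64 fails it because two warps cannot cover four sub-tiles. If the shader works along these lines, changing the first value alone could leave part of each tile uncomputed, which is one possible explanation for noisy output.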
-
Thanks for the response. Here's the warp size and shared memory size of the GPU:
I pretty much brute-forced all possible combinations while tuning In my case,
-
I'm working on Local Diffusion, using stable-diffusion.cpp on Android. Vulkan performance on Mali GPUs is currently very poor.

Disabling `mul_mat_l` in `ggml-vulkan.cpp` helped a bit. I then tried modifying the `m_warptile` and `s_warptile` values. Reducing the first element (m tile?) from 128 to 64 gave a ~3x inference speedup, but the output images were garbage/noisy.

Questions:

- What are the recommended values of `m_warptile` and `s_warptile` for Mali GPUs to get both performance and correct output?

Looking for guidance to improve Vulkan matmul performance on Mali without breaking correctness.