# Train Benchmark

*Figure 1*
| Model | Stage | Paddle training speed (ips) | Contrast | PyTorch training speed (ips) | Paddle GPU memory usage (G) |
| --- | --- | --- | --- | --- | --- |
| LLaVA1.6 7B | Pretrain | 82 | +26% | 65 | 19/22 |
| LLaVA1.6 7B | SFT | 52 | +6% | 49 | 33/49 |
| LLaVA1.6 7B | LoRA | 56 | +14% | 49 | 16/17 |
| LLaVA1.6 13B | Pretrain | 52 | +18% | 44 | 33/36 |
| LLaVA1.6 13B | SFT | 24 | +4% | 23 | 50/68 |
| LLaVA1.6 13B | LoRA | 36 | +5% | 34 | 29/30 |
| Qwen2VL 2B | SFT | 33 | +43% | 23 | - |
| Qwen2VL 7B | SFT | 13 | +18% | 11 | - |
| Stable Diffusion 1.5 | Pretrain | 560 | -12% | 638 | 28/34 |
| Stable Diffusion 1.5 | LoRA | 200 | +6% | 187 | 30/34 |
| Stable Diffusion 3 | SFT (Dreambooth) | 34 | 0% | 34 | - |
| Stable Diffusion 3 | LoRA | 66 | -1% | 67 | - |

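The "Contrast" column is the relative speedup of Paddle over PyTorch in images per second. A minimal sketch of how each percentage can be derived from the two throughput figures (the helper name is my own, not from the benchmark code):

```python
# Hedged sketch: derive the "Contrast" column from the two throughput values.
# contrast = (paddle_ips - pytorch_ips) / pytorch_ips, shown as a signed percent.

def contrast(paddle_ips: float, pytorch_ips: float) -> str:
    """Signed percentage speedup of Paddle relative to PyTorch."""
    pct = (paddle_ips - pytorch_ips) / pytorch_ips * 100
    return f"{pct:+.0f}%"

# LLaVA1.6 7B Pretrain: Paddle 82 ips vs PyTorch 65 ips
print(contrast(82, 65))    # +26%
# Stable Diffusion 1.5 Pretrain: Paddle 560 ips vs PyTorch 638 ips
print(contrast(560, 638))  # -12%
```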
Notes:

- All models were tested on the H800 (8 × 80G) platform.
- For GPU memory usage, the table shows max_memory_allocated/max_memory_reserved.
- See below for the testing configuration details.
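The memory column renders the two peak-usage counters as "allocated/reserved" in whole gigabytes. In PaddlePaddle the raw byte counts come from `paddle.device.cuda.max_memory_allocated()` and `paddle.device.cuda.max_memory_reserved()`; a small formatting sketch (function name and placeholder values are my own):

```python
# Hedged sketch: format peak GPU memory as "max_memory_allocated/max_memory_reserved"
# in whole GiB, matching cells like "19/22" in the table. The byte counts would come
# from paddle.device.cuda.max_memory_allocated() / max_memory_reserved(); the inputs
# below are made-up placeholders.

GIB = 1024 ** 3

def memory_cell(allocated_bytes: int, reserved_bytes: int) -> str:
    """Format allocated/reserved peak memory as whole gibibytes."""
    return f"{round(allocated_bytes / GIB)}/{round(reserved_bytes / GIB)}"

print(memory_cell(19 * GIB, 22 * GIB))  # 19/22
```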
| Software | Version |
| --- | --- |
| CUDA | 12.3 |
| cuDNN | 9.0 |
| PaddlePaddle | 3.0beta2 |
| PaddleNLP | 3.0beta3 |
| PyTorch | 2.5 |