Model | Stage | Paddle training speed (ips) | Speedup vs. PyTorch | PyTorch training speed (ips) | Paddle GPU memory usage (G) |
---|---|---|---|---|---|
LLaVA1.6 7B | Pretrain | 82 | +26% | 65 | 19/22 |
LLaVA1.6 7B | SFT | 52 | +6% | 49 | 33/49 |
LLaVA1.6 7B | LoRA | 56 | +14% | 49 | 16/17 |
LLaVA1.6 13B | Pretrain | 52 | +18% | 44 | 33/36 |
LLaVA1.6 13B | SFT | 24 | +4% | 23 | 50/68 |
LLaVA1.6 13B | LoRA | 36 | +5% | 34 | 29/30 |
Qwen2VL 2B | SFT | 33 | +43% | 23 | - |
Qwen2VL 7B | SFT | 13 | +18% | 11 | - |
Stable Diffusion 1.5 | Pretrain | 560 | -12% | 638 | 28/34 |
Stable Diffusion 1.5 | LoRA | 200 | +6% | 187 | 30/34 |
Stable Diffusion 3 | SFT (DreamBooth) | 34 | 0 | 34 | - |
Stable Diffusion 3 | LoRA | 66 | -0.01% | 67 | - |
Notes:
- All models were tested on an H800 (8 * 80G) platform.
- For GPU memory usage, the table reports `max_memory_allocated`/`max_memory_reserved`; a measurement sketch is given after the version table below.
- Please see the following table for the testing configuration details.
Software | Version |
---|---|
CUDA | 12.3 |
cuDNN | 9.0 |
PaddlePaddle | 3.0beta2 |
PaddleNLP | 3.0beta3 |
PyTorch | 2.5 |
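For reference, below is a minimal sketch of how the two reported peak-memory counters can be read with PaddlePaddle's CUDA memory APIs. It is not the benchmark script itself, and the training step is only a placeholder; it assumes a single CUDA device 0.

```python
# Minimal sketch (not the benchmark script): read the peak-memory counters
# reported in the table above after the training steps have run.
import paddle

# ... run one or more training steps here ...

GiB = 1024 ** 3
allocated = paddle.device.cuda.max_memory_allocated(0) / GiB  # peak memory actually allocated to tensors
reserved = paddle.device.cuda.max_memory_reserved(0) / GiB    # peak memory held by the allocator
print(f"max_memory_allocated/max_memory_reserved: {allocated:.0f}G/{reserved:.0f}G")
```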