
Commit 8811f7a

Added blog post "INT4 Decoding GQA CUDA Optimizations for LLM Inference" (#1648)
Signed-off-by: Chris Abraham <[email protected]>
1 parent c2cd932 commit 8811f7a

23 files changed, with 3,485 additions and 0 deletions

_posts/2024-06-06-int4-decoding.md

+3,485

assets/images/int4-decoding/eq.jpg

53.4 KB

assets/images/int4-decoding/fg1.png

98.9 KB

assets/images/int4-decoding/fg10.jpg

18.2 KB

assets/images/int4-decoding/fg11.jpg

54.6 KB

assets/images/int4-decoding/fg12.png

411 KB

assets/images/int4-decoding/fg13.jpg

296 KB

assets/images/int4-decoding/fg14.jpg

207 KB

assets/images/int4-decoding/fg15.jpg

347 KB

assets/images/int4-decoding/fg16.jpg

460 KB

assets/images/int4-decoding/fg17.jpg

348 KB

assets/images/int4-decoding/fg18.png

262 KB

assets/images/int4-decoding/fg19.jpg

660 KB

assets/images/int4-decoding/fg2.png

343 KB

assets/images/int4-decoding/fg20.jpg

658 KB

assets/images/int4-decoding/fg21.png

103 KB

assets/images/int4-decoding/fg3.png

173 KB

assets/images/int4-decoding/fg4.jpg

181 KB

assets/images/int4-decoding/fg5.png

333 KB

assets/images/int4-decoding/fg6.png

505 KB

assets/images/int4-decoding/fg7.png

290 KB

assets/images/int4-decoding/fg8.png

689 KB

assets/images/int4-decoding/fg9.png

215 KB
