Pull requests: NVIDIA/TensorRT-LLM
- #4035 test: Test OOB access issue in penaltyKernel for endId=-1 (opened May 2, 2025 by brb-nv)
- #4032 [nvbug/5248986][fix] Skip debugCheckSemaphores in stream capture mode (opened May 2, 2025 by mikeiovine)
- #4030 [DRAFT] Introducing multi-vocab token sampling for audio generation (opened May 2, 2025 by vklimkov-nvidia)
- #4028 feat:enable kvcache to be reused during request generation (opened May 2, 2025 by narutolhy; labels: Community Engagement, Community want to contribute)
- #4027 Refactor: Restructure C++ tests for better modularisation of non-shared code (opened May 2, 2025 by DomBrown)
- #4026 fix: Properly get decoding mode according to same logic as cpp. (opened May 2, 2025 by dcampora)
- #4024 [AutoDeploy][perf] Further optimize flashinfer backend in AutoDeploy (opened May 2, 2025 by suyoggupta)
- #4017 Draft: Support long context LLama 4 (flashinfer backend) (opened May 1, 2025 by vanshilshah97; draft)
- #4015 [Draft][AutoDeploy] Split prefill and decode in AD's flashinfer backend (opened May 1, 2025 by suyoggupta; draft)
- #4013 [fix] support llama + eagle head checkpoint conversion (opened May 1, 2025 by jhaotingc)
- #4002 experiments: set self.max_position_embedding=None in Attention module (opened Apr 30, 2025 by qixiang-99; draft)