Pull requests: NVIDIA/TensorRT-LLM
- #4035 test: Test OOB access issue in penaltyKernel for endId=-1 (opened May 2, 2025 by brb-nv)
- #4032 [nvbug/5248986][fix] Skip debugCheckSemaphores in stream capture mode (opened May 2, 2025 by mikeiovine)
- #4030 [DRAFT] Introducing multi-vocab token sampling for audio generation (opened May 2, 2025 by vklimkov-nvidia)
- #4028 feat:enable kvcache to be reused during request generation (opened May 2, 2025 by narutolhy; labels: Community Engagement, Community want to contribute)
- #4027 Refactor: Restructure C++ tests for better modularisation of non-shared code (opened May 2, 2025 by DomBrown)
- #4026 fix: Properly get decoding mode according to same logic as cpp. (opened May 2, 2025 by dcampora)
- #4024 [AutoDeploy][perf] Further optimize flashinfer backend in AutoDeploy (opened May 2, 2025 by suyoggupta)
- #4017 Draft: Support long context LLama 4 (flashinfer backend) (opened May 1, 2025 by vanshilshah97; draft)
- #4015 [Draft][AutoDeploy] Split prefill and decode in AD's flashinfer backend (opened May 1, 2025 by suyoggupta; draft)
- #4013 [fix] support llama + eagle head checkpoint conversion (opened May 1, 2025 by jhaotingc)
- #4002 experiments: set self.max_position_embedding=None in Attention module (opened Apr 30, 2025 by qixiang-99; draft)