Issues: NVIDIA/TensorRT-LLM
Pinned issues:

- [RFC] Feedback collection about TensorRT-LLM 1.0 Release Plann… (#3148, opened Mar 29, 2025 by juney-nvidia, Open, 1 comment)
- [RFC] Topics you want to discuss with TensorRT-LLM team in the… (#3124, opened Mar 27, 2025 by juney-nvidia, Open, 9 comments)
Issues list
- [Call for contributions] The development plan of large-scale EP support in TensorRT-LLM [Community Engagement] (#4127, opened May 7, 2025 by juney-nvidia)
- Qwen2-0.5B Inference Freezes with TensorRT-LLM on RTX 5000 (#4118, opened May 7, 2025 by ashkanzarkhah)
- Install from docs not working [bug, Installation, triaged] (#4099, opened May 6, 2025 by darraghdog, 4 tasks)
- apply_per_channel_scale may fail to launch when the input sequence length is extremely large [bug] (#4085, opened May 6, 2025 by StudyingShao, 1 of 4 tasks)
- deepseek-fp4 quality issue: bad generation output after the second batch run (#4072, opened May 5, 2025 by xwuShirley)
- Use report.json from trtllm-bench in perf integration tests instead of redirecting stdout [others] (#4071, opened May 5, 2025 by venkywonka)
- LLama 3.1 405B FP8 perf low on B200 [bug] (#4043, opened May 3, 2025 by andyluo7, 4 tasks)
- Deepseek R1 and V3, FP4 quant, output quality issues at batch size > 2 [bug, triaged] (#4037, opened May 2, 2025 by pankajroark, 2 of 4 tasks)
- Add FP8 Support for Llama-4-Maverick on B200 (Feature Request) (#4033, opened May 2, 2025 by indianspeedster)
- Is text-only inference supported for multi-modal vision models like Llama 3.2 11b/90b Vision Instruct? (#4012, opened May 1, 2025 by natehofmann)
- [bug] Lookahead spec-dec verifies w guesses instead of g [bug] (#4003, opened Apr 30, 2025 by mahmoudhas, 3 of 4 tasks)
- Qserve-w4a8 Shows Lower Computational Efficiency on H20 [not a bug] (#3977, opened Apr 30, 2025 by StaryDing, 2 of 4 tasks)
- Reasoning-related enhancements [feature request, roadmap] (#3965, opened Apr 29, 2025 by mk-nvidia)
- Disaggregated Prefill & Decode serving optimizations [Investigating, Performance, roadmap, triaged] (#3963, opened Apr 29, 2025 by mk-nvidia)
- MoE optimizations [Investigating, Performance, roadmap, triaged] (#3962, opened Apr 29, 2025 by mk-nvidia)
- Support versioned github.io doc to make it easy to map code with the corresponding doc version [Documentation, Investigating, roadmap, triaged] (#3961, opened Apr 29, 2025 by mk-nvidia)
- Re-organize the example directory into Feature-level examples and Model-level examples [Documentation, Investigating, roadmap, triaged] (#3960, opened Apr 29, 2025 by mk-nvidia)
- Intra-1.x-version backward compatibility for selected APIs [Investigating, roadmap, triaged] (#3958, opened Apr 29, 2025 by mk-nvidia)
- Plenty of regressions in trt-llm v0.20.0 [bug] (#3955, opened Apr 29, 2025 by michaelfeil, 4 tasks)
- Context node crash when using PD Disaggregation [bug] (#3937, opened Apr 29, 2025 by nsealati, 2 of 4 tasks)