Issues: kubeflow/trainer
Issues list
Implement TrainingRuntimes finalizer mechanism
kind/feature
#2609
opened Apr 21, 2025 by
tenzen-y
2 tasks
Flaky Test: Should fail in creating trainJob with invalid trainer config for torch runtime
kind/bug
#2605
opened Apr 18, 2025 by
tenzen-y
Implement validations to prevent changing TrainingRuntime
area/webhook
good first issue
help wanted
kind/feature
#2599
opened Apr 16, 2025 by
tenzen-y
Support XGBoost/LightGBM runtime and examples
area/runtimes
kind/feature
#2598
opened Apr 14, 2025 by
nqvuong1998
KEP-2401: Revisit DependsOn API in CTRs When Supporting Multiple Ancestor
area/deployment
area/llm
kind/feature
#2592
opened Apr 10, 2025 by
Electronic-Waste
KEP-2401: Create LLM Training Runtimes for Llama 3.3 model family
area/llm
kind/feature
#2591
opened Apr 10, 2025 by
Electronic-Waste
Add Helm integration tests to GitHub actions workflow
area/testing
kind/feature
#2577
opened Mar 29, 2025 by
ChenYi015
Contributors Guide to Trainer v2 Docs
area/docs
good first issue
help wanted
#2574
opened Mar 29, 2025 by
SanthoshToorpu
Automated way to generate Kustomize manifests from Helm templates
area/deployment
kind/feature
#2572
opened Mar 28, 2025 by
ChenYi015
Unable to Access Monitoring Port (Prometheus Metrics) on Kubeflow Trainer Controller Manager
area/monitoring
kind/bug
#2547
opened Mar 19, 2025 by
izuku-sds
User guide for PyTorch Training
area/docs
good first issue
help wanted
#2543
opened Mar 18, 2025 by
andreyvelich
Operator guide to manage TrainingRuntime and ClusterTrainingRuntime
area/docs
good first issue
help wanted
#2542
opened Mar 18, 2025 by
andreyvelich
KEP-2170: Add manifest overlays for standalone installation
kind/feature
#2526
opened Mar 16, 2025 by
Doris-xm
Support TrainJob ResourcePerNode in CoScheduling plugin
area/controller
kind/feature
#2525
opened Mar 15, 2025 by
tenzen-y
KEP-2401: Determine the tag for torchtune trainer & Add support for multiple accelerators
area/llm
kind/feature
#2518
opened Mar 13, 2025 by
Electronic-Waste
Get and Use TrainingRuntime ApplyConfiguration throughout KF PipelineFramework
area/controller
kind/feature
#2515
opened Mar 13, 2025 by
tenzen-y
KEP-2401: Create LLM Training Runtimes for Llama 3.1 model family
area/llm
area/runtimes
kind/feature
#2509
opened Mar 12, 2025 by
Electronic-Waste
KEP-2401: Support LoRA/QLoRA/DoRA fine-tuning in LLM Trainer V2
area/llm
area/sdk
kind/feature
#2505
opened Mar 12, 2025 by
Electronic-Waste
Add a workflow for publishing Helm charts
area/deployment
good first issue
help wanted
kind/feature
#2488
opened Mar 7, 2025 by
ChenYi015
Decouple UTs between Framework and Plugins packages
area/controller
kind/feature
#2468
opened Mar 3, 2025 by
tenzen-y
2 of 6 tasks