vllm-serving

This example demonstrates how to deploy a model served by vLLM in a Kubernetes cluster.

Deploy the Model

You need a valid kubeconfig file, with the KUBECONFIG environment variable pointing to it.
You can then start the deployment from this directory as follows:

kubectl apply -f examples/qwen-vllm.yaml
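Before applying the manifest, it can help to confirm that the kubeconfig prerequisite is met. The following is a minimal sketch, not part of this repository: `check_kubeconfig` is a hypothetical helper name, and it only checks that the files listed in KUBECONFIG exist, not that the cluster is reachable.

```shell
# Hypothetical helper (not part of this repo): succeeds only if KUBECONFIG
# is set and every file it lists exists. Runs in a subshell so that the
# IFS and globbing changes do not leak into the calling shell.
check_kubeconfig() (
  if [ -z "${KUBECONFIG:-}" ]; then
    echo "KUBECONFIG is not set" >&2
    exit 1
  fi
  set -f    # disable globbing while splitting the variable
  IFS=':'   # KUBECONFIG may hold several paths separated by ':'
  for path in $KUBECONFIG; do
    [ -n "$path" ] || continue
    if [ ! -f "$path" ]; then
      echo "kubeconfig file not found: $path" >&2
      exit 1
    fi
  done
  exit 0
)
```

It could be chained with the deployment command, for example `check_kubeconfig && kubectl apply -f examples/qwen-vllm.yaml`.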

With Helm:

To deploy Hugging Face models, first encode your API token in base64:
```
echo "YOUR_TOKEN" | tr -d "\n" | base64
```
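A stray newline in the encoded token is an easy mistake to miss, so a round-trip check can be useful. The sketch below is an illustration only: the variable names `TOKEN`, `ENCODED`, and `DECODED` are assumptions, and `YOUR_TOKEN` is the same placeholder as above.

```shell
# Placeholder token; replace with your real Hugging Face API token.
TOKEN="YOUR_TOKEN"

# printf '%s' writes the token without the trailing newline echo would add.
ENCODED=$(printf '%s' "$TOKEN" | base64)

# Decode again and compare with the original to confirm nothing was altered.
DECODED=$(printf '%s' "$ENCODED" | base64 --decode)

if [ "$DECODED" = "$TOKEN" ]; then
  echo "$ENCODED"
else
  echo "round-trip mismatch" >&2
fi
```

The printed value is what the chart expects as HF_API_TOKEN.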

Copy the command output and run helm:

helm install my-vllm . -f values.yaml --set HF_API_TOKEN="YOUR_COMMAND_OUTPUT"

Sending Requests

Note: the qwen-inf-serv endpoint is defined in the VirtualService resource.

curl https://<your host>/qwen-inf-serv/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-1.5B-Instruct",
        "guided_choice": ["positive", "negative"],
        "messages": [
            {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
        ]
    }'
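For repeated requests, the call above can be wrapped in small shell functions. This is a sketch under assumptions: `VLLM_HOST`, `build_payload`, and `classify` are hypothetical names, and the host value is a placeholder. The prompt is inserted into the JSON verbatim, so it must not contain double quotes or backslashes.

```shell
# Hypothetical settings; replace the host with the one exposed by your cluster.
VLLM_HOST="${VLLM_HOST:-your-host.example.com}"
MODEL="Qwen/Qwen2.5-1.5B-Instruct"

# Print the JSON request body for a sentiment classification prompt ($1).
# No JSON escaping is done, so keep quotes and backslashes out of the prompt.
build_payload() {
  cat <<EOF
{
  "model": "$MODEL",
  "guided_choice": ["positive", "negative"],
  "messages": [
    {"role": "user", "content": "Classify this sentiment: $1"}
  ]
}
EOF
}

# Send the request and print the raw JSON response.
# "-d @-" tells curl to read the request body from standard input.
classify() {
  build_payload "$1" | curl -sS "https://$VLLM_HOST/qwen-inf-serv/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d @-
}
```

Calling `classify "vLLM is wonderful!"` sends the same request as the curl command above. Because vLLM serves an OpenAI-compatible chat completions API, the chosen label should appear under `choices[0].message.content` in the response; if `jq` is installed, `classify "vLLM is wonderful!" | jq -r '.choices[0].message.content'` would extract it.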
