vllm-serving

This example demonstrates how to deploy a vLLM model in a Kubernetes cluster.

Deploy the Model

You need a valid kubeconfig and the KUBECONFIG environment variable set.
The deployment can then be initiated from this directory as follows:

kubectl apply -f examples/qwen-vllm.yaml
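
To confirm the model server came up, a couple of standard kubectl checks help. The resource names depend on what qwen-vllm.yaml defines, so treat this as a minimal sketch:

  # List pods and wait for the vLLM server pod to become Ready
  kubectl get pods -w

  # Tail the server logs; replace <pod-name> with a pod from the output above
  kubectl logs -f <pod-name>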

With Helm:

  1. To deploy Hugging Face models, first encode your API token as base64 (you can verify the encoding as shown below):
    echo "YOUR_TOKEN" | tr -d "\n" | base64
  2. Copy the command output and run helm:
    helm install my-vllm . -f values.yaml --set HF_API_TOKEN="YOUR_COMMAND_OUTPUT"
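
A quick sanity check, using only standard tooling: decoding the encoded string should round-trip to your original token, and helm can report on the release created above.

  # Decoding the base64 output should print the original token
  echo "YOUR_COMMAND_OUTPUT" | base64 -d

  # Show the status of the release installed above
  helm status my-vllm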

Sending Requests

Note: the qwen-inf-serv endpoint is defined in the VirtualService resource.

curl https://<your host>/qwen-inf-serv/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-1.5B-Instruct",
        "guided_choice": ["positive", "negative"],
        "messages": [
            {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
        ]
    }'
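
vLLM serves an OpenAI-compatible chat completions API, so the model's answer is at .choices[0].message.content in the response JSON. A sketch of the same request piped through jq (assuming jq is installed) to print only the guided choice:

curl -s https://<your host>/qwen-inf-serv/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-1.5B-Instruct",
        "guided_choice": ["positive", "negative"],
        "messages": [
            {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
        ]
    }' | jq -r '.choices[0].message.content'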
