Automatically configure and obtain a URL for Ollama on any remote Linux server (GPU providers, Google Colab, Kaggle, etc.) using a Cloudflared tunnel.
Note: For Google Colab, here is an example notebook. This is only allowed if you are a paid Colab user, as per their terms of service. If you use it with a free account, do so at your own risk.
This is useful for faster experimentation when the ollama model runs too slowly locally, and for synthetic data generation in large batches.
Install the package via pip:
pip install ollama-remote
Then, just run ollama-remote on the remote server and it will give you back the URL:
ollama-remote
You will get back the commands to copy and run locally.
Once you set OLLAMA_HOST to the assigned URL, you can run any ollama commands in your local terminal. It will feel like working locally, but the actual model inference happens on the server side. Make sure you have the ollama CLI installed locally.
export OLLAMA_HOST='https://spa-visiting-voices-omissions.trycloudflare.com'
ollama run phi3:mini --verbose
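If you would rather call the model from Python than through the CLI, the official ollama Python client can be pointed at the same tunnel URL. A minimal sketch, assuming you have run pip install ollama locally and pulled phi3:mini on the server (the URL shown is just the example one from above):

from ollama import Client

# Point the client at the tunnel URL printed by ollama-remote.
client = Client(host="https://spa-visiting-voices-omissions.trycloudflare.com")

response = client.chat(
    model="phi3:mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response["message"]["content"])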
If the server has a GPU, as Colab does, inference will be much faster.
The commands are the same as regular ollama, and you can pull any model that fits on the server's GPU.
ollama pull phi3:mini
ollama run phi3:mini
You are also provided with code to use the model through the OpenAI SDK. Make sure to pull the model specified in the code beforehand via ollama pull phi3:mini.
from openai import OpenAI

client = OpenAI(
    base_url="https://spa-visiting-voices-omissions.trycloudflare.com/v1/",
    api_key="ollama",  # required by the SDK, but not checked by Ollama
)

response = client.chat.completions.create(
    model="phi3:mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)
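As mentioned above, the remote setup is handy for synthetic data generation in large batches. Here is a minimal sketch of that pattern with the same OpenAI client, assuming the example tunnel URL and phi3:mini from above; the prompts, output file name, and temperature are purely illustrative:

import json

from openai import OpenAI

client = OpenAI(
    base_url="https://spa-visiting-voices-omissions.trycloudflare.com/v1/",
    api_key="ollama",
)

# Illustrative seed prompts; replace with your own.
prompts = [
    "Write a short product review for a coffee grinder.",
    "Write a short product review for a mechanical keyboard.",
]

# One completion per prompt, written out as JSONL.
with open("synthetic_data.jsonl", "w") as f:
    for prompt in prompts:
        response = client.chat.completions.create(
            model="phi3:mini",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.9,  # higher temperature for more varied samples
        )
        record = {"prompt": prompt, "completion": response.choices[0].message.content}
        f.write(json.dumps(record) + "\n")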
