Take your Ollama server online
A secure gateway with Cloudflare Tunnel, OAuth and Nginx
Introduction
If you’ve already experimented with Ollama locally on your home server, you’re now familiar with running Large Language Models (LLMs) within your private network. However, it’s time to take the next step: exposing your Ollama service securely online, allowing seamless access to your LLMs from anywhere in the world.
In this article, we’ll explore how to achieve this using Cloudflare Tunnel and Nginx, leveraging Docker containers for a simple, reproducible setup.
TL;DR
For developers who prefer to dive straight into the code, I’ve got you covered! The Docker-based deployment solution is available on GitHub. Simply click this link to access the code repository and start protecting your LLM infrastructure today.
Requirements
- Ollama up and running on the default port 11434.
- A container engine: Podman or Docker.
- Your own domain name.
- A valid Cloudflare Tunnel token.
If you don’t have Ollama, follow the official documentation for installation instructions.
If you don’t have a Cloudflare Tunnel token, follow the official documentation to create one.
Ollama
Make sure your Ollama server is up by running this command:
curl http://localhost:11434
If you see an "Ollama is running" message in your console, the Ollama API is up and accessible. However, this also highlights a critical security concern: the lack of protection around our Ollama API.
Specifically, there are two risks to be aware of:
- No authentication required: Since no authentication is implemented, anyone with access to the server can directly access the Ollama API, posing a significant security risk (see the example after this list).
- No TLS encryption: The Ollama API can be accessed via regular HTTP without any encryption, making it vulnerable to eavesdropping and tampering.
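To make the first risk concrete: anyone who can reach the API can run prompts against your hardware without any credentials. As a hypothetical example, assuming a model such as llama3 has already been pulled, a request like this succeeds with no authentication at all:

curl http://localhost:11434/api/generate \
  -d '{"model": "llama3", "prompt": "Hello", "stream": false}'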
To mitigate the first risk, we will employ Nginx as a reverse proxy and require an API key to access our server. For the second concern, we will rely on Cloudflare Tunnel, which serves traffic over HTTPS without opening any inbound ports on our server.
Containers
To simplify the installation process, we’ll use Docker containers to run both Cloudflare Tunnel and Nginx. This approach allows for a simple deployment of our application without requiring manual installations or management of underlying infrastructure.
Let’s have a look at our docker-compose file.
version: "3"
services:
  nginx:
    container_name: nginx
    image: nginx:latest
    volumes:
      - ./templates:/etc/nginx/templates:ro
    env_file: .env
    restart: always
  cloudflared:
    container_name: cloudflared
    image: cloudflare/cloudflared:latest
    network_mode: service:nginx
    command: tunnel --no-autoupdate run
    env_file: .env
    restart: always
    depends_on:
      - nginx
The configuration is simple. We’re using the official Nginx and Cloudflare images, and we’re pointing both of them to read a .env file for environment variable configuration.
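On disk, this setup only needs a handful of files. A minimal layout (the file names here are the ones assumed throughout this article) looks like this:

.
├── docker-compose.yml
├── .env                          # holds OLLAMA_API_KEY and TUNNEL_TOKEN
└── templates/
    └── default.conf.template     # Nginx configuration template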
Let’s check how each of these two services is configured.
Nginx
To enhance the security of our Ollama API, we will configure Nginx to check each incoming request for a Bearer token in the Authorization header.
We do this by providing a default.conf.template file as a starting point. The official Nginx image substitutes environment variables into any *.template file it finds under /etc/nginx/templates, so ${OLLAMA_API_KEY} in the template is replaced with the value from our .env file when the container starts.
You can see this in action here:
if ($http_authorization != "Bearer ${OLLAMA_API_KEY}") {
    return 401;
}
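For context, here is a minimal sketch of what a complete templates/default.conf.template could look like. The upstream address is an assumption: host.docker.internal reaches the Docker host on Docker Desktop, so adjust it (for example to your host’s IP, or via extra_hosts on Linux) to wherever Ollama is reachable from the Nginx container.

server {
    listen 80;

    location / {
        # Reject any request that does not carry the expected API key.
        if ($http_authorization != "Bearer ${OLLAMA_API_KEY}") {
            return 401;
        }

        # Forward authenticated requests to the Ollama API.
        # host.docker.internal is an assumption; point this at wherever
        # Ollama is reachable from this container.
        proxy_pass http://host.docker.internal:11434;

        # Some Ollama builds only accept requests that look local, so
        # present a localhost Host header to the upstream.
        proxy_set_header Host localhost:11434;
    }
}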
Now that we’ve configured Nginx to authenticate requests, we simply need to set the OLLAMA_API_KEY value securely.
We’ll achieve this by defining an environment variable in a .env file, which will serve as our secure configuration source. By keeping sensitive information like API keys out of our codebase and storing it externally, we maintain better security.
You can find an example file here.
The value of OLLAMA_API_KEY can be any string you like. However, for consistency, it is recommended to use a predictable format, similar in spirit to OpenAI keys: the prefix sd_ followed by 32 hexadecimal digits.
For example:
OLLAMA_API_KEY=sd_<hexadecimal value>
If you would like to generate a key programmatically, here’s a Python script you can run:
import hashlib

def generate_api_key():
    # Replace this with your own unique secret; the key is derived from it.
    secret = 'replace_me'
    # Prefix with "sd_" and keep the first 32 hex digits of the SHA-256 hash.
    return f"sd_{hashlib.sha256(secret.encode()).hexdigest()[:32]}"

print(generate_api_key())
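Note that hashing a fixed secret always produces the same key. If you would rather generate a random key every time, Python’s standard secrets module works just as well; this is an alternative suggestion, not part of the script above:

import secrets

# 16 random bytes become 32 hexadecimal digits, prefixed as before.
print(f"sd_{secrets.token_hex(16)}")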
Cloudflared
We’re going to use Cloudflare Tunnel to access our Nginx service from the internet.
If you’re unfamiliar with Cloudflare Tunnel, please read this announcement from Cloudflare.
For the purposes of this blog, we’re assuming you already have a domain name, and you’ve registered it in Cloudflare. Steps on how you can do this can be found here.
You should configure a public hostname, which will be used to connect to your Ollama API. In this example, we’re configuring the public hostname https://llm.example.com and pointing it at the service http://localhost:80.
In our docker-compose file, we configured cloudflared to share the Nginx container’s network (network_mode: service:nginx). As a result, when Cloudflare sends a request through the tunnel, cloudflared looks for localhost:80, where our Nginx service is listening and handles the request.
Before forwarding the request to its final destination, Nginx checks the Bearer token in the request’s Authorization header. If the token matches our key, Nginx forwards the request to the Ollama API, ensuring that only authenticated requests reach the intended service.
Running your service
This closes the loop between Cloudflare, Nginx and the Ollama API. Before starting, make sure you have entered both the OLLAMA_API_KEY and the TUNNEL_TOKEN in the .env file.
podman compose up
Now you can give this a try by accessing your public hostname, sending the OLLAMA_API_KEY as a Bearer token in the Authorization header.
curl -i https://llm.example.com \
-H "Authorization: Bearer YOUR_CUSTOM_OLLAMA_API_KEY"
Now you can explore different tools and services that work with LLMs. For example, if you want to run a ChatGPT-like app on your Android device, you can use the open-source app GPTMobile, which can connect to your Ollama instance.
References
GitHub Repository: https://github.com/eleiton/ollama-tunnel