Deploy Your First LLM

This guide walks you through deploying a fully self-hosted large language model on GPCN™ infrastructure — from provisioning your virtual machines to chatting with your model in a browser. You'll set up Ollama to serve the model and Open WebUI to give your team a familiar, ChatGPT-style interface to interact with it.

By the end of this guide you'll have two VMs running in your GPCN™ account: one serving the LLM over a private API, and one hosting a web interface your users can access directly. Everything runs in your own environment — no third-party AI service, no data leaving your infrastructure.

Architecture

You'll create two virtual machines:

  • Ollama Server — a VM that hosts and serves the LLM. A GPU VM is recommended for production; a standard VM works for prototyping. Accessed only by the Web UI VM.
  • Web UI VM — a standard VM running the Open WebUI container. This is the VM your users connect to.

Both VMs are reachable from the internet via their public IPs. Security is enforced by OS-level firewall rules (UFW) inside each VM — GPCN™ does not provide platform-level security groups or port filters.

Prerequisites

  • A GPCN™ account — if you plan to use a GPU VM for the Ollama server, GPU access must be enabled on your account (contact support if you don't see the GPU section in the sidebar)
  • The Tenant Administrator or Tenant Power User role
  • An SSH key in your SSH Keys library
  • An SSH client:
    • Windows: Windows Terminal (built-in OpenSSH) or PuTTY
    • Mac / Linux: SSH is pre-installed
  • A modern web browser (Chrome, Firefox, Edge, or Safari) to access the Open WebUI interface

Phase 1 — Prepare Your SSH Key

This guide uses SSH keys to access both virtual machines. Before creating any VMs, make sure you have a key ready in your GPCN™ SSH Keys library.

Navigate to SSH Keys in the sidebar. If you already have a key listed, you're set — note the name so you can select it during VM creation.

If you don't have a key yet:

  1. Click Add SSH Key
  2. Choose Generate to have GPCN™ create a key pair for you
  3. Enter a name (e.g., llm-deploy-key), select ED25519, and click Generate Key
  4. When the private key is shown, copy or download it immediately — GPCN™ does not store private keys and cannot recover them

Save your private key to a safe location before closing the modal. You'll need it to SSH into your VMs. The downloaded file will be named after the key you created (e.g., llm-deploy-key.pem) and will typically land in your Downloads folder. Note the full path — you'll substitute it for C:\path\to\your\private-key on Windows or /path/to/your/private-key on Mac/Linux throughout this guide.
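
For example, on Mac/Linux you might move the downloaded key into ~/.ssh and restrict its permissions right away (the filename below is illustrative; use whatever you named your key):

# Illustrative filename: adjust to match the key you downloaded
mkdir -p ~/.ssh
mv ~/Downloads/llm-deploy-key.pem ~/.ssh/
chmod 600 ~/.ssh/llm-deploy-key.pem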

See SSH Keys for full details.


Phase 2 — Create the Ollama Server VM

You'll provision the VM that will run the Ollama service — this is where your LLM models will live and respond to requests. By the end of this phase you'll have a running VM with a public IP address you'll use throughout the remaining steps.

Choose the approach that fits your use case:

  • GPU VM — recommended for production. Significantly faster inference, required for larger models.
  • Standard VM — suitable for prototyping and smaller models. Requires at least 8 vCPUs and 32 GB RAM.

Option A — GPU VM

Navigate to GPU in the sidebar and click Create GPU VM.

Step 1 — GPU Series

Click a series card to expand it and choose a GPU count. For LLM inference:

Series      Recommended for
RTX A6000   Experimentation and smaller models (up to ~13B parameters)
A100        Production inference, larger models, good price-to-performance
H100        Largest models and highest-throughput inference

Select 1 GPU to start. You can always create a new VM with more GPUs if needed.

Step 2 — Deployment Location

Choose the data center closest to your users. Only locations with available inventory for your selected series are shown.

Step 3 — Operating System

Select Ubuntu 22.04 LTS or Ubuntu 24.04 LTS. Both images come with NVIDIA drivers pre-installed.

Step 4 — Authentication

Select your SSH key from the dropdown. If you set a default key, it's pre-selected automatically.

Step 5 — Name Your VM

Enter a name such as ollama-server.

Step 6 — Resource Group (Optional)

Assign to a resource group if you're using them for organization, or leave the default.

Step 7 — Deploy

Review the configuration summary and click Deploy GPU VM, then confirm.

Wait for the status to change to Running — this typically takes 2–5 minutes.

Get the GPU VM IP

Once the VM is running, open its detail page. GPU VMs are assigned a public IP automatically — copy this IP and save it. You'll need it in several later steps.

Save this IP now, along with the username (ubuntu by default). You'll reference it frequently.


Option B — Standard VM

Navigate to Virtual Machines in the sidebar and click Create VM.

Step 1 — Location

Choose the data center closest to your users.

Step 2 — VM Size

Select M-Medium-1 (8 vCPU, 32 GB RAM) as a minimum. For larger models or better performance, choose M-Large-1 (16 vCPU, 64 GB RAM). Ollama loads model weights into memory, so RAM is the primary constraint.

Step 3 — Choose an OS

Select Ubuntu 22.04 LTS or Ubuntu 24.04 LTS.

Step 4 — SSH Authentication

Select your SSH key from the dropdown. The login username will be ubuntu by default.

Step 5 — VM Name

Enter a name such as ollama-server.

Step 6 — Assign to Resource Group (Optional)

Assign to a resource group or leave the default.

Step 7 — Network & Public IP

Expand the Additional Configuration section and enable Allocate Public IP — required so the Web UI VM can reach the Ollama server.

Step 8 — Deploy

Click Deploy Virtual Machine and wait for the status to reach Running (1–3 minutes).

Get the Standard VM IP

Open the VM detail page and copy the public IP address. Save it along with the username (ubuntu by default).

Save this IP now — you'll reference it frequently throughout the remaining phases.


Phase 3 — Configure the Ollama Server

With your Ollama server VM running, you'll connect to it over SSH and set up the Ollama service and your chosen LLM models. By the end of this phase, Ollama will be installed, at least one model will be downloaded, and the API will be running and ready to accept requests from the Web UI.

All commands in this phase are run inside your Ollama Server VM over SSH.

Connect to the Ollama Server via SSH

Windows (PowerShell or Windows Terminal):

ssh -i C:\path\to\your\private-key ubuntu@YOUR_OLLAMA_IP

Mac / Linux:

# Set correct permissions on your private key (required — SSH rejects world-readable keys)
chmod 600 /path/to/your/private-key

ssh -i /path/to/your/private-key ubuntu@YOUR_OLLAMA_IP

When asked about host authenticity, type yes and press Enter.

If you see "Permission denied":

  • Check the key path is correct
  • On Windows, ensure no other users have access to the key file (right-click → Properties → Security) — a PowerShell fix is sketched after this list
  • Verify you're using ubuntu as the username
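
A rough sketch of that Windows permissions fix, run in PowerShell (the key path is the same placeholder used throughout this guide):

# Remove inherited permissions, then grant read access to your own account only
icacls C:\path\to\your\private-key /inheritance:r
icacls C:\path\to\your\private-key /grant:r "$($env:USERNAME):(R)"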

Update System Packages

Before installing anything, update the VM's package index and apply any pending upgrades. Fresh VMs often have queued security patches, and starting from a fully updated system avoids dependency conflicts during installation.

# Wait for any automatic updates to finish
while sudo fuser /var/lib/dpkg/lock-frontend >/dev/null 2>&1; do
  echo "Waiting for package manager..."
  sleep 5
done

sudo apt-get update
sudo apt-get upgrade -y

Install Ollama

Ollama is the engine that downloads, manages, and serves your LLM models. It runs as a system service and exposes a local API that the Web UI will use to send prompts and receive responses. The installer handles everything — the service starts automatically and will restart on reboot.

curl -fsSL https://ollama.com/install.sh | sh

# Verify Ollama is running
sudo systemctl status ollama

# Confirm the API is responding
curl http://localhost:11434/api/tags
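
If you deployed a GPU VM, you can optionally confirm the GPU is visible before downloading models. A quick check, assuming the image's pre-installed NVIDIA drivers are working:

# GPU VMs only: should list the GPU(s) and driver version
nvidia-smi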

Download LLM Models

Ollama downloads models directly from the Ollama model library and stores them on the VM's disk. Each model is a large file containing the neural network weights that define how the model thinks and responds — the bigger the model, the more capable it tends to be, but the more disk space, memory, and compute it requires to run. Download at least one model to get started. Depending on the model size, this can take anywhere from a few minutes to over 30 minutes.

# Recommended starter model — fast and capable
ollama pull llama3.2

# Optional additional models
ollama pull mistral
ollama pull codellama

# Confirm models are installed
ollama list

Model size reference:

Model             Minimum RAM   Disk Space
llama3.2 (3B)     4 GB          2 GB
mistral (7B)      8 GB          4 GB
llama2 (13B)      16 GB         8 GB
codellama (34B)   32 GB         20 GB

Test Ollama

Before moving on, send a test request to confirm the Ollama API is responding and that your model loads and generates output correctly. This catches any installation or memory issues now rather than after the Web UI is deployed.

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello!",
  "stream": false
}'

If you downloaded a different model, replace llama3.2 in the command above with the name from your ollama list output.

You should see a JSON response with generated text. If so, Ollama is working correctly.
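
If the raw JSON is hard to read, you can pipe the same request through Python's built-in formatter (python3 ships with Ubuntu):

curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello!",
  "stream": false
}' | python3 -m json.tool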

Configure the Ollama Server Firewall

You'll use UFW (Uncomplicated Firewall) to control which connections are allowed into this VM. At this stage you'll open SSH so you don't lose access, and temporarily open the Ollama API port to any source. You'll lock that port down to only the Web UI VM's IP in Phase 6, once that VM exists and you have its IP address.

sudo apt-get install -y ufw

# Always allow SSH first — otherwise you'll lock yourself out
sudo ufw allow 22/tcp

# Temporarily allow Ollama from anywhere
# You'll restrict this to the Web UI IP after creating that VM
sudo ufw allow 11434/tcp

sudo ufw --force enable
sudo ufw status

Keep this SSH session open — you'll return to it in Phase 6 to update the firewall rule. If you close it, you'll need to reconnect before continuing.


Phase 4 — Create the Web UI VM

With the Ollama server configured, you'll now provision the second VM that will host the Open WebUI interface. This is the VM your users will interact with — it runs a lightweight web container that connects to the Ollama server in the background. By the end of this phase you'll have a second running VM with its own public IP address.

Open the GPCN™ portal in a new browser tab and keep your Ollama SSH session open — you'll need it again in Phase 6.

Navigate to Virtual Machines in the sidebar and click Create VM.

Configure the Web UI VM

Step 1 — Select a Location

Choose the same data center where you deployed your Ollama server, or the closest one available. Keeping both VMs in the same location minimizes the latency between them.

Step 2 — Select a VM Size

Select G-Small-1 (4 vCPU, 8 GB RAM).

The Web UI VM doesn't run the model — it just serves the browser interface and forwards requests to the Ollama server. A small VM is sufficient even with multiple users.

Step 3 — Select an Operating System

Select Ubuntu 22.04 LTS or Ubuntu 24.04 LTS.

Step 4 — Configure Authentication

Select the same SSH key you used for the Ollama VM. The login username will be ubuntu by default — you can leave it as-is.

Step 5 — Name the Web UI VM

Enter a name such as ollama-webui.

Step 6 — Assign a Resource Group (Optional)

Assign to a resource group or leave the default.

Step 7 — Additional Configuration

Expand the Additional Configuration section and enable Allocate Public IP — this is required so users can reach the Web UI from their browsers.

Step 8 — Review & Deploy

Click Deploy Virtual Machine and wait for the status to reach Running (1–3 minutes).

Get the Web UI VM IP

Open the VM detail page and copy the public IP address. Save it — this is the address your users will visit to access the Web UI, and you'll use it to lock down the Ollama firewall.


Phase 5 — Configure the Web UI VM

With both VMs running, you'll connect to the Web UI VM and install Docker and the Open WebUI container. The container is pre-configured to connect to your Ollama server, so once it's running, the two services will be linked. By the end of this phase the web interface will be up and accessible from a browser.

All commands in this phase are run inside your Web UI VM over SSH. Open a new terminal window — keep the Ollama SSH session open.

Connect to the Web UI VM via SSH

Windows:

ssh -i C:\path\to\your\private-key ubuntu@YOUR_WEBUI_IP

Mac / Linux:

ssh -i /path/to/your/private-key ubuntu@YOUR_WEBUI_IP

Update Packages

Just like on the Ollama server, start by updating the package index and applying any pending upgrades before installing anything else.

sudo apt-get update
sudo apt-get upgrade -y

Install Docker

Open WebUI is distributed as a Docker container, so Docker needs to be installed before you can run it. The commands below add Docker's official package repository and install the engine — this ensures you get the current stable release rather than the older version bundled with Ubuntu's default repositories.

sudo apt-get install -y ca-certificates curl gnupg lsb-release

sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

sudo docker --version

Deploy the Open WebUI Container

This command pulls the Open WebUI image and starts it as a background container. It tells the container where to find your Ollama server via the OLLAMA_BASE_URL environment variable, maps port 8080 so the interface is reachable from a browser, and mounts a persistent volume so that user accounts and chat history survive container restarts and updates. Replace YOUR_OLLAMA_IP with the public IP of your Ollama server VM.

sudo docker run -d \
  --name open-webui \
  -p 8080:8080 \
  -e OLLAMA_BASE_URL=http://YOUR_OLLAMA_IP:11434 \
  -v open-webui:/app/backend/data \
  --restart always \
  ghcr.io/open-webui/open-webui:main

The container image download takes 2–3 minutes. Verify it started successfully:

# Should show "Up" status
sudo docker ps

# Should show "Application startup complete" near the end
sudo docker logs open-webui

Configure the Web UI Firewall

Apply the same UFW setup to this VM. You'll keep SSH accessible and open port 8080 so users can reach the Open WebUI interface from their browsers.

sudo apt-get install -y ufw

sudo ufw allow 22/tcp
sudo ufw allow 8080/tcp

sudo ufw --force enable
sudo ufw status

Phase 6 — Restrict Ollama Access

At this point the Ollama API port is open to any IP address — a temporary state from the initial firewall setup in Phase 3. Now that you have the Web UI VM's public IP, you'll update the firewall rule to allow only that VM to reach the Ollama server. This keeps the model API off the public internet while still letting the Web UI communicate with it.

Switch back to your Ollama Server SSH window. If you closed it, reconnect:

ssh -i /path/to/your/private-key ubuntu@YOUR_OLLAMA_IP

Run these commands, replacing YOUR_WEBUI_IP with the Web UI VM's public IP:

# Remove the allow-all rule
sudo ufw delete allow 11434/tcp

# Allow Ollama only from the Web UI VM
sudo ufw allow from YOUR_WEBUI_IP to any port 11434 proto tcp

sudo ufw status numbered

You should see port 11434 listed with the Web UI IP as the only allowed source.


Phase 7 — Verify and Access

With both VMs configured and the firewall locked down, you'll confirm that the Web UI can reach the Ollama server and then open the interface in your browser for the first time. By the end of this phase your LLM deployment will be live and ready to use.

Test the Connection

From your Web UI VM SSH session, confirm it can reach Ollama:

curl http://YOUR_OLLAMA_IP:11434/api/tags

You should see a JSON list of your installed models. If you get an error:

  • Confirm the Ollama IP is correct
  • Check Ollama is running: sudo systemctl status ollama (on the Ollama server)
  • Verify the firewall rule includes the correct Web UI IP: sudo ufw status (on the Ollama server)

Access the Web UI

Open your browser and go to:

http://YOUR_WEBUI_IP:8080

First-time setup:

  1. Click Sign up to create your admin account — the first account created becomes the administrator
  2. Enter your email, name, and a strong password
  3. Click Create Account

Once logged in, click the model dropdown at the top, select llama3.2 (or another model you downloaded), type a message, and press Enter. You now have a privately hosted LLM.

Save your Web UI URL, admin credentials, and both VM IPs in a secure location. You'll need them for ongoing management.


Congratulations

You've successfully deployed a fully self-hosted large language model on GPCN™ infrastructure. You now have:

  • An Ollama server running one or more LLM models, accessible only to your Web UI
  • An Open WebUI interface your team can use from any browser, with user accounts, chat history, and model selection built in
  • A secured network configuration that keeps the model API off the public internet

Everything is running in your own environment — your prompts, your data, and your models stay entirely within your GPCN™ account. From here you can invite team members, download additional models, or scale up your infrastructure as your needs grow.


Managing Your Deployment

Starting and Stopping VMs

To reduce costs, stop VMs when not in use:

  1. In the GPCN™ portal, navigate to the VM's detail page
  2. Click Stop and confirm

To restart, click Start and wait for the status to return to Running.

A stopped VM may be assigned a different public IP when restarted. Check the VM detail page for the current IP, and update the Web UI container's OLLAMA_BASE_URL if the Ollama server IP changes (see Updating the Ollama IP below).

Adding Models

SSH to the Ollama server and run:

ollama pull mistral
ollama pull codellama
ollama list

Refresh the Web UI in your browser — new models appear in the model dropdown automatically.

Removing Models

To free up disk space, remove models you no longer need. List your installed models first to confirm the exact name, then remove it.

ollama list
ollama rm MODEL_NAME

Updating the Ollama IP

If the Ollama server's IP changes after a restart, update the Web UI container:

# On the Web UI VM
sudo docker stop open-webui
sudo docker rm open-webui

sudo docker run -d \
  --name open-webui \
  -p 8080:8080 \
  -e OLLAMA_BASE_URL=http://NEW_OLLAMA_IP:11434 \
  -v open-webui:/app/backend/data \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Your users, chat history, and settings are preserved in the volume.
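
If you want to double-check that the volume is still present before or after recreating the container, you can inspect it with standard Docker commands:

# Should list the open-webui volume and its mountpoint on disk
sudo docker volume inspect open-webui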

Updating Open WebUI

Pull the latest image and recreate the container to apply the update. Your data is stored in a persistent volume and is not affected by this process.

# On the Web UI VM
sudo docker pull ghcr.io/open-webui/open-webui:main
sudo docker stop open-webui
sudo docker rm open-webui

sudo docker run -d \
  --name open-webui \
  -p 8080:8080 \
  -e OLLAMA_BASE_URL=http://YOUR_OLLAMA_IP:11434 \
  -v open-webui:/app/backend/data \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Troubleshooting

Can't SSH to VM

"Permission denied"

  • Verify the key file path is correct
  • On Mac/Linux: run chmod 600 /path/to/key
  • On Windows: check that only your user account has access to the key file
  • Confirm you're using ubuntu as the username

"Connection refused" or timeout

  • Verify the VM status is Running in the GPCN™ portal
  • Confirm you're using the correct public IP from the VM detail page
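
If both look correct and the connection still fails, running SSH with verbose output can show where the attempt stalls (YOUR_VM_IP is a placeholder for whichever VM you're trying to reach):

# -v prints each step of the connection attempt
ssh -v -i /path/to/your/private-key ubuntu@YOUR_VM_IP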

Ollama Issues

Models take a long time to download

This is expected. Large models can take 20–30 minutes. The ollama pull command prints a progress bar for each layer as it downloads. If a download stalls, check available disk space with df -h.

Ollama service won't start

Check the service logs to identify the error, then attempt a restart and confirm the current status.

sudo journalctl -u ollama -n 50
sudo systemctl restart ollama
sudo systemctl status ollama

Model loads but runs out of memory

The model requires more memory than your VM has available — VRAM on a GPU VM, or system RAM on a standard VM. Try a smaller model (llama3.2 at 3B parameters), or upgrade to a larger VM or GPU series with more memory.
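
To see how much memory is actually available on the Ollama server, check with standard tools (nvidia-smi applies only to GPU VMs):

# System RAM and swap on a standard VM
free -h

# GPU memory (VRAM) usage on a GPU VM
nvidia-smi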

Web UI Issues

Browser shows "connection refused" or times out

  1. Confirm the Web UI VM is Running in the portal
  2. Verify you're using http:// (not https://)
  3. Confirm the port is :8080
  4. Check the container is running: sudo docker ps
  5. Check port 8080 is open: sudo ufw status
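
If all of these look correct, you can also check reachability from your own machine rather than from inside the VM. A quick curl probe prints the HTTP status code, or an error if the connection is blocked:

# Run from your local machine; a 200 response means the Web UI is reachable
curl -s -o /dev/null -w "%{http_code}\n" http://YOUR_WEBUI_IP:8080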

Web UI loads but shows "Connection error" or no models

SSH to the Web UI VM and test the Ollama connection directly:

curl http://YOUR_OLLAMA_IP:11434/api/tags

If this fails:

  • Verify Ollama is running on the Ollama server (sudo systemctl status ollama)
  • Check the UFW rule on the Ollama server includes the correct Web UI IP (sudo ufw status)
  • Check the container logs: sudo docker logs open-webui

Responses are very slow

CPU-only inference is inherently slow — this is why a GPU VM is recommended. If you're on a GPU VM, check GPU utilization with nvidia-smi. A smaller model like llama3.2 (3B) will be significantly faster than larger models.
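
To confirm whether the loaded model is actually running on the GPU, recent Ollama releases include a ps command that reports model placement (output details may vary by version):

# On the Ollama server: the PROCESSOR column shows GPU vs CPU placement
ollama ps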


Next Steps

  • Add more users — go to the Web UI Admin Panel → Users and create accounts for your team
  • Enable HTTPS — set up a reverse proxy (such as Caddy or nginx) in front of port 8080 for secure remote access (see the sketch after this list)
  • Explore the Ollama library — visit ollama.com/library for all available models
  • Review GPU docs — GPU Overview and Manage GPU VMs for lifecycle management
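
A minimal sketch of the Caddy option, assuming Caddy is already installed on the Web UI VM (per Caddy's own install docs) and you've pointed a domain's DNS record at that VM's public IP; chat.example.com below is a placeholder:

# Open HTTP/HTTPS in the Web UI VM's firewall so Caddy can serve traffic and obtain certificates
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Point Caddy at the Open WebUI container; Caddy obtains and renews the TLS certificate automatically
sudo tee /etc/caddy/Caddyfile > /dev/null <<'EOF'
chat.example.com {
    reverse_proxy localhost:8080
}
EOF
sudo systemctl reload caddy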