Cluster II Spec

The Core

Last updated 2026-06-27

Hardware Infrastructure

Rig Progression Specs — Black & White Build

Last revision: 2026-03-02

Telemetry metrics and hardware components deployed on the primary workshop virtualization server.

Subsystem       Component               Notes
CPU             Ryzen 9 7950X3D         PBO -25 mV curve, 95°C limit
GPU             RTX 4090 24 GB          Reserved for LLM inference offload
RAM             96 GB DDR5-6000 CL30    2x48 GB modules, EXPO profile 1
Storage (OS)    2 TB NVMe Gen4          System boot drive; hot cache for frequently used models
Storage (Cold)  8 TB SATA               Local GGUF archives, image sets, model backups
PSU             1200 W Platinum         Single-rail delivery, fully modular

Local LLM Runbooks

Inference Memory Footprints

Quantization Matrix

VRAM Metrics

Practical VRAM footprints for common model sizes, by quantization format. Figures were checked in practice and are approximate.

Model           Format          VRAM      Context (tokens)
Llama-3 8B      GGUF Q5_K_M     ~6.5 GB   8K
Llama-3 70B     GGUF Q4_K_M     ~42 GB    8K
Qwen2 14B       EXL2 5.0 bpw    ~10 GB    32K
Mixtral 8x7B    GGUF Q4_K_M     ~26 GB    32K
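
The figures in this table can be roughly reproduced from parameter count and bits per weight. The sketch below is a back-of-the-envelope estimate, not a measurement. The helper names `weights_gb` and `kv_cache_gib` are hypothetical, the effective bits per weight (about 4.85 for Q4_K_M) is an approximation, and the cache formula assumes an fp16 KV cache with the Llama-3 70B attention shape (80 layers, 8 KV heads, head dimension 128).

```shell
#!/bin/sh
# Rough VRAM estimate: weights plus KV cache. Helper names are illustrative.

# weights_gb PARAMS_BILLIONS BITS_PER_WEIGHT
# Weight size in GB (10^9 bytes): params * bits_per_weight / 8.
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# kv_cache_gib LAYERS KV_HEADS HEAD_DIM CTX_TOKENS
# fp16 KV cache in GiB:
#   2 (K and V) * layers * kv_heads * head_dim * 2 bytes * tokens
kv_cache_gib() {
  awk -v l="$1" -v h="$2" -v d="$3" -v c="$4" \
    'BEGIN { printf "%.2f\n", 2 * l * h * d * 2 * c / (1024 ^ 3) }'
}
```

For example, `weights_gb 70 4.85` gives the weight size for a 70B model at Q4_K_M, which lands close to the ~42 GB row, and `kv_cache_gib 80 8 128 8192` gives the extra cache needed for an 8K context. Runtime buffers come on top of both.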

Local Server Launch

llama.cpp Headless Launch Setup

llama-server

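Launch command for serving a quantized 70B model at its native 8K context, with part of the model offloaded to the local GPU.

# Serve Llama-3 70B Q4_K_M at 8K context with partial GPU offload.
# Full offload (81 layers, ~42 GB) does not fit in the 4090's 24 GB, so only
# some layers go to the GPU; 38 is an assumed starting point, tune to fit.
./llama-server \
  --model        ./models/Meta-Llama-3-70B-Instruct.Q4_K_M.gguf \
  --ctx-size     8192 \
  --n-gpu-layers 38 \
  --threads      16 \
  --port         8080 \
  --host         127.0.0.1 \
  --metrics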
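
Once the server is started, a readiness check keeps scripts from sending requests while the model is still loading. This is a minimal sketch, assuming the host and port from the launch command above and that `curl` is installed. `server_url` and `wait_for_llama` are hypothetical helper names. llama-server exposes `/health`, and `/metrics` is available here because the launch passes `--metrics`.

```shell
#!/bin/sh
# Readiness helpers for a local llama-server. Helper names are illustrative.

LLAMA_HOST="${LLAMA_HOST:-127.0.0.1}"
LLAMA_PORT="${LLAMA_PORT:-8080}"

# server_url PATH -> full URL for an endpoint on the local server.
server_url() {
  printf 'http://%s:%s%s\n' "$LLAMA_HOST" "$LLAMA_PORT" "$1"
}

# wait_for_llama [TRIES] -> poll /health once per second until it succeeds.
# Returns 0 when the server answers with a success status, 1 otherwise.
wait_for_llama() {
  tries="${1:-60}"
  while [ "$tries" -gt 0 ]; do
    if curl -fsS -o /dev/null "$(server_url /health)" 2>/dev/null; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

A typical use would be `wait_for_llama 120 && curl -s "$(server_url /metrics)"`, which waits up to two minutes for the model to load and then dumps the Prometheus-style counters.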

Proxmox VE Virtualization

Hypervisor Notes & SR-IOV Configuration

VE Specs

The workshop hypervisor runs Proxmox VE on the bench rig. GPU passthrough is reserved for the inference VM; build VMs use SR-IOV virtual functions instead.

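# Proxmox: pass the NVIDIA GPU (10de:2684) and its audio function (10de:22ba) to the LLM VM
# Bind both functions to vfio-pci at boot, then rebuild the initramfs
echo "options vfio-pci ids=10de:2684,10de:22ba" > /etc/modprobe.d/vfio.conf
update-initramfs -u
# Reboot the host so the binding takes effect before attaching the device
# pcie=1 requires the VM to use the q35 machine type
qm set 101 -hostpci0 01:00,pcie=1,x-vga=1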
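
Before starting the VM, it is worth confirming that the GPU and its audio function share an IOMMU group with nothing else and are bound to `vfio-pci`. The sketch below assumes the standard Linux sysfs layout and that PCI address `0000:01:00.0` corresponds to the `01:00` slot used above. `iommu_group_of` is a hypothetical helper, and the sysfs root is a parameter only so the function can be exercised away from the host.

```shell
#!/bin/sh
# iommu_group_of PCI_ADDR [SYSFS_ROOT]
# Prints the IOMMU group number that contains the given PCI device.
# Returns 1 if the device is not found in any group.
iommu_group_of() {
  addr="$1"
  root="${2:-/sys}"
  for dev in "$root"/kernel/iommu_groups/*/devices/*; do
    # Skip the literal pattern when the glob matches nothing.
    [ -e "$dev" ] || continue
    if [ "$(basename "$dev")" = "$addr" ]; then
      # Path is .../iommu_groups/<group>/devices/<addr>
      basename "$(dirname "$(dirname "$dev")")"
      return 0
    fi
  done
  return 1
}
```

On the host, `iommu_group_of 0000:01:00.0` and `iommu_group_of 0000:01:00.1` should print the same group number, and `lspci -nnk -s 01:00` should report `Kernel driver in use: vfio-pci` for both functions after the reboot.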