The Core · Rob's Workshop

Hardware Infrastructure

Rig Progression Specs — Black & White Build

Last revision: 2026-03-02

Hardware components and tuning notes for the primary workshop virtualization server.

Subsystem	Component	Notes
CPU	`Ryzen 9 7950X3D`	PBO -25 mV curve, 95°C limit
GPU	`RTX 4090 24 GB`	Reserved for LLM inference offloads
RAM	`96 GB DDR5-6000 CL30`	2x48 GB modules — EXPO profile 1
Storage (OS)	`2 TB NVMe Gen4`	System boot drive; hot cache for frequently loaded models
Storage (Cold)	`8 TB SATA`	GGUF local archives, image sets, model backups
PSU	`1200 W Platinum`	Single-rail delivery, fully modular topology

Local LLM Runbooks

Inference Memory Footprints

Quantization Matrix

VRAM Metrics

Practical VRAM footprints for common models, by quantization format. Field-validated; figures are approximate.

Model	Format	VRAM	Context
Llama-3 8B	`GGUF Q5_K_M`	~6.5 GB	8 K
Llama-3 70B	`GGUF Q4_K_M`	~42 GB	8 K
Qwen2 14B	`EXL2 5.0bpw`	~10 GB	32 K
Mixtral 8x7B	`GGUF Q4_K_M`	~26 GB	32 K
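
The footprints above can be sanity-checked with a back-of-envelope estimate: weight memory is roughly parameter count times bits per weight, divided by 8. The sketch below is a rough aid, not a measurement. The function name `estimate_weights_gb` is hypothetical, and the bits-per-weight values (~4.85 for Q4_K_M, ~5.69 for Q5_K_M) are assumed approximations that vary by model. KV cache and runtime buffers are not included, so real usage runs higher as context grows.

```shell
# Estimate weight memory in GB: parameters (billions) x bits per weight / 8.
# Excludes KV cache and runtime buffers, which grow with context length.
estimate_weights_gb() {
  awk -v params_b="$1" -v bpw="$2" 'BEGIN { printf "%.1f\n", params_b * bpw / 8 }'
}

estimate_weights_gb 70.6 4.85   # Llama-3 70B, assumed ~4.85 bpw for Q4_K_M -> 42.8
estimate_weights_gb 8.03 5.69   # Llama-3 8B, assumed ~5.69 bpw for Q5_K_M -> 5.7
```

The 70B estimate lands close to the ~42 GB table entry; the 8B estimate sits below ~6.5 GB, with the gap plausibly covered by the 8 K KV cache and buffers.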

Local Server Launch

llama.cpp Headless Launch Setup

llama-server

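Launch command for serving a quantized model with an 8 K context on the local GPU.

# Serve quantized Llama-3 70B with an 8 K context, bound to localhost only.
# --n-gpu-layers 81 requests full offload (80 transformer layers plus the output layer).
# At ~42 GB (Q4_K_M) that exceeds the 24 GB RTX 4090; lower the value for partial offload.
# --metrics enables the Prometheus-compatible /metrics endpoint.
./llama-server \
  --model       ./models/Meta-Llama-3-70B-Instruct.Q4_K_M.gguf \
  --ctx-size    8192 \
  --n-gpu-layers 81 \
  --threads     16 \
  --port        8080 \
  --host        127.0.0.1 \
  --metrics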
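
The VRAM table lists Llama-3 70B Q4_K_M at ~42 GB, while the GPU in the hardware table has 24 GB, so a partial offload is the realistic setting for `--n-gpu-layers`. The sketch below gives a rough starting value, assuming every layer is the same size. The function name `layers_that_fit` and the 4 GB reserve for KV cache and buffers are hypothetical choices for illustration, not measured values.

```shell
# Rough count of layers that fit on the GPU, assuming all layers are the same size.
# Args: model size (GB), total layers, VRAM (GB), reserve for KV cache and buffers (GB).
layers_that_fit() {
  awk -v model_gb="$1" -v layers="$2" -v vram_gb="$3" -v reserve_gb="$4" 'BEGIN {
    per_layer = model_gb / layers
    fit = int((vram_gb - reserve_gb) / per_layer)
    if (fit < 0) fit = 0            # nothing fits once the reserve exceeds VRAM
    if (fit > layers) fit = layers  # never report more layers than the model has
    print fit
  }'
}

layers_that_fit 42 81 24 4   # 70B Q4_K_M on a 24 GB card with an assumed 4 GB reserve -> 38
```

Treat the result as a first guess: start there, watch actual VRAM use while the server loads, and adjust up or down.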

Proxmox VE virtualization

Hypervisor Notes & SR-IOV Configuration

VE Specs

The workshop hypervisor is Proxmox VE, running on the bench rig. GPU passthrough is reserved for the inference VM; build VMs use SR-IOV virtual functions.

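# Proxmox: pass the NVIDIA GPU through to the LLM VM
# Bind the GPU (10de:2684) and its HD audio function (10de:22ba) to vfio-pci at boot
echo "options vfio-pci ids=10de:2684,10de:22ba" > /etc/modprobe.d/vfio.conf
# Rebuild the initramfs so the binding applies early; reboot the host afterwards
update-initramfs -u
# Attach all functions of slot 01:00 to VM 101 as the primary GPU (pcie=1 requires the q35 machine type)
qm set 101 -hostpci0 01:00,pcie=1,x-vga=1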
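
After the host reboots, it is worth confirming that both GPU functions are actually bound to `vfio-pci` before starting the VM. A minimal sketch, assuming the standard `lspci -nnk` output layout (device lines unindented, detail lines indented); the helper name `all_vfio_bound` is hypothetical.

```shell
# Succeed only if every device in the lspci -nnk text on stdin reports vfio-pci
# as its active kernel driver.
all_vfio_bound() {
  awk '
    /^[^[:space:]]/ { devices++ }                    # unindented line = one PCI function
    /Kernel driver in use: vfio-pci/ { bound++ }     # function claimed by vfio-pci
    END { exit !(devices > 0 && devices == bound) }  # exit 0 only when all are bound
  '
}

# Usage on the Proxmox host (slot 01:00 as in the qm command above):
#   lspci -nnk -s 01:00 | all_vfio_bound && echo "GPU and audio bound to vfio-pci"
```

If the check fails, the host's own NVIDIA or audio driver has likely claimed a function first, which usually points at the modprobe configuration or a missing initramfs rebuild.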