memex/projects/infrastructure/docs/qubes-npu-setup.org
Amr Gharbeia cc6c552d5a refactor: Move Emacs config from system/ to projects/dotemacs/
2026-04-25 18:41:20 -04:00


Qubes NPU Setup - sys-ai

Documentation for setting up the sys-ai qube with AMD Ryzen AI NPU passthrough, so that llama.cpp can run on the NPU.

Hardware

  • Laptop: Framework Laptop 13 (AMD)
  • CPU: AMD Ryzen AI 5 340 (6 cores, no SMT)
  • NPU: AMD XDNA2 (Strix Point) - PCI address c2:00.1, Qubes device ID dom0:00_08.2-00_00.1
  • RAM: 96GB total

Current Progress

DONE [X] Create sys-ai AppVM (HVM, 64GB, 2 vCPUs)

# Run in dom0
qvm-create --label purple --property netvm=sys-firewall --property memory=65536 --property vcpus=2 --property virt_mode=HVM sys-ai
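
To confirm the qube was created with the intended settings, the properties can be read back with `qvm-prefs`. The following is a minimal sketch, assuming it runs in dom0. The `check_pref` helper and the `QVM_PREFS` override variable are hypothetical names introduced here so the check can be dry-run with a stand-in command outside dom0.

```shell
# Run in dom0
# QVM_PREFS is a hypothetical override for dry runs; it defaults to the real tool.
QVM_PREFS="${QVM_PREFS:-qvm-prefs}"

# check_pref VM PROPERTY EXPECTED
# Reads PROPERTY of VM and compares it to EXPECTED, ignoring case.
check_pref() {
    actual=$("$QVM_PREFS" "$1" "$2" | tr '[:upper:]' '[:lower:]')
    expected=$(printf '%s' "$3" | tr '[:upper:]' '[:lower:]')
    if [ "$actual" = "$expected" ]; then
        echo "OK $2=$actual"
    else
        echo "MISMATCH $2: expected $expected, got $actual"
        return 1
    fi
}

# Usage:
#   check_pref sys-ai memory 65536
#   check_pref sys-ai vcpus 2
#   check_pref sys-ai virt_mode hvm
```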

DONE [X] Attach NPU PCI device to sys-ai

# Run in dom0
qvm-pci attach -o no-strict-reset=true sys-ai dom0:00_08.2-00_00.1 --persistent
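
Whether the attachment took effect can be checked from the device listing. This is a small sketch, assuming that `qvm-pci list` prints the device ID as the first column of each line; `has_device` is a hypothetical helper, not part of the Qubes tools.

```shell
# Run in dom0
# has_device DEVICE_ID
# Succeeds if DEVICE_ID is the first field of any line read from stdin.
has_device() {
    awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}

# Usage:
#   qvm-pci list sys-ai | has_device dom0:00_08.2-00_00.1 && echo attached
```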

TODO [ ] Fix repository configuration in sys-ai

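Status: Package repositories are missing in the fedora-43-ai template. Fedora 43 uses DNF5, so repository configuration may not be where older instructions expect it.

Next Step: Install packages the Qubes way. Software belongs in the template that sys-ai is based on (fedora-43-ai), not in sys-ai itself, because changes to an AppVM's root filesystem are discarded on restart. Templates have no direct network access and reach repositories through the Qubes update proxy; template updates can also be driven from dom0 with `qubes-vm-update`.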
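
One way to see what the template currently has configured is to read the repository definitions directly. This is a minimal sketch, assuming the repository files live in /etc/yum.repos.d (the default directory DNF5 reads); `enabled_repos` is a hypothetical helper introduced for illustration.

```shell
# Run in the fedora-43-ai template
# enabled_repos [DIR]
# Prints the ID of every repository section that sets enabled=1 in the
# *.repo files under DIR (default: /etc/yum.repos.d).
# Note: sections with no "enabled" line are not printed, although DNF
# treats them as enabled by default.
enabled_repos() {
    dir="${1:-/etc/yum.repos.d}"
    for f in "$dir"/*.repo; do
        [ -e "$f" ] || continue
        awk '
            /^\[.*\]/ { id = substr($0, 2, index($0, "]") - 2) }
            /^[ \t]*enabled[ \t]*=[ \t]*1/ { print id }
        ' "$f"
    done
}

# Usage:
#   enabled_repos
```

An empty result means no repository is explicitly enabled, which would match the "repositories missing" status above. `dnf repolist` gives DNF's own view of the same information.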

TODO [ ] Verify NPU is accessible inside sys-ai

# Install pciutils
sudo dnf install pciutils

# Check NPU is visible
lspci | grep -i neural
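
The device name that `lspci` prints comes from the local pci.ids database, so matching on the word "neural" can miss the NPU when that database is old. The sketch below also matches on the numeric vendor:device ID. The ID 1022:17f0 is an assumption here and should be confirmed against the `lspci -nn` entry for c2:00.1 in dom0; `find_npu` is a hypothetical helper.

```shell
# Run in sys-ai
# find_npu
# Filters `lspci -nn` output on stdin for likely NPU entries.
# ASSUMPTION: 1022:17f0 is the vendor:device ID of the XDNA2 NPU.
find_npu() {
    grep -iE 'neural|\[1022:17f0\]'
}

# Usage:
#   lspci -nn | find_npu || echo "NPU not visible"
```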

TODO [ ] Install AMD NPU drivers in sys-ai

# Enable Copr repository
sudo dnf copr enable xanderlent/amd-npu-driver

# Install drivers
sudo dnf install xrt xdna-driver tcsh

# Setup environment
source /usr/xrt/setup.sh

# Verify NPU detection
xrt-smi examine
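
If `xrt-smi examine` reports nothing, it helps to check the kernel side first. This is a minimal sketch, assuming the driver registers as the `amdxdna` module and exposes the NPU as a /dev/accel/accel* node; `npu_driver_ready` and its optional ROOT argument are hypothetical and exist only so the check can be dry-run against a fake directory tree.

```shell
# Run in sys-ai
# npu_driver_ready [ROOT]
# Succeeds if amdxdna is listed in ROOT/proc/modules and at least one
# ROOT/dev/accel/accel* node exists. ROOT defaults to the real filesystem.
npu_driver_ready() {
    root="${1:-}"
    grep -q '^amdxdna ' "$root/proc/modules" 2>/dev/null || {
        echo "amdxdna module not loaded"
        return 1
    }
    for node in "$root"/dev/accel/accel*; do
        [ -e "$node" ] && { echo "found $node"; return 0; }
    done
    echo "no /dev/accel/accel* node"
    return 1
}

# Usage:
#   npu_driver_ready && xrt-smi examine
```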

TODO [ ] Build llama.cpp with AMD XDNA2 NPU backend

# Install build dependencies
sudo dnf install cmake gcc-c++ python3.11 git

# Clone NPU fork
git clone https://github.com/BrandedTamarasu-glitch/OllamaAMDNPU.git
cd OllamaAMDNPU

# Build with NPU backend
cmake -B build -DGGML_XDNA=ON -DGGML_BACKEND_DL=ON -DBUILD_SHARED_LIBS=ON
cmake --build build --parallel
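
With `-DGGML_BACKEND_DL=ON`, ggml backends are built as separate shared libraries that are loaded at run time, so a quick look at the build output shows whether an NPU backend was produced at all. This is a sketch under two assumptions: that this fork places the libraries in build/bin as upstream llama.cpp does, and that they follow the `libggml-*.so` naming. The name of the XDNA backend library is not known here; `list_backends` is a hypothetical helper.

```shell
# Run in the OllamaAMDNPU checkout
# list_backends [DIR]
# Prints the file names of libggml-*.so libraries in DIR (default: build/bin).
list_backends() {
    dir="${1:-build/bin}"
    for lib in "$dir"/libggml-*.so; do
        [ -e "$lib" ] && basename "$lib"
    done
    return 0
}

# Usage:
#   list_backends
```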

TODO [ ] Download model and test inference

# Download GGUF model (Qwen3 1B or 3B quantized)
# ... model download command ...

# Run with NPU offload
./build/bin/llama-cli -m model.gguf -p "Hello" -n 256 --npu-split 1

Next Step

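Resolve the "Fix repository configuration in sys-ai" step above first. No fix commands are recorded for it yet, and every later step depends on being able to install packages.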