gbrain: sync converted org-mode brain files
79
projects/server-rack-build.org
Normal file
@@ -0,0 +1,79 @@
:PROPERTIES:
:CREATED: [2026-05-31 Sun]
:ID: a1b2c3d4-e5f6-7a8b-9c0d-1e2f3a4b5c6d
:END:
#+title: Server Rack Build — Working Note
#+filetags: :infrastructure:rack:build:
#+STATUS: draft

* Overview

Building out a 10-20U open rack from server-grade components bought individually over several months. This is the first racked node, pulling triple duty as Passepartout host, Proxmox home server, and ZFS array. Node-1 (Protectli, i7, 6 NICs) stays as the network edge.

10Gb networking is already in place and stable.

* Current topology

- **Node-1 (Protectli)**: Small form factor, i7, 6 NICs, no PCIe slots, no GPU, limited RAM. Network appliance / router.
- **Node-2 (racked)**: First rack server. Passepartout + Proxmox + ZFS + GPU for local Hermes inference.

* Chassis

- 3U or 4U rackmount
- Room for full-height GPU, hot-swap drive bays, sufficient airflow
- Open rack design, 10-20U growable

* Platform decision (TBD)

| Option | Pros | Cons |
|--------|------|------|
| Intel Xeon 6 (Granite Rapids) | Newest arch, AMX AI accelerators. LGA 4710 parts: 8-ch DDR5, up to 136 PCIe 5.0 lanes on single-socket boards. 12-ch DDR5 needs the larger LGA 7529 (6900P) parts. | New socket (new mobo cost), DDR5 only, expensive |
| AMD EPYC 7002 (Rome) | 128 PCIe 4.0 lanes, 8-ch DDR4, cheap on used market | Older gen, DDR4 (slower, but cheap), no AMX |
| AMD EPYC 9004/9005 (Genoa/Turin) | Current gen, 12-ch DDR5, 128 PCIe 5.0 lanes per socket (up to 160 usable in dual-socket) | More expensive than 7002 |

* GPU decision (TBD)

Local inference for Hermes. Candidates:

| Option | VRAM | Price | Notes |
|--------|------|-------|-------|
| Intel Arc Pro B70 | 32 GB GDDR6 | ~$949 MSRP | Battlemage workstation, air-cooled, 230W, PCIe 5.0 x16. Plug-and-play with standard toolchains. |
| Tenstorrent P150 (Blackhole) | 32 GB GDDR6 | ~$1,399 | RISC-V Tensix, open source stack, 300W. Software less mature, needs tt-forge compilation. 4x QSFP-DD for linking cards. |
| RTX 5090 | 32 GB GDDR7 | ~$2,000 | CUDA, best software ecosystem. Consumer card, may need blower mod for rack. |
| RTX 6000 Ada (used) | 48 GB GDDR6 | ~$4-5K used | More VRAM, enterprise. Higher price even used. |

Key consideration on P150: not CUDA, not a GPU in the conventional sense. Software maturity is the main cost, not the hardware price.
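
A rough way to compare the candidates is cost per GB of VRAM. The sketch below uses only the VRAM sizes and approximate prices from the table above. The dictionary layout and the names =usd_per_gb= and =ranked_by_cost= are illustrative, and the used RTX 6000 Ada price is kept as a low/high range.

```python
# VRAM (GB) and approximate price range (USD) copied from the candidates table.
# Single prices are stored as (price, price) so every entry has a low/high pair.
CANDIDATES = {
    "Intel Arc Pro B70": (32, (949, 949)),
    "Tenstorrent P150 (Blackhole)": (32, (1399, 1399)),
    "RTX 5090": (32, (2000, 2000)),
    "RTX 6000 Ada (used)": (48, (4000, 5000)),
}


def usd_per_gb(vram_gb, price_range):
    """Return (low, high) cost in USD per GB of VRAM."""
    low, high = price_range
    return low / vram_gb, high / vram_gb


def ranked_by_cost(candidates):
    """Candidate names sorted from cheapest to most expensive per GB (low estimate)."""
    return sorted(candidates, key=lambda name: usd_per_gb(*candidates[name])[0])


if __name__ == "__main__":
    for name in ranked_by_cost(CANDIDATES):
        low, high = usd_per_gb(*CANDIDATES[name])
        cost = f"${low:.0f}/GB" if low == high else f"${low:.0f}-{high:.0f}/GB"
        print(f"{name}: {cost}")
```

This ignores the software-maturity cost flagged for the P150, so it is only one input to the decision.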

* Memory plan

Start with 2×64GB DDR5 ECC RDIMM, then grow to 4×64GB and 8×64GB. 8×64GB = 512GB fills every channel on an 8-channel platform; on a 12-channel platform, 384GB (6×64GB) populates only half the channels, and filling all 12 takes 768GB.

Tradeoff: populating fewer DIMMs than the platform has channels cuts peak memory bandwidth proportionally. 2 DIMMs on an 8-channel platform = 25% of peak bandwidth. First to suffer: ZFS ARC performance and VM responsiveness. Compute (LLM inference) is fine since the GPU has its own VRAM.

Alternative: start with 4×64GB to get half of peak bandwidth on 8-channel without crippling storage I/O, then grow to 8×64GB.
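
The bandwidth tradeoff can be sketched numerically. This assumes one DIMM per channel, a 64-bit (8-byte) data path per channel, and theoretical peak rates. DDR4-3200 is used only because it is the one memory speed named in this note, and the function names are illustrative.

```python
BYTES_PER_TRANSFER = 8  # assumes a 64-bit data path per memory channel


def bandwidth_fraction(dimms, total_channels):
    """Fraction of peak bandwidth available with one DIMM per populated channel."""
    return min(dimms, total_channels) / total_channels


def peak_bandwidth_gbs(populated_channels, mega_transfers_per_s):
    """Theoretical peak bandwidth in GB/s for the populated channels."""
    return populated_channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


if __name__ == "__main__":
    for dimms in (2, 4, 8):
        frac = bandwidth_fraction(dimms, total_channels=8)
        gbs = peak_bandwidth_gbs(dimms, 3200)  # DDR4-3200 as an example speed
        print(f"{dimms}x64GB = {dimms * 64}GB, {frac:.0%} of 8-channel peak, {gbs:.1f} GB/s")
```

Real-world throughput will land below these theoretical peaks, but the ratios between configurations are what matter for the upgrade path.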

* Build order (over months)

1. Rack + chassis + PSU
2. Motherboard + CPU + RAM + boot drives (runs Proxmox + ZFS immediately)
3. HDDs for ZFS array (start with 2, grow)
4. GPU (last piece, once the inference workload justifies it)

* Questions still open

- Intel Xeon 6 vs AMD EPYC (which gen)?
- DDR4 (EPYC 7002) vs DDR5 (everything else)?
- GPU: Intel Arc Pro B70 vs Tenstorrent P150 vs RTX 5090?
- Start with 2×64GB or 4×64GB on memory?
- Water cooling for CPU (Xeon 6 TDP may need it) or just air?
- Specific rack model / chassis model?

* Strategic framing

This node is a bootstrap between Stage 0 (current, conventional) and Stages 3-4 (Lisp machine, bare-metal, in-process LLM on dedicated silicon). If the DDR4 option (EPYC 7002) wins, its bandwidth ceiling won't matter because:

- Proxmox + ZFS + the Gate (Stage 2) don't stress 8-channel DDR4-3200
- GPU inference uses its own VRAM, not system memory
- By the time the Lisp machine arrives (different hardware entirely), this node graduates to NAS / Proxmox host duty

Part availability risk is acceptable: after 7+ years of life the build will have paid for itself many times over, and a motherboard failure means re-platforming onto whatever is current, not trying to resurrect DDR4 infrastructure.