Flaviu Vlaicu

Cybersecurity | DevOps | HomeLab | HomeAutomation

Hermes Agent Operator's Manual

The Operator’s Manual for Hermes Agent: Building an AI assistant that can act, remember, and improve

Operator’s Manual · Edition 3.2 · Verified against official Nous Research documentation

About This Manual

This manual explains how to deploy and operate Hermes Agent as a persistent “operator” (an AI system that runs continuously, uses tools, remembers context across sessions, and improves over time) rather than as a single-session chatbot. It covers architecture, installation, the core mental model, day-to-day workflows, the operator loop, common failure modes, advanced configuration (including offline skill optimization with GEPA), and a distilled set of operational lessons. ...

May 24, 2026 ·  47 min

The Modern Ubuntu Bash Terminal Setup

This is a long-form, opinionated guide to setting up a terminal that’s both pretty (syntax-highlighted, themed, autosuggesting) and productive (fuzzy everything, smart history, per-project Python envs, modern replacements for the classic Unix tools). It’s everything I wish I’d known before assembling the stack — including the half-dozen subtle ordering and key-binding issues that ate a couple of evenings of my life. Target: Ubuntu 22.04 or newer, with bash as your shell. No zsh, no fish — bash all the way. The reason: it’s the default, it’s everywhere, and with ble.sh it gets ~95% of zsh’s quality-of-life features. ...

June 19, 2026 · 21 min
TL;DR
  • Turn Ubuntu’s default bash into a modern terminal: ble.sh for syntax highlighting + autosuggestions, Oh My Posh for the prompt, fzf everywhere.
  • Catppuccin Frappé across the whole stack — terminal, prompt, ble.sh syntax colors, fzf — with a single .blerc + Oh My Posh JSON.
  • Modern replacements for the classics: bat, eza, fd, rg, zoxide, dust, duf, procs, glow, doggo, trip (as mtr).
  • Atuin replaces ~/.bash_history with a context-rich SQLite store on Ctrl-R; up-arrow stays normal.
  • TUIs for the rest of life: yazi (files, with y to cd on quit), lazygit, lazydocker, ncdu, btop.
  • Auto-activating Python venvs with uv + direnv and a uvenv one-shot alias.
  • Layer order matters: ble.sh source first, atuin before Oh My Posh, direnv before ble-attach, and ble-attach absolutely last.
  • Owns vs steals: don’t load tv init bash — it hijacks both Ctrl-T from fzf and Ctrl-R from atuin.
  • Includes the full .bashrc, .blerc, and ten gotchas (the prompt-disappears-after-reboot one, the silent direnv one, the errexit vs exit naming trap).

LLM Quantization

Quantization is the single most important technique for running large language models outside a datacenter. It is what turns a model that needs eight enterprise GPUs into one that runs on a gaming card, a laptop, or a Mac mini. But the moment you go to download a model, you are confronted with an intimidating wall of cryptic names — Q4_K_M, IQ3_XXS, UD-Q5_K_XL, GPTQ-Int4, AWQ, NF4, EXL3, NVFP4 — with little explanation of what they mean or which one you should pick. ...

June 18, 2026 · 27 min
TL;DR
  • Bits per weight is the master variable: it sets file size and is the main predictor of quality. Every method just spends a fixed bit budget well.
  • Quality has a knee around 4–5 bits — it collapses below and barely moves above. A good 4-bit quant is the default; 8-bit+ is usually wasted memory.
  • Which method you use (GGUF, GPTQ, AWQ, NF4, EXL3, FP8/FP4) is dictated by your runtime and hardware, not a universal best. Match the format to what your stack accelerates.
  • At a fixed memory budget, a bigger model quantized harder beats a smaller one quantized lightly. Go below 4-bit only with the methods built for it.
  • The KV cache is a separate knob, and for long contexts it can outweigh the weights. Quantize it too.
  • Perplexity hides the damage: quantization hits reasoning and code hardest. Judge a quant on tasks like your real workload, not one number.
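
The size claims in the bullets above can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the helper names and the model shapes (70B vs 35B parameters, 32 layers, 8 KV heads, head dimension 128, 131072-token context) are assumptions chosen for round numbers, not figures from the post. Real quantized files also carry scales and metadata, so treat the results as lower bounds.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB (ignores metadata)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 2**30


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: float) -> float:
    """Approximate KV cache size in GiB for a single sequence."""
    # Factor 2: one K tensor and one V tensor are stored per layer.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / 2**30


if __name__ == "__main__":
    # Same memory budget: a hypothetical 70B model at 4 bits per weight
    # takes as much space as a hypothetical 35B model at 8 bits per weight.
    print(f"70B @ 4 bpw: {weights_gib(70, 4):.1f} GiB")
    print(f"35B @ 8 bpw: {weights_gib(35, 8):.1f} GiB")

    # Hypothetical model shape at a 131072-token context.
    fp16 = kv_cache_gib(32, 8, 128, 131072, 2)  # 16-bit cache entries
    int8 = kv_cache_gib(32, 8, 128, 131072, 1)  # 8-bit entries, scale overhead ignored
    print(f"KV cache fp16: {fp16:.1f} GiB, 8-bit: {int8:.1f} GiB")
```

Under these assumed shapes the 16-bit KV cache alone comes to 16 GiB, which is why long contexts can rival the weights and why halving the cache precision is worth considering.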

DGX Spark + LlamaCPP Playbook

Complete Setup & Operations Guide

Everything needed to build, run, update, and operate local LLMs on an NVIDIA DGX Spark (GB10 / sm_121) with llama.cpp and the llm helper command.

1. How the pieces fit

The Spark (GB10). Blackwell GPU at compute capability 12.1 (sm_121), 128 GB unified LPDDR5x shared between CPU and GPU, ~273 GB/s memory bandwidth. Bandwidth is the bottleneck for token generation, so Mixture-of-Experts (MoE) models with few active parameters run far faster than dense models of the same total size. Prefer MoE. ...

June 17, 2026 · 28 min
TL;DR
  • Build llama.cpp for the GB10 (sm_121) with LLAMA_OPENSSL=ON and the 121a native-FP4 target.
  • Serve any GGUF model over an OpenAI-compatible API with one command: llm run <model> [port].
  • All the Spark tuning is baked in — --no-mmap, flash-attention, q8_0 KV cache, batch 2048, 20 threads.
  • 121a adds native FP4 (MXFP4/NVFP4) speedups; it’s neutral on standard quants like Q8_0 and Q4_K_M.
  • Prefer MoE models: the Spark is memory-bandwidth-bound, so low active-parameter models run fastest.
  • Manage everything with the llm helper: run, stop, ps, ls, wait, test, speed, log, update.
  • Wire Hermes or Open WebUI to http://<host>:<port>/v1; runnable = GGUF + supported arch + ≤ ~200B.
  • Includes the full llm script, a cheatsheet, and a troubleshooting table.
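
The "prefer MoE" advice follows from a simple bandwidth bound: if every generated token has to stream the active weights from memory once, tokens per second cannot exceed bandwidth divided by the bytes read per token. The sketch below applies that rule of thumb to the ~273 GB/s figure quoted in the post. The function name and the two model sizes (a 70B dense model and an MoE with 3B active parameters) are hypothetical, and the bound ignores KV cache reads, compute, and runtime overhead, so real throughput will be lower.

```python
def max_tokens_per_second(bandwidth_gb_s: float,
                          active_params_billion: float,
                          bits_per_weight: float) -> float:
    """Upper bound on decode speed when generation is memory-bandwidth-bound."""
    # Bytes of weights that must be read to produce one token.
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    bandwidth = 273  # GB/s, the approximate figure quoted for the GB10

    dense = max_tokens_per_second(bandwidth, 70, 4)  # hypothetical 70B dense, 4-bit
    moe = max_tokens_per_second(bandwidth, 3, 4)     # hypothetical MoE, 3B active, 4-bit

    print(f"dense 70B @ 4-bit: at most {dense:.1f} tok/s")
    print(f"MoE 3B active @ 4-bit: at most {moe:.1f} tok/s")
```

The total parameter count of the MoE model still determines whether it fits in the 128 GB of unified memory. Only the active parameters enter this speed bound.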

Minisforum A2

I bought a Minisforum MS-A2, lived with it for months, modified most of it, pushed it harder than most people will, and then sold it. This review is the long answer to why, and it isn’t a clean recommendation either way. The MS-A2 is one of the most impressive small machines you can buy. It’s also one I’d never put on my desk or in my living room. I’ll explain how both of those are true. ...

June 12, 2026 ·  23 min

Q-feeds

Q-Feeds delivers curated indicators of compromise (IPs and domains) on a schedule. The OPNsense plugin is purpose-built to consume the IP feeds, and the official documentation assumes you’ll feed the domain side into Unbound. If you’re running AdGuard Home as your primary DNS resolver instead of Unbound — as I am — that integration path doesn’t apply directly, and you have to wire the domain feeds in manually. A two-layer threat intelligence setup is only as good as the DNS path that feeds it. This post walks through wiring Q-Feeds into OPNsense (IP layer) and AdGuard Home (DNS layer), and then — the part that turned out to matter most — actually forcing every device on the network to use that DNS path, instead of just offering it. ...

May 7, 2026 ·  25 min

Claude Code Self Evolving

Most Claude Code setups are static. You write a CLAUDE.md, list your conventions, and hope Claude follows them. When it doesn’t, you correct it. Next session, it forgets. You correct it again. This guide builds something different: a system where every correction you make gets captured and logged, repeated corrections automatically become permanent rules, discovered patterns get verified before they’re trusted, and a periodic audit command decides what stays, what gets promoted, and what gets pruned. ...

April 1, 2026 ·  33 min

Mosh FIDO2 / Yubikey Fix

Problem

When using mosh with a FIDO2-backed SSH key (sk-ed25519 / sk-ecdsa, e.g. YubiKey), the touch prompt is never shown. The YubiKey blinks, meaning it received the signing request, but the terminal hangs silently until timeout. This affects any tool that invokes SSH as a subprocess without a proper controlling TTY, including mosh and ansible.

Root Cause

Mosh calls SSH internally with the -n flag:

    ssh -n -tt -S none -o ProxyCommand=... <host> -- mosh-server new ...

The -n flag redirects SSH’s stdin from /dev/null. libfido2 needs a real /dev/tty to print the touch prompt. With -n in effect, the signing request reaches the YubiKey hardware (hence the blinking) but the prompt is swallowed and there is no way to respond. ...
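
To make the stdin side of this concrete, here is a minimal Python sketch of what a child process sees when its parent redirects stdin from /dev/null, which is what the excerpt says ssh -n does. It is only an illustration of that redirection: it does not involve mosh, ssh, or libfido2, and the child program and function name are hypothetical.

```python
import subprocess
import sys

# Hypothetical child program: read everything available on stdin and report it.
CHILD = (
    "import sys; data = sys.stdin.read(); "
    "print(f'child read {len(data)} bytes from stdin')"
)


def run_child_with_null_stdin() -> str:
    """Run the child with stdin redirected from the null device."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.DEVNULL,  # same effect as `< /dev/null` in a shell
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The child hits end-of-file immediately: nothing can be typed into it.
    print(run_child_with_null_stdin())  # child read 0 bytes from stdin
```

The child never blocks waiting for input and never receives any, which matches the "no way to respond" symptom described above.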

March 6, 2026 ·  7 min