Flaviu Vlaicu 👾

Hermes Agent Operator's Manual

The Operator’s Manual for Hermes Agent: Building an AI assistant that can act, remember, and improve. Operator’s Manual · Edition 3.2 · Verified against official Nous Research documentation. About This Manual: This manual explains how to deploy and operate Hermes Agent as a persistent “operator” (an AI system that runs continuously, uses tools, remembers context across sessions, and improves over time) rather than as a single-session chatbot. It covers architecture, installation, the core mental model, day-to-day workflows, the operator loop, common failure modes, advanced configuration (including offline skill optimization with GEPA), and a distilled set of operational lessons. ...

DGX Spark + vLLM Playbook

A practical, end-to-end guide for serving LLMs with vLLM on a DGX Spark (GB10 Grace Blackwell), assembled from NVIDIA’s official playbook, the vLLM team’s deep-dive, model cards, and battle-tested community setups. ⚠️ Warning: Flag choices on Spark are model- and image-specific, not hardware-wide defaults. The recipes below are starting points that worked for their authors against a specific container tag. Validate against the exact image you run, and pin a known-good tag/digest for anything you depend on. Copying a flag from one model’s recipe to another can silently regress throughput or output quality. ...

The Modern Ubuntu Bash Terminal Setup

This is a long-form, opinionated guide to setting up a terminal that’s both pretty (syntax-highlighted, themed, autosuggesting) and productive (fuzzy everything, smart history, per-project Python envs, modern replacements for the classic Unix tools). It’s everything I wish I’d known before assembling the stack — including the half-dozen subtle ordering and key-binding issues that ate a couple of evenings of my life. Target: Ubuntu 22.04 or newer, with bash as your shell. No zsh, no fish — bash all the way. The reason: it’s the default, it’s everywhere, and with ble.sh it gets ~95% of zsh’s quality-of-life features. ...

LLM Quantization

Quantization is the single most important technique for running large language models outside a datacenter. It is what turns a model that needs eight enterprise GPUs into one that runs on a gaming card, a laptop, or a Mac mini. But the moment you go to download a model, you are confronted with an intimidating wall of cryptic names — Q4_K_M, IQ3_XXS, UD-Q5_K_XL, GPTQ-Int4, AWQ, NF4, EXL3, NVFP4 — with little explanation of what they mean or which one you should pick. ...

DGX Spark + LlamaCPP Playbook

Complete Setup & Operations Guide: everything needed to build, run, update, and operate local LLMs on an NVIDIA DGX Spark (GB10 / sm_121) with llama.cpp and the llm helper command. 1. How the pieces fit. The Spark (GB10): Blackwell GPU at compute capability 12.1 (sm_121), 128 GB unified LPDDR5x shared between CPU and GPU, ~273 GB/s memory bandwidth. Bandwidth is the bottleneck for token generation, so Mixture-of-Experts (MoE) models with few active parameters run far faster than dense models of the same total size. Prefer MoE. ...

Minisforum A2

I bought a Minisforum MS-A2, lived with it for months, modified most of it, pushed it harder than most people will, and then sold it. This review is the long answer to why, and it isn’t a clean recommendation either way. The MS-A2 is one of the most impressive small machines you can buy. It’s also one I’d never put on my desk or in my living room. I’ll explain how both of those are true. ...

Q-feeds

Q-Feeds delivers curated indicators of compromise (IPs and domains) on a schedule. The OPNsense plugin is purpose-built to consume the IP feeds, and the official documentation assumes you’ll feed the domain side into Unbound. If you’re running AdGuard Home as your primary DNS resolver instead of Unbound — as I am — that integration path doesn’t apply directly, and you have to wire the domain feeds in manually. A two-layer threat intelligence setup is only as good as the DNS path that feeds it. This post walks through wiring Q-Feeds into OPNsense (IP layer) and AdGuard Home (DNS layer), and then — the part that turned out to matter most — actually forcing every device on the network to use that DNS path, instead of just offering it. ...

Claude Code Self Evolving

Most Claude Code setups are static. You write a CLAUDE.md, list your conventions, and hope Claude follows them. When it doesn’t, you correct it. Next session, it forgets. You correct it again. This guide builds something different: a system where every correction you make gets captured and logged, repeated corrections automatically become permanent rules, discovered patterns get verified before they’re trusted, and a periodic audit command decides what stays, what gets promoted, and what gets pruned. ...