# Hermes Agent Operator's Manual

*May 24, 2026*
 — by Flaviu Vlaicu

> Building an AI assistant that can act, remember, and improve



# The Operator's Manual for Hermes Agent

**Building an AI assistant that can act, remember, and improve**

*Operator's Manual · Edition 3.2 · Verified against official Nous Research documentation*

---

## About This Manual

This manual explains how to deploy and operate Hermes Agent as a persistent
"operator" — an AI system that runs continuously, uses tools, remembers context
across sessions, and improves over time — rather than as a single-session
chatbot. It covers architecture, installation, the core mental model, day-to-day
workflows, the operator loop, common failure modes, advanced configuration
(including offline skill optimization with GEPA), and a distilled set of
operational lessons.

Hermes Agent moves quickly. It was first released publicly on **25 February
2026** and has shipped frequent updates since. Exact command flags, default
values, tool counts, and bundled-skill counts change between releases. This
edition was checked against the official documentation at
`hermes-agent.nousresearch.com/docs`, the GitHub repositories
(`github.com/NousResearch/hermes-agent` and
`github.com/NousResearch/hermes-agent-self-evolution`), and the project site.
When a specific number or flag matters, verify it against the current docs or
with `--help`.

### How to use this manual

The manual is organized so that any topic can be found and re-read quickly:

- **Chapters 1–3** are conceptual — architecture and the mental model. Read
  these once; they make every later feature predictable.
- **Chapters 4–5** are operational — workflows and how to run Hermes
  continuously.
- **Chapter 6** is a failure-mode index — scan it when something breaks.
- **Chapters 7–8** are advanced configuration and distilled lessons.
- **The Appendix** is a pure command reference.

Some material deliberately appears in more than one place: a concept is
*explained* once (in Chapters 1–3), then *referenced* where it is applied
(Chapters 4–8) and *listed* for quick lookup (Chapter 6 and the Appendix). That
is reference-manual design, not accidental duplication.

### Corrections incorporated in this edition

Earlier informal guides to Hermes Agent contained inaccuracies. They are
corrected here and listed so the differences are explicit:

- **There is no "Hermes Vault" feature.** Credential management is
  `hermes auth` (credential pools with same-provider key rotation) plus the
  official `1password` skill. See Chapter 7.
- **There is no `/skill` command.** An installed skill is loaded by typing
  `/<skill-name>` directly. `/skills` is a separate command for searching,
  installing, and managing skills.
- **There are six terminal backends:** `local`, `docker`, `ssh`, `daytona`,
  `singularity`, and `modal`. Daytona and Modal are the serverless options.
- **Built-in memory is file-based** (`MEMORY.md` and `USER.md`). It is one of
  three memory tiers; see Chapter 3.
- **Skills are auto-generated, not only hand-written.** Hermes creates skills
  from experience, refines them with the `skill_manage` tool, maintains them
  with a background Curator, and can optimize them offline with GEPA.
- **`hermes tools enable NAME` is not a documented subcommand.** Toolsets are
  managed through `hermes tools`, `hermes setup tools`, or the `-t/--toolsets`
  flag.

### Conventions

This manual addresses the reader directly ("you", "your"). `~/.hermes/` is the
Hermes home directory; `/home/user/...` stands in for an absolute path that
should be replaced with a real one. Commands in code blocks are run from a
shell unless marked as in-session slash commands.

---

## Introduction: The Operator Model

The common way to use a large language model is as a chat box: a prompt is
pasted in, output is copied out. This works for lookups and drafts, but it has a
structural ceiling. The model holds no context between conversations, the human
carries all continuity, and every session starts cold. The limit is not prompt
quality — it is the interaction model.

There is a second model. Instead of asking a model questions, you delegate work
to an *operator*: a system that runs continuously, remembers what it has
learned, uses tools to act, schedules its own follow-up work, and verifies its
own results. The difference is the difference between looking up driving
directions and employing an assistant who already knows the route, notices when
conditions change, and adjusts the plan unprompted. One is a lookup. One is a
delegation.

Hermes Agent is built for the second model. Its one-line pitch is *an agent
that gets better the longer you use it.* What makes that real is that three
capabilities usually found in separate tools sit in one framework: runtime
skill learning, persistent multi-layer memory, and an optional offline
optimization pipeline. The shift that matters is conceptual — once a task is set
up, it runs, and the human's role moves from *executing* to *operating*:
supervising, verifying, and intervening only where a human decision is genuinely
required.

---

## Chapter 1: What Hermes Agent Is and How It Is Built

### The pitch

**Hermes Agent is an open-source, self-improving AI agent framework built by
Nous Research,** released in February 2026 under the MIT license. It runs on
Linux, macOS, WSL2, native Windows (early beta), and Android via Termux. It
connects to almost any LLM provider, exposes dozens of built-in tools, and — the
property that distinguishes it — it learns: it creates reusable skills from
experience, refines them as it uses them, remembers facts across sessions, and
can search its own conversation history.

Hermes is not tied to a single machine. It can run on a low-cost VPS, a GPU
server, or serverless infrastructure (Daytona, Modal) that hibernates when idle
and costs almost nothing between tasks. It can be operated from a terminal or
from any of 20+ messaging platforms.

### Architecture

Understanding the structure makes every later feature predictable.

Everything flows through a single core agent class (an `AIAgent` in a
`run_agent.py` script). The CLI, the messaging gateway, IDE integration, the
batch runner, and an API server are all *entry points* into that same core —
which is what makes the platform-agnostic story work in practice.

```
   Entry points                  Core agent                 Backends
 ┌──────────────┐         ┌─────────────────────┐      ┌──────────────────┐
 │ CLI          │──┐      │  AIAgent            │──┬──> │ Session storage  │
 │ Gateway      │──┤      │  ┌───────────────┐  │  │    │ (SQLite + FTS5)  │
 │ IDE (ACP)    │──┼────> │  │ Prompt builder│  │  │    └──────────────────┘
 │ Batch runner │──┤      │  │ Provider res. │  │  │    ┌──────────────────┐
 │ API server   │──┘      │  │ Tool dispatch │  │  └──> │ Tool backends:   │
 └──────────────┘         │  │ Compression   │  │       │ terminal, web,   │
                          │  └───────────────┘  │       │ browser, file,   │
                          └─────────────────────┘       │ MCP, vision, TTS │
                                                         └──────────────────┘
```

The core loop is ReAct-style and synchronous: build the system prompt, check
whether context compression is needed, make an interruptible API call, execute
any tool calls the model requested, and loop. Four details matter operationally:

- **Six execution backends.** The agent can run commands locally, in Docker, on
  a remote host over SSH, or in a Modal, Daytona, or Singularity sandbox — the
  same code, changed with one config setting. Execution can be moved from a
  laptop to a cloud server without touching anything else.
- **Provider translation.** A translation layer routes any provider through one
  of a small number of API formats, which is why the active model can be
  swapped — Claude, GPT, Gemini, a local Ollama model — with one command and
  nothing else breaking.
- **A per-task turn cap (90 by default).** Each task has a hard ceiling on the
  number of reasoning/tool turns. Without it, an agent stuck in a loop —
  retrying a failing API, re-reading the same file — would silently consume
  credits. Sub-agents spawned by delegation share the same budget, so a runaway
  delegation chain cannot bypass the cap. The ceiling is configurable in the
  setup wizard.
- **Context compression.** When a session approaches the model's context-window
  limit, the loop compresses history automatically. Compression can also be
  triggered manually; see Chapter 5.

### Where Hermes fits: the comparison with OpenClaw

Hermes is not primarily a coding copilot tied to an editor. Its closest peer in
the open ecosystem is the personal-agent project **OpenClaw**. Both are
persistent and messaging-friendly, but they make opposite architectural choices.
A frequently quoted framing: *Hermes packages a gateway around a learning
agent; OpenClaw packages an agent around a messaging gateway.*

| Dimension | OpenClaw | Hermes Agent |
| --- | --- | --- |
| Architecture | Gateway-first; the agent is attached to the messaging layer | Agent-first; the gateway is one entry point into a learning runtime |
| Channel breadth | Very broad (50+ messaging channels) | Focused (20+ channels, the most-used ones) |
| Skill ecosystem | Very large community skill pool | ~120 skills bundled, plus the Skills Hub and GitHub taps |
| Learning loop | Skills are static | Skills self-evolve; the Curator prunes; GEPA optimizes offline |
| Memory | Plain markdown files | Three tiers: bounded markdown, FTS5 search, pluggable external providers |
| Security posture | Gateway-first design and a large unvetted skill pool have been associated with publicized incidents in 2026 | Snapshot-before-write for file operations and a curated skill set reduce some surface |

Treat the security row as point-in-time and directional, not as a current
audit — verify present advisories for both projects before relying on either in
a sensitive context (see Chapter 8). Setups can be migrated directly from
OpenClaw; see Chapter 7.

---

## Chapter 2: Installation and First Run

A working installation, configured and running a real task, takes roughly 30
minutes including troubleshooting. Requirements: Linux, macOS, or WSL2 (native
Windows and Android/Termux are also supported); Python 3.11+, which the
installer provides; and around 8 GB of RAM for ordinary API-based use.

### Install

**Linux, macOS, or WSL2:**

```bash
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc   # or ~/.zshrc
```

**Native Windows (PowerShell, early beta):**

```powershell
iex (irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1)
```

**Android (Termux):** use the same `curl` one-liner as Linux; the installer
detects Termux automatically.

For an existing installation, `hermes update` pulls the latest version.
Automatic pre-update backups are **off by default** — add `--backup` for a
snapshot, or set `update.backup: true` in `config.yaml`. `hermes update --check`
reports whether a newer version exists without installing it.

### Setup wizard

```bash
hermes setup
```

The first run launches an interactive wizard with five sections:

1. **Model** — choose an LLM provider and enter an API key, or run an OAuth flow.
2. **Terminal** — choose an execution backend (`local`, `docker`, `ssh`,
   `daytona`, `singularity`, `modal`).
3. **Gateway** — optionally configure messaging platforms.
4. **Tools** — enable or disable toolsets.
5. **Agent** — behavior settings such as the per-task turn cap and compression
   thresholds.

Configuration is written to `~/.hermes/config.yaml`; API keys are stored
separately in `~/.hermes/.env` and never in `config.yaml`. Any single section
can be re-run later, e.g. `hermes setup model`.

`hermes model` runs the full provider-setup wizard from a shell (it can add a
provider or run OAuth). The in-session `/model` command switches only among
already-configured providers. Model identifier strings change as providers
release new models; treat any specific string as an example.

### Health check

```bash
hermes doctor          # checks dependencies, config, credentials, directories
hermes doctor --fix    # attempts automatic repairs
hermes status          # visual overview of agent, auth, and platform state
hermes dump            # plain-text summary for a support request
```

Run `hermes doctor` before any significant work.

**Two rules that prevent most early confusion.** Toolset changes take effect
only in a new session — start one with `/new`. Configuration edits are read once
at startup and cached — after editing `config.yaml` directly, exit and relaunch
(or `/restart` the gateway). Prefer `hermes config set <key> <value>` or
`hermes config edit` over editing the YAML blindly.

### First task

```bash
hermes chat -q "List the contents of ~/projects and tell me what kind of projects you see there"
```

This returns a directory listing and a short summary — a first real agent
action: it read the filesystem with its file tool, reasoned about the result,
and replied. To start an interactive session, run `hermes chat` (or just
`hermes`).

Expect early friction: first-time setup commonly includes an API key error or
provider timeout. Wait about 60 seconds and retry; if it persists, verify the
key in `~/.hermes/.env`.

### What lives in ~/.hermes/

Most day-to-day work touches one of these paths. Knowing the layout makes the
rest of the manual concrete.

```
~/.hermes/
├── config.yaml        # main configuration (non-secret)
├── .env               # API keys, bot tokens, secrets
├── auth.json          # OAuth provider credentials
├── SOUL.md            # agent identity — slot #1 in the system prompt
│
├── memories/
│   ├── MEMORY.md      # persistent agent facts (Tier 1 memory)
│   └── USER.md        # the agent's model of you (Tier 1 memory)
│
├── skills/            # all skills — bundled, hub-installed, agent-created
│   ├── <category>/<skill-name>/SKILL.md
│   ├── .archive/      # skills the Curator has archived (recoverable)
│   └── .hub/          # Skills Hub state
│
├── sessions/          # per-platform session metadata
├── state.db           # SQLite session store, FTS5-indexed (Tier 2 memory)
├── cron/
│   ├── jobs.json      # scheduled jobs
│   └── output/        # cron run outputs
│
├── profiles/<name>/   # isolated profiles, each a full Hermes home
├── plugins/           # custom plugins
├── hooks/             # lifecycle hooks
├── skins/             # CLI themes
└── logs/              # agent.log, gateway.log, errors.log
```

Most of this is never edited by hand. The files worth knowing: `config.yaml`
(source of truth for everything non-secret), `.env` (secrets — Hermes routes
secret-looking values here automatically), `SOUL.md` (identity; Chapter 3),
`skills/` (where the entire learning loop lives), and `state.db` (the
FTS5-indexed database that makes "what did we discuss three weeks ago?" work).

---

## Chapter 3: The Mental Model

Hermes becomes predictable once a small set of concepts is understood. They are
**Tools, Identity, Skills, Context files, Memory, Sessions, Cron, and Gateway**,
tied together by the **Learning Loop**. A useful framing to hold throughout:
*memory is what the agent knows; skills are how it does things; identity is who
it is.*

### Toolsets: what Hermes can do

A *tool* is a single capability (run a shell command, search the web). A
*toolset* is a named group of related tools. Hermes ships dozens of tools — the
exact count grows with releases — organized into toolsets. The capabilities most
operators rely on:

| Capability | What it does |
| --- | --- |
| web | Web search and content extraction |
| terminal | Run shell commands, manage processes |
| file | Read, write, search, and patch files |
| browser | Browser automation (local Chrome over CDP, or a cloud browser) |
| code execution | Sandboxed Python via `execute_code`, including Programmatic Tool Calling |
| vision | Analyze images |
| image generation | Generate images |
| tts / voice | Text-to-speech and real-time voice |
| delegation | Spawn isolated sub-agents for parallel work |
| cron | Schedule tasks on a timeline |
| memory | Persistent cross-session facts |
| session search | Full-text search over past conversations |
| skills | Browse, install, and load skills |
| messaging | Send messages across platforms |
| kanban | Drive the multi-agent collaboration board |
| computer use | Drive a desktop GUI (macOS, via the cua-driver backend) |

Toolsets are configured through the interactive `hermes tools` UI, the
`hermes setup tools` section, or per run with `-t/--toolsets` (e.g.
`hermes chat --toolsets web,terminal,skills`). The authoritative list is in the
Toolsets Reference in the official docs. Enable only what a task needs — a
smaller tool schema produces cleaner agent behavior.

### Identity: SOUL.md

Above memory and skills sits a layer that determines *who the agent is when it
shows up*: identity. Without it, every agent feels like the same agent wearing
different hats.

Identity is a single hand-authored file, `~/.hermes/SOUL.md`. It occupies the
first slot in the system prompt, before anything else loads, and defines the
agent's personality, tone, communication style, and hard limits.

```markdown
# SOUL.md
You are a pragmatic senior engineer with strong taste.
You optimize for truth, clarity, and usefulness over politeness theater.
```

`SOUL.md` is static — written once, tweaked occasionally — and stays consistent
across every project and session. If the file is missing, Hermes falls back to a
built-in default identity (and now seeds a starter `SOUL.md` automatically).
Identity matters to the self-improving story because everything that follows —
the memory the agent writes, the skills it creates, the way it consolidates
knowledge — happens through the lens of this file. `SOUL.md` is the fixed frame;
memory and skills are the moving parts inside it. For a temporary change of
register without editing the file, `/personality` swaps in a built-in or custom
personality preset for the current session only.

### Skills: procedural memory the agent writes itself

If memory holds facts, *skills* hold procedures — not what the agent knows but
how it does things. A skill is a markdown file with YAML frontmatter, stored
under `~/.hermes/skills/`.

```markdown
---
name: k8s-pod-debug
description: >
  Activate for crashing pods, CrashLoopBackOff,
  "why is my pod restarting", container failures.
version: 1.2.0
author: agent
platforms: [linux, macos]
---
## Procedure
1. Get pod status, check events, pull logs
2. Look for OOMKilled, ImagePullBackOff, config errors

## Pitfalls
- Forgetting the --previous flag on restarted containers

## Verification
- Pod stays Running with 0 restarts for 5+ minutes
```

**Progressive disclosure.** To keep token cost low, skills load in three levels:

- **Level 0** — the agent sees only names and descriptions (roughly 100 tokens
  per skill; about 3k tokens for the full catalog). This is all that loads at
  session start.
- **Level 1** — the full skill body, loaded on demand when a skill's triggers
  match the task.
- **Level 2** — specific reference files inside a skill, opened only when the
  agent needs that depth.

The result: the agent pays in tokens only for the skills it actually uses.

**Self-evolution.** This is the core differentiator. The agent creates its own
skills autonomously using the `skill_manage` tool. Creation triggers when the
agent completes a complex task (roughly five or more tool calls), recovers from
errors and finds the working path, is corrected by the operator, or discovers a
non-trivial workflow. The loop: encounter a problem, solve it through trial and
error, save the working approach as a `SKILL.md` file, and — next time a similar
problem appears — load that skill and follow the proven procedure instead of
rediscovering it. One-time discoveries become permanent procedural memory.

The `skill_manage` tool supports six actions: `create`, `patch` (a targeted
fix — preferred, because it is token-efficient), `edit` (a full rewrite),
`delete`, `write_file`, and `remove_file`.

Skills can also be written by hand and shared. Browse and install community
skills with `hermes skills browse` and `hermes skills install`; publish with
`hermes skills publish`; group several under one slash command with
`hermes bundles`. Any GitHub repository can be added as a custom *tap*:

```bash
hermes skills tap add yourname/your-skills-repo
hermes skills install yourname/your-skills-repo/<skill-name>
```

About 120 skills ship bundled; the Skills Hub (`skills.sh` / `agentskills.io`)
and community taps add many more. Counts grow with every release.

### The Curator: maintenance for the skill library

Without maintenance, agent-created skills accumulate into dozens of narrow,
overlapping playbooks that waste tokens. The **Curator** is a background
maintenance system that prevents this.

It runs on an *inactivity check*, not a cron daemon: when at least 7 days have
passed since its last run **and** the agent has been idle for 2 or more hours, a
background fork of the agent spins up with its own prompt cache, never touching
the active conversation. It operates in two phases:

- **Automatic transitions** (deterministic, no LLM): a skill unused for 30 days
  becomes *stale*; unused for 90 days, it is *archived*.
- **LLM review** (up to 8 iterations): the forked agent surveys all
  agent-created skills and decides, per skill, whether to keep, patch,
  consolidate, or archive.

```
 Active  ──30d unused──>  Stale  ──90d unused──>  Archived  ──> Restored
   ▲  (deterministic, no LLM)          (moved to .archive/)   (one command,
   └──────────────── re-activated on use ────────────────────  reversible)
```

Two constraints make the Curator safe: it never touches bundled or
hub-installed skills (only agent-authored ones), and it never auto-deletes — the
worst outcome is archival to `~/.hermes/skills/.archive/`, recoverable with one
command. Before every pass it takes a `tar.gz` snapshot of the entire skills
directory, and rollbacks are themselves reversible. Critical skills can be
protected with `hermes curator pin <skill>`; patches and edits still apply to a
pinned skill, so the agent can improve it without it being unpinned first.
`hermes curator run --dry-run` previews a pass at any time.

### Context files: AGENTS.md and friends

Distinct from skills (loaded on demand) are *context files* (loaded
automatically, every session). `SOUL.md`, covered above, is one. Hermes also
discovers and loads, from the current working directory:

- **`AGENTS.md`** — a project's standing rules. Placed in a project root, it
  holds architecture, conventions, and instructions ("FastAPI backend with
  SQLAlchemy", "always use async for database operations", "never commit
  `.env`"). Hermes loads the top-level `AGENTS.md` at session start;
  subdirectory `AGENTS.md` files are discovered lazily during tool calls.
- **`.hermes.md`** and **`CLAUDE.md`** — also recognized as project context
  files, so an existing `CLAUDE.md` written for Claude Code is picked up
  without conversion.
- **`.cursorrules`** — a `.cursorrules` or `.cursor/rules/*.mdc` file is read
  automatically, so existing coding conventions need not be duplicated.

The division: **`SOUL.md` defines who Hermes is; `AGENTS.md` (and the others)
define a project's rules; memory holds facts about you; skills hold
procedures.** Keep context files concise — every character is injected into
every message and counts against the token budget. `--ignore-rules` skips all
context files, memory, and preloaded skills for a clean-room run.

### Context references: pulling content in with @

Separate from the always-loaded context files, *context references* inject
content into a single message on demand. Typing `@` followed by a reference —
a file path, a folder, a git diff, or a URL — expands it inline: Hermes appends
the referenced content to the message automatically. This is the precise tool
for "look at *this*" without restructuring a project's `AGENTS.md` or pasting a
file by hand. Use context files for what should always be loaded; use `@`
references for what matters only to the message in front of you.

### Memory: three tiers, three speeds

Hermes does not have a single memory. It has three layers, each for a different
purpose. The agent picks the right one for the question.

**Tier 1 — in-prompt memory.** Two small markdown files in `~/.hermes/memories/`:
`MEMORY.md` (the agent's notes on your environment, conventions, tool quirks,
and lessons learned — about 2,200 characters) and `USER.md` (your profile: name,
communication preferences, skill level, things to avoid — about 1,375
characters). Both are injected into the system prompt as a *frozen snapshot* at
session start: a memory written mid-session persists to disk immediately but
does not appear in the prompt until the next session. When a file approaches
capacity (about 80%, shown as a percentage in the system-prompt header), the
agent *consolidates* — merging related entries into denser versions so only
useful information survives. You can assist: "clean up your memory", "replace
the old Python 3.9 note — we are on 3.12 now". Because Tier 1 is plain markdown,
it can be audited directly by opening the files. Speed: instant. Capacity: tiny.

**Tier 2 — session search.** Every conversation, from the CLI and from messaging
platforms, is stored in `state.db` (SQLite, FTS5-indexed). The agent can search
weeks of past conversations with the `session_search` tool, summarizing matches
with an LLM. Speed: on demand. Capacity: effectively unlimited. This is the same
store that powers session resume (see Sessions, below).

**Tier 3 — external memory providers.** For a deeper, persistent model of you,
Hermes ships pluggable providers configured with `hermes memory setup`. Only one
external provider is active at a time, and it runs *alongside* Tier 1 rather
than replacing it. Available providers, each with different storage, cost, and
dependency trade-offs, include Honcho (dialectic user modeling), OpenViking
(self-hosted, filesystem hierarchy), Mem0 (server-side LLM extraction),
Hindsight (knowledge graph), Holographic (local, no dependencies), RetainDB
(delta compression), ByteRover, and SuperMemory (context fencing). When an
external provider is active, Hermes prefetches relevant memories before each
turn, syncs conversation turns after each response, and extracts memories at
session end. `hermes memory status` shows current state.

The trade-off across tiers is the design: critical facts live in Tier 1, always
in context but bounded; everything else is searchable on demand in Tier 2;
Tier 3 adds a deeper model at the cost of an external dependency. Use Tier 1 for
durable facts ("always run X from directory Y"), not for session-specific
context ("working on X today"), which belongs in the session itself.

### Sessions: the conversation thread

Each chat is a *session* — resumable, nameable, and searchable.

```bash
hermes --resume <id-or-title>   # resume a specific session
hermes --continue               # resume the most recent session
hermes --continue <name>        # resume the most recent matching a title
```

Sessions are managed with `hermes sessions` (`list`, `browse`, `rename`,
`prune`, `export`, `delete`). Naming sessions with `/title` keeps them findable;
unnamed sessions become an indistinguishable pile within a week. Sessions
provide continuity and an audit trail, and — as Tier 2 above — are searchable.

### Cron: scheduled execution

Cron runs tasks on a schedule. The gateway daemon ticks every 60 seconds, runs
any due jobs in isolated sessions, and delivers output to a messaging platform.
Jobs survive restarts; they live in `~/.hermes/cron/jobs.json` and output goes
to `~/.hermes/cron/output/`.

You do not have to write cron expressions — a job can be described in plain
English and Hermes converts it. The in-session `/cron` command also accepts
explicit forms:

```text
/cron add 30m "Remind me to check the build"          # one-shot, runs once in 30 minutes
/cron add "every 2h" "Check server status"            # recurring interval
/cron add "0 9 * * 1-5" "..."                          # standard cron expression — weekdays 09:00
/cron add "every 1h" "Summarize new items" --skill blogwatcher   # load a skill before running
```

From a shell, jobs are managed with `hermes cron list / create / edit / pause /
resume / run / remove / status`. Jobs can be chained: one job's output becomes
the next job's input via a `context_from` flag — useful for multi-stage
automations (a research step feeding a writing step).

Delivery needs a destination. Run `/sethome` in a Telegram or Discord chat to
mark it as the home channel for proactive output; without one, the agent has
nowhere to send scheduled results. If a job reports success but no message
arrives — a failure mode observed with jobs producing structured output — verify
the home channel first, then consult the cron-troubleshooting guide; the
reliable fallback is to have the job's script post to the platform API directly
(see Chapter 4). A job that should not fire until reviewed can be created and
immediately paused.

### Gateway: Hermes in messaging platforms

The Gateway runs Hermes as a service inside messaging platforms — the CLI,
Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Email, SMS, Microsoft Teams,
and others, 20+ in all.

```bash
hermes gateway setup     # interactive platform configuration
hermes gateway run       # run in the foreground (recommended for WSL, Docker, Termux)
hermes gateway install   # install as a background service (standard Linux, macOS)
hermes gateway start / stop / restart / status / list
```

Per-platform setup differs: Telegram needs a bot token from `@BotFather` (and
your numeric user ID, available from `@userinfobot`, for the allowlist); Discord
needs a bot token with the Message Content Intent enabled; Slack uses an app
manifest (`hermes slack manifest`). Gateway activity is logged to
`~/.hermes/logs/gateway.log`.

**WSL note.** WSL's systemd support is unreliable, so on WSL do not start the
gateway as a background service. Run it in the foreground inside `tmux` so it
survives the terminal closing: `tmux new -s hermes 'hermes gateway run'`.

### The Learning Loop: how it fits together

The concepts above feed one another. The agent acts, captures what worked as
skills, persists durable facts to memory, and the Curator keeps the skill set
clean — so the agent is measurably more capable after months of use than on day
one:

```
   act ──> notice a reusable pattern ──> skill_manage creates/patches a skill
    ▲                                                │
    │                                                ▼
 remember facts <── self-prompt to persist <── Curator prunes & consolidates
                                                     │
                                            (GEPA optimizes offline — Ch. 7)
```

In one sentence: **`SOUL.md` sets the identity, the runtime loop captures
experience, the Curator keeps the library clean, and GEPA makes sure what is in
the library actually works.**

### Summary of concepts

- **Tools** — what Hermes can do (capabilities)
- **Identity** — who Hermes is (`SOUL.md`, slot #1 in the system prompt)
- **Skills** — how Hermes does things (procedural memory, partly self-authored)
- **Context files** — a project's standing rules (`AGENTS.md`, `.cursorrules`)
- **Memory** — what Hermes knows (three tiers: in-prompt, session search, external)
- **Sessions** — what was being worked on (history and continuity)
- **Cron** — what runs automatically (scheduling)
- **Gateway** — where Hermes can be reached (access)
- **Learning Loop** — how all of the above compound over time

Any new Hermes feature can be placed against this list, which makes the system
predictable rather than a set of features to memorize.

---

## Chapter 4: Core Workflows

Four representative workflows, with example prompts and expected behavior.

### Workflow A: Research pipeline

**Goal:** turn an open question into a structured reference file.

**Example prompt:**

```text
Research the current state of GGUF quantization for local LLM inference.
Find: (1) what tools support GGUF (2) performance benchmarks vs full precision
(3) any recent updates. Write a structured summary to
~/research/gguf-state-$(date +%Y%m%d).md with sources.
```

Hermes activates web tools, searches, reads full pages rather than snippets,
synthesizes across them, writes a structured markdown file, and lists sources.
Because the output is a file, the task can be started and left to complete.

### Workflow B: Repository debugging

**Goal:** hand a broken repository to Hermes and receive a fix or a clear
explanation of what must change.

**Example prompt:**

```text
There's a test failure in ~/projects/kiln. Run pytest and tell me what's
failing and why. If it's a simple fix, apply it and rerun to confirm.
Report what the problem was and what you changed.
```

Hermes changes into the directory, runs the tests, reads the failing test and
relevant source, identifies the root cause, applies a fix, and re-runs to
confirm. For obvious defects a one-shot query suffices; for complex failures, an
interactive session lets you walk the stack trace with the agent. For risky
edits, add `--checkpoints` so files can be restored with `/rollback` (Chapter 5).

### Workflow C: Scheduled daily briefing

**Goal:** have Hermes run a recurring task unattended and deliver the result.

Create a cron job with a schedule and a prompt (see Chapter 3 for the `/cron
add` forms). For reliable delivery, the job can run a script that posts directly
to the platform API. The example below uses Telegram:

```bash
#!/bin/bash
# Daily content radar — posts a briefing to Telegram
TELEGRAM_BOT_TOKEN=$(grep TELEGRAM_BOT_TOKEN ~/.hermes/.env | cut -d= -f2)
TELEGRAM_CHAT_ID=$(grep TELEGRAM_CHAT_ID ~/.hermes/.env | cut -d= -f2)

BRIEFING=$(hermes -z "Run a content radar: find 3-5 interesting posts about AI
agents, local model setups, or Hermes workflows from the past 24 hours.
Format as a numbered briefing with links.")

curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
  -d "chat_id=${TELEGRAM_CHAT_ID}" \
  -d "text=${BRIEFING}" \
  -d "parse_mode=Markdown"
```

Two details: `hermes -z` is the scripted one-shot entry point — a single prompt
in, the final answer out, nothing else on stdout — which suits cron, CI, and
parent scripts. And the script posts to the API directly: a direct call fails
loudly (a non-zero exit code) if it fails, whereas built-in delivery has been
observed to report success while the message did not arrive. Set up once, this
runs unattended on its schedule.

### Workflow D: Multi-agent Kanban

**Goal:** split a large task across specialist agents, run them in parallel, and
synthesize.

Hermes's Kanban is a multi-profile collaboration board. A parent task holds the
synthesis; child tasks hold parallel workstreams, each assigned to a specialist
profile; children are linked to the parent so the parent runs only after they
complete.

```bash
# Parent synthesis task
hermes kanban create "Write comprehensive X review" --assignee writer

# Child tasks — note --workspace dir:<absolute path> on every one
hermes kanban create "Research X market landscape" \
  --assignee researcher --parent <parent-id> \
  --workspace dir:/home/user/research/x-market

hermes kanban create "Technical analysis of X architecture" \
  --assignee engineer --parent <parent-id> \
  --workspace dir:/home/user/code/x-analysis

# Link children to the parent, then run one dispatcher pass
hermes kanban link <parent-id> <child-a-id>
hermes kanban link <parent-id> <child-b-id>
hermes kanban dispatch
```

**Workspaces are critical.** Every task that produces files to keep must use
`--workspace dir:/absolute/path`. The default `scratch` workspace is
garbage-collected when a task is archived; a `dir:` workspace persists.

Hermes can also decompose a task automatically: a coarse task placed in the
triage column is fanned out by `hermes kanban decompose` (on by default per
dispatcher tick) into child tasks routed to specialist profiles. Keep individual
tasks small enough to complete in a single agent run — a task needing hundreds
of turns will hit the turn cap, crash, and be re-queued.

---

## Chapter 5: The Operator Loop

Running Hermes continuously requires an operational model. The principle: an
operator sets up loops, verifies outputs, and intervenes only when a human
decision is required — not one who watches the agent work.

Every operator session moves through four states:

```
Active ──> Waiting ──> Recovering ──> Done ──> (back to Active)
```

- **Active** — Hermes is working: running commands, searching, reading or
  writing files, generating output.
- **Waiting** — Hermes hit a blocking call (an LLM request, a tool waiting on an
  external resource, a rate limit). It resumes on its own.
- **Recovering** — something failed. Hermes retries, summarizes context, or
  restarts a step — or flags the operator and waits.
- **Done** — the task is complete, output delivered, the session saved.

A one-shot query passes through Active → Waiting → Done in seconds. A complex
project cycles through these states over hours, bounded by the per-task turn cap
(90 by default; Chapter 1) — a runaway loop terminates rather than silently
consuming credits.

### Long-running sessions

For tasks lasting more than a few minutes, use an interactive session, which
keeps context alive across turns. For tasks lasting hours, run them in the
background: `/background <prompt>` (aliases `/bg`, `/btw`) runs the prompt in a
separate session and reports results when finished.

### Context management

Every LLM has a finite context window. As a conversation grows, the model
degrades — it repeats itself, loses earlier detail, stops noticing the obvious —
without announcing it.

- **Under roughly two hours of active conversation:** built-in compression
  triggers automatically near the limit. It is conservative, so do not rely on
  it immediately before a critical decision.
- **Over roughly two hours:** run `/compress` deliberately, at a natural
  stopping point. Compression summarizes the history and replaces it; the
  original detail is lost, so compress at a milestone, not at a crisis.
  `/compress <focus topic>` narrows what is preserved. `/usage` shows where the
  session stands.
- **Multi-day projects:** do not keep one session alive for days. End each
  session by writing a checkpoint file (`/background Write current progress and
  next steps to ~/checkpoint.md`) and read it back next session. A
  human-readable checkpoint survives any crash; a compressed summary of hundreds
  of exchanges is still lossy.

### Filesystem checkpoints and rollback

For file-editing work, starting a session with `--checkpoints` causes Hermes to
snapshot files before destructive changes. In-session, `/rollback` lists and
restores those snapshots; `hermes checkpoints` manages the store. This is an
undo mechanism for autonomous edits, independent of git.

### Verifying output without supervising continuously

```bash
hermes chat -q "Check ~/logs/test-results.md and tell me if all tests passed"
hermes chat -q "What's the current status of the content pipeline? Any failures?"
hermes --continue   # resume a session left mid-task
```

### When to intervene

**Intervene when** Hermes asks a question it genuinely cannot resolve;
output is visibly wrong and the agent is not self-correcting; the decision is
creative or strategic; or a task has been stuck in Recovering for more than
about five minutes.

**Do not intervene when** the agent is processing (let the turn finish), is
Waiting (an API call or rate limit is in flight), or is recovering and making
progress (allow one full attempt). Reading output as it streams, rather than
waiting for completion, is supervision in name only.

---

## Chapter 6: Common Failure Modes

A failure-mode index. Each entry gives cause, fix, and prevention.

**1. Context overflow.** *Symptom:* the agent drifts, repeats itself, loses
earlier context. *Cause:* the context window is near its limit. *Fix:*
`/compress`, or `/new` and reload context from a file or skill. *Prevention:*
avoid multi-hour sessions; use file-based checkpoints.

**2. Toolset mismatch.** *Symptom:* a toolset was enabled but the agent says it
cannot use the capability. *Cause:* toolsets load at session start. *Fix and
prevention:* run `/new` after any toolset change.

**3. Configuration drift.** *Symptom:* a `config.yaml` edit is ignored, or
Hermes crashes on startup. *Cause:* configuration is read once and cached.
*Fix:* exit and relaunch (or `/restart` the gateway). *Prevention:* use
`hermes config set` or `hermes config edit`.

**4. Cron delivery surprises.** *Symptom:* a cron job reports success but no
message arrives. *Cause:* built-in delivery can fail quietly for jobs emitting
structured output, and has no destination if no home channel is set. *Fix:* set
a home channel with `/sethome`; for script-backed jobs, post to the platform API
directly. *Prevention:* use a home channel and direct API delivery.

**5. Scratch workspace data loss.** *Symptom:* a Kanban task completed but its
output files are gone. *Cause:* the default `scratch` workspace is
garbage-collected when a task is archived. *Fix:* none — recreate the work.
*Prevention:* always create output-producing tasks with
`--workspace dir:/absolute/path`.

**6. Delegation or provider mismatch.** *Symptom:* sub-agents fail immediately
with "model not supported". *Cause:* the delegation model in `config.yaml` does
not match what the current provider can serve. *Fix:* align them, then restart.
*Prevention:* after changing the main provider, check the delegation config.

**7. WSL gateway stops on terminal close.** *Cause:* WSL's systemd support is
unreliable. *Fix and prevention:* run the gateway in the foreground inside
`tmux` (`tmux new -s hermes 'hermes gateway run'`).

**8. Profile name mismatch.** *Symptom:* a Kanban task assigned to a profile is
never picked up. *Cause:* `hermes kanban assign` can fail to apply if the
profile name does not exactly match. *Fix:* verify with `hermes kanban show
<task-id>`. *Prevention:* copy the exact name from `hermes profile list`.

**9. Shared bot token across profiles.** *Symptom:* messaging breaks when
multiple profiles are connected. *Cause:* a messaging platform allows only one
connection per bot token. *Fix and prevention:* give every profile its own bot
and token (Chapter 7).

**10. Credentials stored in plaintext.** Acceptable for a hobby setup, a
liability for production. *Fix:* use `hermes auth` credential pools and the
`1password` skill rather than scattering keys across `.env` files.

**11. The agent overwrites a hand-tuned skill with a worse version.** *Cause:*
the same self-evolution mechanism that improves skills can degrade a manually
customized one. *Fix and prevention:* pin important hand-authored skills with
`hermes curator pin`, review what the Curator changes, and use GEPA (Chapter 7)
when you want trace-driven, test-gated improvement rather than the agent's own
judgment. See also Chapter 8.

---

## Chapter 7: Advanced Configuration

The capabilities below become relevant once the basics are running.

### Multi-agent orchestration

Kanban (Chapter 4) is the orchestration layer: decompose a goal into specialist
roles, run them in parallel, synthesize. The two constraints that matter — a
dedicated `--workspace dir:` per task, and tasks small enough to finish in one
run — are covered in Workflow D and are not repeated here.

### Running multiple agents: profiles

Profiles allow multiple fully independent Hermes instances, each with its own
config, memory, skills, sessions, and `SOUL.md`, sharing nothing by default.
Each profile lives at `~/.hermes/profiles/<name>/`.

```bash
hermes profile create designer --clone     # --clone copies the default profile's config and .env
hermes profile create programmer --clone
hermes profile create researcher --clone
hermes profile use <name>                  # set the sticky default
hermes -p <name> chat -q "..."             # one-off override
hermes profile list / show / rename / export / import
```

To run several agents on messaging platforms at once, give each profile its own
bot — a platform allows only one connection per token, so a shared token breaks.
Create one bot per profile and run the gateway wizard once per profile:

```bash
hermes -p designer gateway setup
hermes -p programmer gateway setup
hermes -p researcher gateway setup
```

The agents become genuinely distinct through their `SOUL.md` files — a designer
profile written for hand-drawn technical illustration, a programmer profile
written as a terse staff engineer, a researcher profile written to produce a
daily digest. Edit each at `~/.hermes/profiles/<name>/SOUL.md`.

### Delegating execution to Claude Code

A programmer profile is more powerful if it does not write code directly but
*delegates execution* to the Claude Code CLI: Hermes orchestrates and decides
what is next, while Claude Code does the file edits, runs commands, and manages
git. This also lets execution run on a Claude subscription rather than a
separate API key.

Ensure the `claude` binary is on `PATH` (`which claude` should print a real
path), then start a session with the programmer profile and send a single
activation prompt instructing it to act as a staff engineer that uses Claude
Code for all execution and to set itself up accordingly. The profile installs
the `claude-code` skill on its own, verifies the binary, and from then on routes
anything coding-related through Claude Code — choosing between Claude Code's
one-shot print mode and its interactive mode based on the task. The same
delegation pattern works for other external CLIs.

### Teaching a profile a style by example

The self-evolution loop can be used as a *setup* mechanism. Rather than
hand-writing a skill, feed a profile reference examples — illustrations,
newsletter intros, code-review comments — and ask it to study them and create a
skill (via `skill_manage`) that reproduces the pattern, including any script the
skill needs. The agent encodes the pattern itself and verifies the result. From
then on, requests in that domain trigger the skill. This works for anything
where consistency matters.

### GEPA: optimizing skills offline

The in-agent learning loop (skill creation plus the Curator) has a known
weakness: the agent tends toward self-congratulation — it usually believes it
performed well, even when it did not — and the same mechanism that auto-generates
skills can overwrite manual customizations with worse versions. The agent is, in
effect, grading its own work.

**GEPA** addresses this. GEPA (Genetic-Pareto Prompt Evolution) is *not* part of
the Hermes runtime. It lives in a companion repository,
`NousResearch/hermes-agent-self-evolution`, is MIT-licensed, and is published as
an ICLR 2026 Oral paper. It is an *offline* optimization pipeline: instead of
asking the agent "did you do well?", GEPA reads execution traces to understand
*why* things failed, then proposes targeted improvements through reflective
evolutionary search. It uses DSPy + GEPA, needs no GPU — everything runs through
API calls — and costs roughly $2–10 per optimization run.

The pipeline:

1. Read the current skill, prompt, or tool description from the Hermes repo.
2. Generate an evaluation dataset — synthetic test cases, real session history
   from `state.db`, or a hand-curated golden set.
3. Run the GEPA optimizer: read execution traces, diagnose failure points,
   generate candidate variants.
4. Evaluate candidates with LLM-as-judge scoring against rubrics (graded, not
   binary pass/fail).
5. Apply constraint gates: the full test suite must pass, skills must stay under
   a size limit, prompt-caching compatibility is preserved, and semantic purpose
   must not drift.
6. The best valid variant goes out as a pull request against the Hermes repo —
   never a direct commit — for human review and merge.

```bash
git clone https://github.com/NousResearch/hermes-agent-self-evolution.git
cd hermes-agent-self-evolution
pip install -e ".[dev]"
export HERMES_AGENT_REPO=~/.hermes/hermes-agent
python -m evolution.skills.evolve_skill --skill <skill-name> --iterations 10 --eval-source synthetic
```

GEPA can be skipped initially. It earns its keep when you hit a wall with a
skill and want trace-driven, test-gated improvement without the cost of
fine-tuning or reinforcement learning. It is also still maturing — treat it as
an advanced, somewhat experimental companion tool, and review every pull request
it produces. In one line: the runtime loop captures experience, the Curator
keeps the library clean, and GEPA verifies that what is in the library actually
works.

### Credential management

Running multiple agents across projects makes credential management a real
concern. `hermes auth` provides credential pools that hold multiple keys per
provider and rotate them automatically when one hits a rate limit or cooldown.

```bash
hermes auth                                   # interactive credential wizard
hermes auth list / status
hermes auth add openrouter --api-key sk-or-... # add an API key
hermes auth add anthropic --type oauth          # add an OAuth credential
```

For secrets that should not be stored in Hermes at all, the official
`1password` skill fetches credentials from 1Password at runtime
(`hermes skills install official/security/1password`). Plain keys in `.env` are
acceptable for a hobby setup; for production — multiple agents, key rotation, an
audit trail — use `hermes auth`.

### MCP server integrations

The Model Context Protocol connects Hermes to external systems — a database,
GitHub, anything with an API.

```bash
hermes mcp serve                  # run Hermes itself as an MCP server
hermes mcp add github --command "npx @modelcontextprotocol/server-github"
hermes mcp add <name> --url https://remote-mcp-endpoint
hermes mcp list / test / configure / remove
```

`--command` runs a local MCP server process; `--url` connects to a remote
endpoint. `hermes mcp configure` filters which of a server's tools Hermes
exposes. MCP servers are configured per-profile by design.

### Extending Hermes: plugins and event hooks

Two mechanisms let you extend Hermes without modifying its core.

**Plugins** add custom tools, hooks, and integrations. There are three plugin
types: general plugins (which contribute tools or hooks), memory providers (the
external memory backends of Chapter 3's Tier 3), and context engines
(alternative context-management strategies). Plugins are managed through the
interactive `hermes plugins` UI and live under `~/.hermes/plugins/`.

**Event hooks** run custom code at lifecycle points. They come in two kinds.
*Gateway hooks* fire around messaging activity and are the right place for
logging, alerting, and outbound webhooks. *Plugin hooks* fire around the agent's
tool calls and are the right place for tool interception, metrics, and
guardrails — for example, blocking or auditing a class of command before it
runs. Hooks live under `~/.hermes/hooks/`. Hooks plus webhooks are also how
inbound automation is wired: a webhook can trigger a Hermes run, which is the
basis for patterns such as an automated GitHub pull-request reviewer.

### Provider routing and fallback

Beyond choosing one model, Hermes gives fine-grained control over *which*
provider serves a request. **Provider routing** supports sorting, whitelists,
blacklists, and priority ordering so requests can be optimized for cost, speed,
or quality. **Fallback providers** add automatic failover: when the primary
model errors or is rate-limited, Hermes fails over to a backup, with independent
fallback for auxiliary tasks such as vision and context compression. Configure a
chain with `hermes fallback add` so an unattended job does not stall on a single
provider's outage. (Prompt caching, discussed in Chapter 8, is a separate,
always-on built-in: a cross-session one-hour prefix cache for Claude on the
native Anthropic, OpenRouter, and Nous Portal providers.)

### Using Hermes elsewhere: API server and IDE integration

Hermes is not confined to its own CLI and gateway.

**API server.** `hermes` can expose itself as an OpenAI-compatible HTTP
endpoint, so any frontend that speaks the OpenAI format — Open WebUI, LobeChat,
LibreChat, and others — can drive the full agent, tools and memory included.
This is the cleanest way to put a custom or shared UI in front of Hermes.

**IDE integration (ACP).** Through the Agent Client Protocol, Hermes runs inside
ACP-compatible editors including VS Code, Zed, and JetBrains IDEs. Chat, tool
activity, file diffs, and terminal commands render inside the editor, which
makes Hermes usable as a coding agent without leaving the development
environment — complementary to the Claude Code delegation pattern above.

### Voice mode and the web dashboard

`/voice` toggles real-time spoken interaction in the CLI, Telegram, and Discord,
including Discord voice-channel mode. `hermes dashboard` launches a
browser-based UI for managing configuration, keys, and sessions (requires
`pip install hermes-agent[web]`); it binds to localhost by default, and the
`--insecure` flag should be used only behind trusted network controls.

### Migrating from OpenClaw

A setup can be migrated from OpenClaw rather than rebuilt. `hermes claw migrate`
imports persona, memory, skills, providers, messaging tokens, and agent settings
— over 30 categories. The setup wizard also detects `~/.openclaw` on first run.

```bash
hermes claw migrate --dry-run                          # preview, write nothing
hermes claw migrate --preset full                      # all compatible settings, no secrets
hermes claw migrate --preset full --migrate-secrets    # include API keys
```

Secrets are migrated only with `--migrate-secrets`, and a restore-point snapshot
is written before anything is applied.

### Batch processing and research use

Hermes is built by a model-training lab and doubles as a research platform.
**Batch processing** runs the agent across hundreds or thousands of prompts in
parallel and emits structured, ShareGPT-format trajectory data — useful for
generating training data or for large-scale evaluation. The same trajectory
export feeds **reinforcement-learning training** via Nous Research's Atropos
framework. GEPA, above, is the prompt-and-skill-level counterpart that needs no
weight training. Most operators will not use the RL path directly, but batch
processing is a practical tool any time the same task must run over a large set
of inputs, and the research lineage explains why the harness is engineered as
carefully as it is.

---

## Chapter 8: Operational Lessons

The points below are drawn from the official Tips and Best Practices
documentation and from independent reviews of production deployments. None is in
a feature list; each changes how effectively Hermes performs.

### Prompt-cache economics

Most LLM providers cache the system-prompt prefix. When the system prompt stays
stable across a session — same model, same context files, same memory — every
message after the first benefits from a *cache hit*, substantially cheaper than
a cold read. The corollary is the lesson: do not change the model mid-session,
and do not churn context files, because either invalidates the cache. (This is
also why Tier 1 memory is a frozen snapshot — a mid-session memory write would
otherwise break the cache.) Switch models *between* sessions. `/usage` reports
spend within a session; `/insights` gives a 30-day view.

### Specify the goal, then delegate the steps

Two opposite failure modes occur with prompting. The vague prompt — "fix the
code" — produces a vague fix and several rounds of clarification; front-load
detail and paste tracebacks directly. The micromanaged prompt — dictating each
step — wastes the agent's actual strength; "find and fix the failing test" lets
it search, run, and iterate. Be specific about the *goal*; let the agent
determine the *steps*.

### Skills are created but not always used

Hermes generates skills, but the agent decides when to load them. It may judge a
skill unnecessary and skip it, or load it and use only part of it. A large
collection of auto-generated skills is therefore not equivalent to a faster
agent. Two habits address this: invoke skills that genuinely matter explicitly
with `/<skill-name>` rather than relying on the agent to reach for them, and
audit created skills periodically with `hermes skills list` and
`hermes curator run --dry-run`. The compounding benefit is real — agents with a
substantial set of self-created skills complete similar tasks markedly faster —
but only when the skills are sound and actually used.

### Self-improvement has no inherent ground truth

A self-improving agent improves toward whatever feedback signal it receives. In
domains with clear feedback — code that compiles or fails, tests that pass or
fail — the loop works. In ambiguous domains, or where the operator cannot judge
correctness, there is no reliable ground truth, and the agent can become faster
and more confident at the wrong thing. The agent also tends to rate its own
performance generously. Defenses: review the skills the Curator creates and
keeps; pin sound hand-authored skills (`hermes curator pin`) so they are not
silently degraded; and, for skills that matter, prefer GEPA's trace-driven,
test-gated optimization (Chapter 7) over the agent's self-assessment. Do not
assume "it learned" means "it learned the correct thing."

### Choose a deliberate loop position

A useful frame distinguishes three positions: *in the loop* (each step is
approved), *on the loop* (the operator supervises and intervenes), and *out of
the loop* (the agent runs unattended). Hermes's defaults place the operator on
the loop for outputs and out of the loop for the learning, and the path of least
resistance pulls toward fully out-of-the-loop. That is acceptable where feedback
is crisp and a real risk where it is not. Decide deliberately which position
each workflow warrants.

### Security for an agent with shell access

An agent that runs shell commands unattended needs a deliberate security
posture.

- **Keep dangerous-command approval enabled.** Hermes checks every command
  against a curated list of dangerous patterns. When it prompts, four choices
  appear: *once*, *session*, *always*, *deny*. Choose *always* with caution — it
  permanently allowlists the pattern. Begin with *session*.
- **Container backends skip those checks.** With Docker, Singularity, Modal, or
  Daytona, dangerous-command checks are disabled because the container is the
  security boundary — so the container image must itself be locked down.
- **Sandbox untrusted code.** When working with an unfamiliar repository, set
  `TERMINAL_BACKEND=docker` so a harmful command cannot reach the host.
- **Never set `GATEWAY_ALLOW_ALL_USERS=true` on a bot with terminal access.**
  Use per-platform allowlists (`TELEGRAM_ALLOWED_USERS`,
  `DISCORD_ALLOWED_USERS`) or DM pairing (`hermes pairing approve`).
- **Account for the skill and MCP supply chain.** Auto-created skills, community
  skills, and MCP servers all execute with the agent's privileges. Inspect
  skills before installing (`hermes skills inspect`), and do not point an
  unsandboxed Hermes instance at a payment or otherwise regulated codebase until
  its provenance, signing, and audit-trail story is mature.

The consensus from independent reviews is that Hermes is a strong always-on
personal agent for individual developers, indie builders, and researchers, but
is not yet suited to regulated backend engineering. Match the deployment to the
stakes.

### Choosing a model for the harness

Hermes is designed so a strong harness makes open or budget models perform at
operator grade, and in practice this largely holds. The practical pattern: a
frontier model (Claude Sonnet/Opus class, GPT class) for architecture and
difficult multi-step reasoning, a fast inexpensive model (Claude Haiku,
DeepSeek) for formatting and boilerplate. Switching is trivial, but the
prompt-cache lesson applies — switch between sessions. Configure a fallback
chain with `hermes fallback add` so a rate-limited primary does not stall an
unattended job.

### CLI reflexes worth building

`Ctrl+C` pressed once interrupts the agent so it can be redirected mid-thought.
`Ctrl+V` pastes a clipboard image directly for vision analysis. `Alt+Enter` or
`Ctrl+J` inserts a newline without sending. Typing `/` then `Tab` autocompletes
every command and installed skill. `/title` on every session worth finding again
prevents an indistinguishable pile of unnamed sessions.

---

## Appendix: Quick Reference

### Install and setup

```bash
# Install (Linux / macOS / WSL2 / Android-Termux)
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# Install (native Windows, PowerShell — early beta)
iex (irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1)

hermes setup            # configure (sections: model, terminal, gateway, tools, agent)
hermes doctor           # health check (add --fix to attempt repairs)
hermes status           # visual status overview
hermes dump             # plain-text setup summary for support requests
hermes update           # update (add --backup for a pre-update snapshot)
```

### ~/.hermes/ layout (key paths)

```text
config.yaml             non-secret configuration
.env                    API keys and secrets
SOUL.md                 agent identity (system-prompt slot #1)
memories/MEMORY.md      Tier 1 memory — agent facts (~2,200 chars)
memories/USER.md        Tier 1 memory — user model (~1,375 chars)
skills/                 all skills; .archive/ holds Curator-archived skills
state.db                SQLite session store, FTS5 — Tier 2 memory / search
cron/jobs.json          scheduled jobs
profiles/<name>/        isolated profiles, each a full Hermes home
logs/                   agent.log, gateway.log, errors.log
```

### Toolset and configuration rules

1. Toolset changes take effect only in a new session (`/new`).
2. Configuration changes require a restart (or `/restart` for the gateway).
3. Enable only the toolsets a task needs.

### Session commands

```bash
hermes chat -q "one-shot query"     # one-shot; shows tool output
hermes -z "scripted one-shot"       # final answer only — for scripts, cron, CI
hermes chat                         # interactive session (or just: hermes)
hermes --continue                   # resume the most recent session
hermes --resume <id-or-title>       # resume a specific session
hermes sessions list / browse / rename / prune / export / delete
```

### Key in-session slash commands

```text
/new  (alias /reset)     start a fresh session
/compress [focus]        compress context manually
/background <prompt>     run a prompt in a separate background session
/rollback [n]            list or restore filesystem checkpoints
/model [name]            switch among already-configured models
/skills                  search, install, and manage skills
/<skill-name>            load an installed skill (e.g. /python-testing)
/cron                    manage scheduled tasks (see cron forms below)
/sethome                 set the current chat as the home channel for deliveries
/title <name>            name the current session
/voice [on|off|status]   toggle voice mode
/usage                   token usage and cost for the session
/verbose                 cycle tool-output display modes
/help                    full command list
```

There is no `/skill` command. Load a skill with `/<skill-name>`; manage skills
with `/skills`.

### CLI keyboard shortcuts

```text
Ctrl+C (once)        interrupt the agent — then type to redirect
Ctrl+C (twice/2s)    force exit
Alt+Enter / Ctrl+J   newline without sending (works in every terminal)
Ctrl+V               paste a clipboard image
/  then  Tab         autocomplete commands and installed skills
```

### Context files and references

```text
~/.hermes/SOUL.md    instance-wide identity (system-prompt slot #1)
AGENTS.md            project root — rules and conventions, auto-loaded each session
.hermes.md CLAUDE.md also recognized as project context files
.cursorrules         read automatically if present in the working directory
@<path|folder|url>   inject a file, folder, git diff, or URL into one message
```

### Skill and Curator commands

```bash
hermes skills browse / search                  # explore registries
hermes skills install <id>                     # install a skill
hermes skills inspect <id>                     # preview without installing
hermes skills list / publish <path>
hermes skills tap add <user>/<repo>             # add a GitHub repo as a custom tap
hermes bundles create <name> --skill <id> ...   # group skills under one command
hermes curator run --dry-run                    # preview a Curator pass
hermes curator pin <skill>                      # protect a skill from archival
```

### Cron

```bash
# In-session (/cron add):
/cron add 30m "..."                # one-shot in 30 minutes
/cron add "every 2h" "..."         # recurring interval
/cron add "0 9 * * 1-5" "..."      # standard cron expression
/cron add "every 1h" "..." --skill <name>   # attach a skill

# From a shell:
hermes cron list / create / edit / pause / resume / run / remove / status
```

### Reliable messaging delivery (cron script pattern)

```bash
#!/bin/bash
TELEGRAM_BOT_TOKEN=$(grep TELEGRAM_BOT_TOKEN ~/.hermes/.env | cut -d= -f2)
TELEGRAM_CHAT_ID=$(grep TELEGRAM_CHAT_ID ~/.hermes/.env | cut -d= -f2)
RESULT=$(hermes -z "Your query here")
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
  -d "chat_id=${TELEGRAM_CHAT_ID}" -d "text=${RESULT}" -d "parse_mode=Markdown"
```

A home channel must also be set with `/sethome` for built-in delivery.

### Gateway and profile commands

```bash
hermes gateway setup / run / install / start / stop / restart / status / list
hermes profile list / create <name> [--clone] / use <name> / show / rename
hermes -p <name> <command>            # run any command under a specific profile
```

### Credentials, memory, MCP

```bash
hermes auth                           # credential pool wizard
hermes auth add <provider> --api-key <key> | --type oauth
hermes memory setup / status          # configure an external memory provider
hermes mcp serve / add / list / test / configure / remove
```

### Extensibility and integration

```bash
hermes plugins                        # manage plugins (tools, memory providers, context engines)
hermes fallback add <provider>        # add a fallback provider for failover
# Event hooks live under ~/.hermes/hooks/ (gateway hooks and plugin hooks)
# API server: expose Hermes as an OpenAI-compatible HTTP endpoint
# IDE (ACP): use Hermes inside VS Code, Zed, or JetBrains editors
```

### GEPA — offline skill optimization (companion repo)

```bash
git clone https://github.com/NousResearch/hermes-agent-self-evolution.git
cd hermes-agent-self-evolution && pip install -e ".[dev]"
export HERMES_AGENT_REPO=~/.hermes/hermes-agent
python -m evolution.skills.evolve_skill --skill <skill-name> --iterations 10 --eval-source synthetic
# Output: a pull request against the hermes-agent repo. Review before merging.
```

### Emergency recovery

```text
Hermes will not start:        hermes doctor --fix ; check ~/.hermes/logs/
Tool unavailable after enable: /new
Config change has no effect:   exit and relaunch (or /restart the gateway)
Cron job not firing:           hermes cron status ; hermes cron list
Gateway not responding:        hermes logs gateway -f ; check the bot token ;
                               on WSL run `hermes gateway run` inside tmux
A file edit went wrong:        /rollback
Kanban scratch files gone:     unrecoverable — always use --workspace dir:/abs/path
A skill was degraded:          hermes curator pin <skill> ; restore from .archive/
```

### Official resources

```text
Documentation     hermes-agent.nousresearch.com/docs
Source            github.com/NousResearch/hermes-agent
Self-evolution    github.com/NousResearch/hermes-agent-self-evolution
Skills hub        agentskills.io  /  skills.sh
LLM-readable docs /docs/llms.txt  and  /docs/llms-full.txt
```

---

*Built around Hermes Agent by Nous Research (MIT License). Verified against
official documentation and source repositories. Command flags, defaults, and
counts change between releases; confirm details against the current docs.*


---
*Source: [https://vlaicu.io/posts/hermes-agent-manual/](https://vlaicu.io/posts/hermes-agent-manual/)*
