The Operator’s Manual for Hermes Agent

Building an AI assistant that can act, remember, and improve

Operator’s Manual · Edition 3.2 · Verified against official Nous Research documentation


About This Manual

This manual explains how to deploy and operate Hermes Agent as a persistent “operator” (an AI system that runs continuously, uses tools, remembers context across sessions, and improves over time) rather than as a single-session chatbot. It covers architecture, installation, the core mental model, day-to-day workflows, the operator loop, common failure modes, advanced configuration (including offline skill optimization with GEPA), and a distilled set of operational lessons.

Hermes Agent moves quickly. It was first released publicly on 25 February 2026 and has shipped frequent updates since. Exact command flags, default values, tool counts, and bundled-skill counts change between releases. This edition was checked against the official documentation at hermes-agent.nousresearch.com/docs, the GitHub repositories (github.com/NousResearch/hermes-agent and github.com/NousResearch/hermes-agent-self-evolution), and the project site. When a specific number or flag matters, verify it against the current docs or with --help.

How to use this manual

The manual is organized so that any topic can be found and re-read quickly:

  • Chapters 1–3 are conceptual: architecture and the mental model. Read these once; they make every later feature predictable.
  • Chapters 4–5 are operational: workflows and how to run Hermes continuously.
  • Chapter 6 is a failure-mode index: scan it when something breaks.
  • Chapters 7–8 are advanced configuration and distilled lessons.
  • The Appendix is a pure command reference.

Some material deliberately appears in more than one place: a concept is explained once (in Chapters 1–3), then referenced where it is applied (Chapters 4–8) and listed for quick lookup (Chapter 6 and the Appendix). That is reference-manual design, not accidental duplication.

Corrections incorporated in this edition

Earlier informal guides to Hermes Agent contained inaccuracies. They are corrected here and listed so the differences are explicit:

  • There is no “Hermes Vault” feature. Credential management is hermes auth (credential pools with same-provider key rotation) plus the official 1password skill. See Chapter 7.
  • There is no /skill command. An installed skill is loaded by typing /<skill-name> directly. /skills is a separate command for searching, installing, and managing skills.
  • There are six terminal backends: local, docker, ssh, daytona, singularity, and modal. Daytona and Modal are the serverless options.
  • Built-in memory is file-based (MEMORY.md and USER.md). It is one of three memory tiers; see Chapter 3.
  • Skills are auto-generated, not only hand-written. Hermes creates skills from experience, refines them with the skill_manage tool, maintains them with a background Curator, and can optimize them offline with GEPA.
  • hermes tools enable NAME is not a documented subcommand. Toolsets are managed through hermes tools, hermes setup tools, or the -t/--toolsets flag.

Conventions

This manual addresses the reader directly (“you”, “your”). ~/.hermes/ is the Hermes home directory; /home/user/... stands in for an absolute path that should be replaced with a real one. Commands in code blocks are run from a shell unless marked as in-session slash commands.


Introduction: The Operator Model

The common way to use a large language model is as a chat box: a prompt is pasted in, output is copied out. This works for lookups and drafts, but it has a structural ceiling. The model holds no context between conversations, the human carries all continuity, and every session starts cold. The limit is not prompt quality; it is the interaction model.

There is a second model. Instead of asking a model questions, you delegate work to an operator: a system that runs continuously, remembers what it has learned, uses tools to act, schedules its own follow-up work, and verifies its own results. The difference is the difference between looking up driving directions and employing an assistant who already knows the route, notices when conditions change, and adjusts the plan unprompted. One is a lookup. One is a delegation.

Hermes Agent is built for the second model. Its one-line pitch is an agent that gets better the longer you use it. What makes that real is that three capabilities usually found in separate tools sit in one framework: runtime skill learning, persistent multi-layer memory, and an optional offline optimization pipeline. The shift that matters is conceptual: once a task is set up, it runs, and the human’s role moves from executing to operating: supervising, verifying, and intervening only where a human decision is genuinely required.


Chapter 1: What Hermes Agent Is and How It Is Built

The pitch

Hermes Agent is an open-source, self-improving AI agent framework built by Nous Research, released in February 2026 under the MIT license. It runs on Linux, macOS, WSL2, native Windows (early beta), and Android via Termux. It connects to almost any LLM provider, exposes dozens of built-in tools, and, most distinctively, it learns: it creates reusable skills from experience, refines them as it uses them, remembers facts across sessions, and can search its own conversation history.

Hermes is not tied to a single machine. It can run on a low-cost VPS, a GPU server, or serverless infrastructure (Daytona, Modal) that hibernates when idle and costs almost nothing between tasks. It can be operated from a terminal or from any of 20+ messaging platforms.

Architecture

Understanding the structure makes every later feature predictable.

Everything flows through a single core agent class (an AIAgent in a run_agent.py script). The CLI, the messaging gateway, IDE integration, the batch runner, and an API server are all entry points into that same core, which is what makes the platform-agnostic story work in practice.

   Entry points                  Core agent                 Backends
 ┌──────────────┐         ┌─────────────────────┐       ┌──────────────────┐
 │ CLI          │──┐      │  AIAgent            │──┬──> │ Session storage  │
 │ Gateway      │──┤      │  ┌───────────────┐  │  │    │ (SQLite + FTS5)  │
 │ IDE (ACP)    │──┼────> │  │ Prompt builder│  │  │    └──────────────────┘
 │ Batch runner │──┤      │  │ Provider res. │  │  │    ┌──────────────────┐
 │ API server   │──┘      │  │ Tool dispatch │  │  └──> │ Tool backends:   │
 └──────────────┘         │  │ Compression   │  │       │ terminal, web,   │
                          │  └───────────────┘  │       │ browser, file,   │
                          └─────────────────────┘       │ MCP, vision, TTS │
                                                        └──────────────────┘

The core loop is ReAct-style and synchronous: build the system prompt, check whether context compression is needed, make an interruptible API call, execute any tool calls the model requested, and loop. Four details matter operationally:

  • Six execution backends. The agent can run commands locally, in Docker, on a remote host over SSH, or in a Modal, Daytona, or Singularity sandbox: the same code, changed with one config setting. Execution can be moved from a laptop to a cloud server without touching anything else.
  • Provider translation. A translation layer routes any provider through one of a small number of API formats, which is why the active model can be swapped (Claude, GPT, Gemini, a local Ollama model) with one command and nothing else breaking.
  • A per-task turn cap (90 by default). Each task has a hard ceiling on the number of reasoning/tool turns. Without it, an agent stuck in a loop (retrying a failing API, re-reading the same file) would silently consume credits. Sub-agents spawned by delegation share the same budget, so a runaway delegation chain cannot bypass the cap. The ceiling is configurable in the setup wizard.
  • Context compression. When a session approaches the model’s context-window limit, the loop compresses history automatically. Compression can also be triggered manually; see Chapter 5.
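The shared turn budget described above can be pictured with a small sketch. This is an illustration only, not Hermes's implementation: TurnBudget and run_task are hypothetical names, and each "turn" here is just a loop iteration standing in for one reasoning/tool step.

```python
from dataclasses import dataclass


@dataclass
class TurnBudget:
    """Hypothetical shared counter; not an actual Hermes class."""
    limit: int = 90   # default cap mentioned in the text
    used: int = 0

    def spend(self) -> bool:
        """Consume one turn; return False once the cap is reached."""
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


def run_task(steps_needed: int, budget: TurnBudget) -> str:
    """Stand-in for the agent loop: one budget unit per turn."""
    for _ in range(steps_needed):
        if not budget.spend():
            return "stopped: turn cap reached"
    return "done"


budget = TurnBudget(limit=90)
parent = run_task(60, budget)   # the parent task uses 60 turns
child = run_task(50, budget)    # a delegated sub-agent draws on the same budget
print(parent, "|", child, "|", budget.used)
```

Because parent and child hold the same budget object, the child stops once the combined count reaches the cap, which is the property that keeps a delegation chain from bypassing it.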

Where Hermes fits: the comparison with OpenClaw

Hermes is not primarily a coding copilot tied to an editor. Its closest peer in the open ecosystem is the personal-agent project OpenClaw. Both are persistent and messaging-friendly, but they make opposite architectural choices. A frequently quoted framing: Hermes packages a gateway around a learning agent; OpenClaw packages an agent around a messaging gateway.

Dimension | OpenClaw | Hermes Agent
Architecture | Gateway-first; the agent is attached to the messaging layer | Agent-first; the gateway is one entry point into a learning runtime
Channel breadth | Very broad (50+ messaging channels) | Focused (20+ channels, the most-used ones)
Skill ecosystem | Very large community skill pool | ~120 skills bundled, plus the Skills Hub and GitHub taps
Learning loop | Skills are static | Skills self-evolve; the Curator prunes; GEPA optimizes offline
Memory | Plain markdown files | Three tiers: bounded markdown, FTS5 search, pluggable external providers
Security posture | Gateway-first design and a large unvetted skill pool have been associated with publicized incidents in 2026 | Snapshot-before-write for file operations and a curated skill set reduce some surface

Treat the security row as point-in-time and directional, not as a current audit: verify present advisories for both projects before relying on either in a sensitive context (see Chapter 8). Setups can be migrated directly from OpenClaw; see Chapter 7.


Chapter 2: Installation and First Run

A working installation, configured and running a real task, takes roughly 30 minutes including troubleshooting. Requirements: Linux, macOS, or WSL2 (native Windows and Android/Termux are also supported); Python 3.11+, which the installer provides; and around 8 GB of RAM for ordinary API-based use.

Install

Linux, macOS, or WSL2:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc   # or ~/.zshrc

Native Windows (PowerShell, early beta):

iex (irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1)

Android (Termux): use the same curl one-liner as Linux; the installer detects Termux automatically.

For an existing installation, hermes update pulls the latest version. Automatic pre-update backups are off by default; add --backup for a snapshot, or set update.backup: true in config.yaml. hermes update --check reports whether a newer version exists without installing it.

Setup wizard

hermes setup

The first run launches an interactive wizard with five sections:

  1. Model: choose an LLM provider and enter an API key, or run an OAuth flow.
  2. Terminal: choose an execution backend (local, docker, ssh, daytona, singularity, modal).
  3. Gateway: optionally configure messaging platforms.
  4. Tools: enable or disable toolsets.
  5. Agent: behavior settings such as the per-task turn cap and compression thresholds.

Configuration is written to ~/.hermes/config.yaml; API keys are stored separately in ~/.hermes/.env and never in config.yaml. Any single section can be re-run later, e.g. hermes setup model.

hermes model runs the full provider-setup wizard from a shell (it can add a provider or run OAuth). The in-session /model command switches only among already-configured providers. Model identifier strings change as providers release new models; treat any specific string as an example.

Health check

hermes doctor          # checks dependencies, config, credentials, directories
hermes doctor --fix    # attempts automatic repairs
hermes status          # visual overview of agent, auth, and platform state
hermes dump            # plain-text summary for a support request

Run hermes doctor before any significant work.

Two rules prevent most early confusion. First, toolset changes take effect only in a new session; start one with /new. Second, configuration is read once at startup and cached; after editing config.yaml directly, exit and relaunch (or /restart the gateway). Prefer hermes config set <key> <value> or hermes config edit over editing the YAML blindly.

First task

hermes chat -q "List the contents of ~/projects and tell me what kind of projects you see there"

This returns a directory listing and a short summary. It is a first real agent action: Hermes read the filesystem with its file tool, reasoned about the result, and replied. To start an interactive session, run hermes chat (or just hermes).

Expect early friction: first-time setup commonly includes an API key error or provider timeout. Wait about 60 seconds and retry; if it persists, verify the key in ~/.hermes/.env.

What lives in ~/.hermes/

Most day-to-day work touches one of these paths. Knowing the layout makes the rest of the manual concrete.

~/.hermes/
├── config.yaml        # main configuration (non-secret)
├── .env               # API keys, bot tokens, secrets
├── auth.json          # OAuth provider credentials
├── SOUL.md            # agent identity, slot #1 in the system prompt
│
├── memories/
│   ├── MEMORY.md      # persistent agent facts (Tier 1 memory)
│   └── USER.md        # the agent's model of you (Tier 1 memory)
│
├── skills/            # all skills: bundled, hub-installed, agent-created
│   ├── <category>/<skill-name>/SKILL.md
│   ├── .archive/      # skills the Curator has archived (recoverable)
│   └── .hub/          # Skills Hub state
│
├── sessions/          # per-platform session metadata
├── state.db           # SQLite session store, FTS5-indexed (Tier 2 memory)
├── cron/
│   ├── jobs.json      # scheduled jobs
│   └── output/        # cron run outputs
│
├── profiles/<name>/   # isolated profiles, each a full Hermes home
├── plugins/           # custom plugins
├── hooks/             # lifecycle hooks
├── skins/             # CLI themes
└── logs/              # agent.log, gateway.log, errors.log

Most of this is never edited by hand. The files worth knowing: config.yaml (source of truth for everything non-secret), .env (secrets; Hermes routes secret-looking values here automatically), SOUL.md (identity; Chapter 3), skills/ (where the entire learning loop lives), and state.db (the FTS5-indexed database that makes “what did we discuss three weeks ago?” work).


Chapter 3: The Mental Model

Hermes becomes predictable once a small set of concepts is understood. They are Tools, Identity, Skills, Context files, Memory, Sessions, Cron, and Gateway, tied together by the Learning Loop. A useful framing to hold throughout: memory is what the agent knows; skills are how it does things; identity is who it is.

Toolsets: what Hermes can do

A tool is a single capability (run a shell command, search the web). A toolset is a named group of related tools. Hermes ships dozens of tools (the exact count grows with releases), organized into toolsets. The capabilities most operators rely on:

Capability | What it does
web | Web search and content extraction
terminal | Run shell commands, manage processes
file | Read, write, search, and patch files
browser | Browser automation (local Chrome over CDP, or a cloud browser)
code execution | Sandboxed Python via execute_code, including Programmatic Tool Calling
vision | Analyze images
image generation | Generate images
tts / voice | Text-to-speech and real-time voice
delegation | Spawn isolated sub-agents for parallel work
cron | Schedule tasks on a timeline
memory | Persistent cross-session facts
session search | Full-text search over past conversations
skills | Browse, install, and load skills
messaging | Send messages across platforms
kanban | Drive the multi-agent collaboration board
computer use | Drive a desktop GUI (macOS, via the cua-driver backend)

Toolsets are configured through the interactive hermes tools UI, the hermes setup tools section, or per run with -t/--toolsets (e.g. hermes chat --toolsets web,terminal,skills). The authoritative list is in the Toolsets Reference in the official docs. Enable only what a task needs: a smaller tool schema produces cleaner agent behavior.

Identity: SOUL.md

Above memory and skills sits a layer that determines who the agent is when it shows up: identity. Without it, every agent feels like the same agent wearing different hats.

Identity is a single hand-authored file, ~/.hermes/SOUL.md. It occupies the first slot in the system prompt, before anything else loads, and defines the agent’s personality, tone, communication style, and hard limits.

# SOUL.md
You are a pragmatic senior engineer with strong taste.
You optimize for truth, clarity, and usefulness over politeness theater.

SOUL.md is static (written once, tweaked occasionally) and stays consistent across every project and session. If the file is missing, Hermes falls back to a built-in default identity (and now seeds a starter SOUL.md automatically). Identity matters to the self-improving story because everything that follows (the memory the agent writes, the skills it creates, the way it consolidates knowledge) happens through the lens of this file. SOUL.md is the fixed frame; memory and skills are the moving parts inside it. For a temporary change of register without editing the file, /personality swaps in a built-in or custom personality preset for the current session only.

Skills: procedural memory the agent writes itself

If memory holds facts, skills hold procedures: not what the agent knows but how it does things. A skill is a markdown file with YAML frontmatter, stored under ~/.hermes/skills/.

---
name: k8s-pod-debug
description: >
  Activate for crashing pods, CrashLoopBackOff,
  "why is my pod restarting", container failures.
version: 1.2.0
author: agent
platforms: [linux, macos]
---
## Procedure
1. Get pod status, check events, pull logs
2. Look for OOMKilled, ImagePullBackOff, config errors

## Pitfalls
- Forgetting the --previous flag on restarted containers

## Verification
- Pod stays Running with 0 restarts for 5+ minutes

Progressive disclosure. To keep token cost low, skills load in three levels:

  • Level 0: the agent sees only names and descriptions (roughly 100 tokens per skill; about 3k tokens for the full catalog). This is all that loads at session start.
  • Level 1: the full skill body, loaded on demand when a skill’s triggers match the task.
  • Level 2: specific reference files inside a skill, opened only when the agent needs that depth.

The result: the agent pays in tokens only for the skills it actually uses.
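To make the first two levels concrete, here is a minimal sketch that separates a skill's frontmatter (the Level 0 catalog entry) from its body (the Level 1 content). It is not Hermes's loader: split_skill is a hypothetical helper, and it handles only single-line key: value pairs, so a real loader would need a YAML parser for folded scalars and lists.

```python
def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md string into (frontmatter fields, body).

    Hypothetical helper: supports only single-line `key: value` pairs.
    """
    _, header, body = text.split("---", 2)
    fields = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields, body.strip()


SKILL = """---
name: k8s-pod-debug
description: Activate for crashing pods and CrashLoopBackOff.
---
## Procedure
1. Get pod status, check events, pull logs
"""

fields, body = split_skill(SKILL)
level0 = f"{fields['name']}: {fields['description']}"  # always in the prompt
level1 = body                                           # loaded only on a match
print(level0)
```

Only level0 would be paid for at session start; level1 costs tokens only when the skill is actually loaded.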

Self-evolution. This is the core differentiator. The agent creates its own skills autonomously using the skill_manage tool. Creation triggers when the agent completes a complex task (roughly five or more tool calls), recovers from errors and finds the working path, is corrected by the operator, or discovers a non-trivial workflow. The loop: encounter a problem, solve it through trial and error, save the working approach as a SKILL.md file, and, the next time a similar problem appears, load that skill and follow the proven procedure instead of rediscovering it. One-time discoveries become permanent procedural memory.

The skill_manage tool supports six actions: create, patch (a targeted fix, preferred because it is token-efficient), edit (a full rewrite), delete, write_file, and remove_file.

Skills can also be written by hand and shared. Browse and install community skills with hermes skills browse and hermes skills install; publish with hermes skills publish; group several under one slash command with hermes bundles. Any GitHub repository can be added as a custom tap:

hermes skills tap add yourname/your-skills-repo
hermes skills install yourname/your-skills-repo/<skill-name>

About 120 skills ship bundled; the Skills Hub (skills.sh / agentskills.io) and community taps add many more. Counts grow with every release.

The Curator: maintenance for the skill library

Without maintenance, agent-created skills accumulate into dozens of narrow, overlapping playbooks that waste tokens. The Curator is a background maintenance system that prevents this.

It runs on an inactivity check, not a cron daemon: when at least 7 days have passed since its last run and the agent has been idle for 2 or more hours, a background fork of the agent spins up with its own prompt cache, never touching the active conversation. It operates in two phases:

  • Automatic transitions (deterministic, no LLM): a skill unused for 30 days becomes stale; unused for 90 days, it is archived.
  • LLM review (up to 8 iterations): the forked agent surveys all agent-created skills and decides, per skill, whether to keep, patch, consolidate, or archive.
 Active  ──30d unused──>  Stale  ──90d unused──>  Archived  ──> Restored
   ▲  (deterministic, no LLM)          (moved to .archive/)   (one command,
   └──────────────── re-activated on use ────────────────────  reversible)

Two constraints make the Curator safe: it never touches bundled or hub-installed skills (only agent-authored ones), and it never auto-deletes: the worst outcome is archival to ~/.hermes/skills/.archive/, recoverable with one command. Before every pass it takes a tar.gz snapshot of the entire skills directory, and rollbacks are themselves reversible. Critical skills can be protected with hermes curator pin <skill>; patches and edits still apply to a pinned skill, so the agent can improve it without it being unpinned first. hermes curator run --dry-run previews a pass at any time.
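The deterministic part of the Curator (the trigger check and the 30/90-day transitions) is simple enough to sketch. The function names are hypothetical, and treating each threshold as inclusive is an assumption; the thresholds themselves are the ones stated above.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=30)     # unused this long: stale
ARCHIVE_AFTER = timedelta(days=90)   # unused this long: archived


def skill_state(last_used: datetime, now: datetime) -> str:
    """Classify an agent-created skill by how long it has gone unused."""
    unused = now - last_used
    if unused >= ARCHIVE_AFTER:
        return "archived"
    if unused >= STALE_AFTER:
        return "stale"
    return "active"


def curator_due(last_run: datetime, idle_since: datetime, now: datetime) -> bool:
    """True when 7+ days have passed since the last run and the agent
    has been idle for 2+ hours."""
    return (now - last_run >= timedelta(days=7)
            and now - idle_since >= timedelta(hours=2))


now = datetime(2026, 6, 1, 12, 0)
print(skill_state(now - timedelta(days=45), now))
print(curator_due(now - timedelta(days=8), now - timedelta(hours=3), now))
```

No model call is involved in either check, which is why this phase is cheap and predictable; the LLM review phase is separate.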

Context files: AGENTS.md and friends

Distinct from skills (loaded on demand) are context files (loaded automatically, every session). SOUL.md, covered above, is one. Hermes also discovers and loads, from the current working directory:

  • AGENTS.md: a project’s standing rules. Placed in a project root, it holds architecture, conventions, and instructions (“FastAPI backend with SQLAlchemy”, “always use async for database operations”, “never commit .env”). Hermes loads the top-level AGENTS.md at session start; subdirectory AGENTS.md files are discovered lazily during tool calls.
  • .hermes.md and CLAUDE.md: also recognized as project context files, so an existing CLAUDE.md written for Claude Code is picked up without conversion.
  • .cursorrules: a .cursorrules or .cursor/rules/*.mdc file is read automatically, so existing coding conventions need not be duplicated.

The division: SOUL.md defines who Hermes is; AGENTS.md (and the others) define a project’s rules; memory holds facts about you; skills hold procedures. Keep context files concise: every character is injected into every message and counts against the token budget. --ignore-rules skips all context files, memory, and preloaded skills for a clean-room run.

Context references: pulling content in with @

Separate from the always-loaded context files, context references inject content into a single message on demand. Typing @ followed by a reference (a file path, a folder, a git diff, or a URL) expands it inline: Hermes appends the referenced content to the message automatically. This is the precise tool for “look at this” without restructuring a project’s AGENTS.md or pasting a file by hand. Use context files for what should always be loaded; use @ references for what matters only to the message in front of you.
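As a rough picture of what "expands it inline" means, the sketch below appends the content of each @-referenced file to a message. It covers only relative file paths (not folders, diffs, or URLs), expand_refs is a hypothetical name, and the output layout is invented for illustration rather than taken from Hermes.

```python
import re
import tempfile
from pathlib import Path


def expand_refs(message: str, root: Path) -> str:
    """Append the content of each @-referenced file under `root`."""
    parts = [message]
    for ref in re.findall(r"@(\S+)", message):
        path = root / ref
        if path.is_file():                      # unknown references are left alone
            parts.append(f"\n\n[{ref}]\n{path.read_text()}")
    return "".join(parts)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.txt").write_text("Deploy on Fridays is forbidden.")
    expanded = expand_refs("Summarize @notes.txt for me", root)

print(expanded)
```

The original message is kept intact and the referenced content rides along with that one message only, which is the contrast with always-loaded context files.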

Memory: three tiers, three speeds

Hermes does not have a single memory. It has three layers, each for a different purpose. The agent picks the right one for the question.

Tier 1: in-prompt memory. Two small markdown files in ~/.hermes/memories/: MEMORY.md (the agent’s notes on your environment, conventions, tool quirks, and lessons learned; about 2,200 characters) and USER.md (your profile: name, communication preferences, skill level, things to avoid; about 1,375 characters). Both are injected into the system prompt as a frozen snapshot at session start: a memory written mid-session persists to disk immediately but does not appear in the prompt until the next session. When a file approaches capacity (about 80%, shown as a percentage in the system-prompt header), the agent consolidates, merging related entries into denser versions so only useful information survives. You can assist: “clean up your memory”, “replace the old Python 3.9 note, we are on 3.12 now”. Because Tier 1 is plain markdown, it can be audited directly by opening the files. Speed: instant. Capacity: tiny.
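The capacity arithmetic is easy to check by hand when auditing the files. The sketch below uses the approximate limits and the roughly 80% threshold quoted above; the constant and function names are hypothetical, and counting characters with len() is an assumption about how the budget is measured.

```python
MEMORY_LIMIT = 2200      # approximate character budget for MEMORY.md
USER_LIMIT = 1375        # approximate character budget for USER.md
CONSOLIDATE_AT = 0.80    # approximate consolidation threshold


def usage(text: str, limit: int) -> float:
    """Fraction of the character budget currently in use."""
    return len(text) / limit


def needs_consolidation(text: str, limit: int) -> bool:
    """True once the file is at or beyond the consolidation threshold."""
    return usage(text, limit) >= CONSOLIDATE_AT


notes = "x" * 1800   # stand-in for the contents of MEMORY.md
print(f"{usage(notes, MEMORY_LIMIT):.0%}", needs_consolidation(notes, MEMORY_LIMIT))
```

For a real check, replace the stand-in string with the text read from ~/.hermes/memories/MEMORY.md.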

Tier 2: session search. Every conversation, from the CLI and from messaging platforms, is stored in state.db (SQLite, FTS5-indexed). The agent can search weeks of past conversations with the session_search tool, summarizing matches with an LLM. Speed: on demand. Capacity: effectively unlimited. This is the same store that powers session resume (see Sessions, below).
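For readers unfamiliar with FTS5, this is what full-text search over stored messages looks like in SQLite. The table and column names below are invented for the example and are not the actual state.db schema; the sketch also assumes a Python build whose SQLite includes the FTS5 extension, which is the common case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per message, indexed for full-text search.
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session, content)")
conn.executemany(
    "INSERT INTO messages (session, content) VALUES (?, ?)",
    [
        ("deploy-review", "We agreed to pin the Docker base image to Debian 12."),
        ("budget-chat", "Monthly API spend should stay under the agreed cap."),
    ],
)

# MATCH runs a full-text query; ORDER BY rank sorts best matches first.
rows = conn.execute(
    "SELECT session FROM messages WHERE messages MATCH ? ORDER BY rank",
    ("docker AND image",),
).fetchall()
print(rows)
```

The index is what lets a query over weeks of history return quickly; the LLM summarization step described above happens after the matching rows are found.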

Tier 3: external memory providers. For a deeper, persistent model of you, Hermes ships pluggable providers configured with hermes memory setup. Only one external provider is active at a time, and it runs alongside Tier 1 rather than replacing it. Available providers, each with different storage, cost, and dependency trade-offs, include Honcho (dialectic user modeling), OpenViking (self-hosted, filesystem hierarchy), Mem0 (server-side LLM extraction), Hindsight (knowledge graph), Holographic (local, no dependencies), RetainDB (delta compression), ByteRover, and SuperMemory (context fencing). When an external provider is active, Hermes prefetches relevant memories before each turn, syncs conversation turns after each response, and extracts memories at session end. hermes memory status shows current state.

The trade-off across tiers is the design: critical facts live in Tier 1, always in context but bounded; everything else is searchable on demand in Tier 2; Tier 3 adds a deeper model at the cost of an external dependency. Use Tier 1 for durable facts (“always run X from directory Y”), not for session-specific context (“working on X today”), which belongs in the session itself.

Sessions: the conversation thread

Each chat is a session: resumable, nameable, and searchable.

hermes --resume <id-or-title>   # resume a specific session
hermes --continue               # resume the most recent session
hermes --continue <name>        # resume the most recent matching a title

Sessions are managed with hermes sessions (list, browse, rename, prune, export, delete). Naming sessions with /title keeps them findable; unnamed sessions become an indistinguishable pile within a week. Sessions provide continuity and an audit trail and, as Tier 2 above, are searchable.

Cron: scheduled execution

Cron runs tasks on a schedule. The gateway daemon ticks every 60 seconds, runs any due jobs in isolated sessions, and delivers output to a messaging platform. Jobs survive restarts; they live in ~/.hermes/cron/jobs.json and output goes to ~/.hermes/cron/output/.

You do not have to write cron expressions: a job can be described in plain English and Hermes converts it. The in-session /cron command also accepts explicit forms:

/cron add 30m "Remind me to check the build"          # one-shot, runs once in 30 minutes
/cron add "every 2h" "Check server status"            # recurring interval
/cron add "0 9 * * 1-5" "..."                          # standard cron expression: weekdays 09:00
/cron add "every 1h" "Summarize new items" --skill blogwatcher   # load a skill before running

From a shell, jobs are managed with hermes cron list / create / edit / pause / resume / run / remove / status. Jobs can be chained: one job’s output becomes the next job’s input via a context_from flag, which is useful for multi-stage automations (a research step feeding a writing step).
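The difference between the one-shot form (30m) and the recurring form (every 2h) can be sketched as a tiny parser. This is not Hermes's parser: parse_interval is a hypothetical function, the d (days) unit is an assumption beyond the m and h forms shown above, and standard cron expressions are deliberately rejected rather than handled.

```python
import re

_UNITS = {"m": 60, "h": 3600, "d": 86400}   # seconds per unit; "d" is assumed


def parse_interval(spec: str) -> tuple[int, bool]:
    """Return (seconds, recurring) for specs like '30m' or 'every 2h'."""
    match = re.fullmatch(r"(every\s+)?(\d+)([mhd])", spec.strip())
    if match is None:
        raise ValueError(f"not an interval spec: {spec!r}")
    every, amount, unit = match.groups()
    return int(amount) * _UNITS[unit], every is not None


print(parse_interval("30m"))        # one-shot: fires once after the delay
print(parse_interval("every 2h"))   # recurring: fires at each interval
```

With a 60-second gateway tick, a job parsed this way would be picked up on the first tick at or after its due time.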

Delivery needs a destination. Run /sethome in a Telegram or Discord chat to mark it as the home channel for proactive output; without one, the agent has nowhere to send scheduled results. If a job reports success but no message arrives (a failure mode observed with jobs producing structured output), verify the home channel first, then consult the cron-troubleshooting guide; the reliable fallback is to have the job’s script post to the platform API directly (see Chapter 4). A job that should not fire until reviewed can be created and immediately paused.

Gateway: Hermes in messaging platforms

The Gateway runs Hermes as a service inside messaging platforms: the CLI, Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Email, SMS, Microsoft Teams, and others, 20+ in all.

hermes gateway setup     # interactive platform configuration
hermes gateway run       # run in the foreground (recommended for WSL, Docker, Termux)
hermes gateway install   # install as a background service (standard Linux, macOS)
hermes gateway start / stop / restart / status / list

Per-platform setup differs: Telegram needs a bot token from @BotFather (and your numeric user ID, available from @userinfobot, for the allowlist); Discord needs a bot token with the Message Content Intent enabled; Slack uses an app manifest (hermes slack manifest). Gateway activity is logged to ~/.hermes/logs/gateway.log.

WSL note. WSL’s systemd support is unreliable, so on WSL do not start the gateway as a background service. Run it in the foreground inside tmux so it survives the terminal closing: tmux new -s hermes 'hermes gateway run'.

The Learning Loop: how it fits together

The concepts above feed one another. The agent acts, captures what worked as skills, persists durable facts to memory, and the Curator keeps the skill set clean, so the agent is measurably more capable after months of use than on day one:

   act ──> notice a reusable pattern ──> skill_manage creates/patches a skill
    ▲                                                │
    │                                                ▼
 remember facts <── self-prompt to persist <── Curator prunes & consolidates
                                                     │
                                            (GEPA optimizes offline; see Ch. 7)

In one sentence: SOUL.md sets the identity, the runtime loop captures experience, the Curator keeps the library clean, and GEPA makes sure what is in the library actually works.

Summary of concepts

  • Tools: what Hermes can do (capabilities)
  • Identity: who Hermes is (SOUL.md, slot #1 in the system prompt)
  • Skills: how Hermes does things (procedural memory, partly self-authored)
  • Context files: a project’s standing rules (AGENTS.md, .cursorrules)
  • Memory: what Hermes knows (three tiers: in-prompt, session search, external)
  • Sessions: what was being worked on (history and continuity)
  • Cron: what runs automatically (scheduling)
  • Gateway: where Hermes can be reached (access)
  • Learning Loop: how all of the above compound over time

Any new Hermes feature can be placed against this list, which makes the system predictable rather than a set of features to memorize.


Chapter 4: Core Workflows

Four representative workflows, with example prompts and expected behavior.

Workflow A: Research pipeline

Goal: turn an open question into a structured reference file.

Example prompt:

Research the current state of GGUF quantization for local LLM inference.
Find: (1) what tools support GGUF (2) performance benchmarks vs full precision
(3) any recent updates. Write a structured summary to
~/research/gguf-state-$(date +%Y%m%d).md with sources.

Hermes activates web tools, searches, reads full pages rather than snippets, synthesizes across them, writes a structured markdown file, and lists sources. Because the output is a file, the task can be started and left to complete.
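One detail in the prompt above is easy to miss: $(date +%Y%m%d) is shell command substitution, so it is expanded only when the prompt passes through a shell inside double quotes. Typed into an interactive session, it reaches the agent as literal text. The sketch below builds the prompt in the shell first, so the date is fixed before the agent sees it. The function name research_prompt and its two arguments are illustrative, not part of Hermes.

```shell
#!/bin/bash
# Hypothetical helper: build a research prompt with a date-stamped output path.
#   $1 topic, in plain words
#   $2 short slug used in the output filename
# The date is resolved by the shell here, before the prompt is handed to the agent.
research_prompt() {
  local topic="$1" slug="$2" stamp
  stamp=$(date +%Y%m%d)
  printf 'Research the current state of %s. Write a structured summary to ~/research/%s-state-%s.md with sources.\n' \
    "$topic" "$slug" "$stamp"
}
```

One way to use it is with the one-shot form: hermes chat -q "$(research_prompt 'GGUF quantization' gguf)".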

Workflow B: Repository debugging

Goal: hand a broken repository to Hermes and receive a fix or a clear explanation of what must change.

Example prompt:

There's a test failure in ~/projects/kiln. Run pytest and tell me what's
failing and why. If it's a simple fix, apply it and rerun to confirm.
Report what the problem was and what you changed.

Hermes changes into the directory, runs the tests, reads the failing test and relevant source, identifies the root cause, applies a fix, and re-runs to confirm. For obvious defects a one-shot query suffices; for complex failures, an interactive session lets you walk the stack trace with the agent. For risky edits, add --checkpoints so files can be restored with /rollback (Chapter 5).

Workflow C: Scheduled daily briefing

Goal: have Hermes run a recurring task unattended and deliver the result.

Create a cron job with a schedule and a prompt (see Chapter 3 for the /cron add forms). For reliable delivery, the job can run a script that posts directly to the platform API. The example below uses Telegram:

#!/bin/bash
# Daily content radar: posts a briefing to Telegram
set -euo pipefail

# Read credentials from the Hermes .env file (-f2- keeps values that contain '=')
TELEGRAM_BOT_TOKEN=$(grep '^TELEGRAM_BOT_TOKEN=' ~/.hermes/.env | cut -d= -f2-)
TELEGRAM_CHAT_ID=$(grep '^TELEGRAM_CHAT_ID=' ~/.hermes/.env | cut -d= -f2-)

# One-shot run: only the final answer is written to stdout
BRIEFING=$(hermes -z "Run a content radar: find 3-5 interesting posts about AI
agents, local model setups, or Hermes workflows from the past 24 hours.
Format as a numbered briefing with links.")

# --fail makes curl exit non-zero on an HTTP error; --data-urlencode escapes the text
curl -sS --fail -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
  -d "chat_id=${TELEGRAM_CHAT_ID}" \
  --data-urlencode "text=${BRIEFING}" \
  -d "parse_mode=Markdown"

Two details. First, hermes -z is the scripted one-shot entry point: a single prompt in, the final answer out, nothing else on stdout, which suits cron, CI, and parent scripts. Second, the script posts to the API directly. A direct call can be made to fail loudly (curl exits non-zero on a network error, and on an HTTP error when run with --fail), whereas built-in delivery has been observed to report success while the message did not arrive. Set up once, this runs unattended on its schedule.
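Reading credentials with a bare grep piped into cut -f2 works for simple values, but it truncates a value that itself contains = and keeps any surrounding quotes. The sketch below is a stricter reader, assuming the .env file holds plain KEY=value lines. read_env_key is a hypothetical helper name, not a Hermes command.

```shell
#!/bin/bash
# Hypothetical helper: print the value of KEY from a KEY=value file.
# - matches the key only at the start of a line (skips comments and look-alike keys)
# - keeps everything after the first '=' (values may contain '=')
# - strips one pair of surrounding single or double quotes
# - if the key is defined more than once, the last definition wins
# Returns 1 if the key is not defined.
read_env_key() {
  local key="$1" file="$2" line value
  line=$(grep -E "^${key}=" "$file" | tail -n 1)
  [ -n "$line" ] || return 1
  value="${line#*=}"
  case "$value" in
    \"*\") value="${value#\"}"; value="${value%\"}" ;;
    \'*\') value="${value#\'}"; value="${value%\'}" ;;
  esac
  printf '%s\n' "$value"
}
```

In a delivery script this would replace the grep and cut pair, for example TELEGRAM_BOT_TOKEN=$(read_env_key TELEGRAM_BOT_TOKEN ~/.hermes/.env).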

Workflow D: Multi-agent Kanban

Goal: split a large task across specialist agents, run them in parallel, and synthesize.

Hermes’s Kanban is a multi-profile collaboration board. A parent task holds the synthesis; child tasks hold parallel workstreams, each assigned to a specialist profile; children are linked to the parent so the parent runs only after they complete.

# Parent synthesis task
hermes kanban create "Write comprehensive X review" --assignee writer

# Child tasks: note --workspace dir:<absolute path> on every one
hermes kanban create "Research X market landscape" \
  --assignee researcher --parent <parent-id> \
  --workspace dir:/home/user/research/x-market

hermes kanban create "Technical analysis of X architecture" \
  --assignee engineer --parent <parent-id> \
  --workspace dir:/home/user/code/x-analysis

# Link children to the parent, then run one dispatcher pass
hermes kanban link <parent-id> <child-a-id>
hermes kanban link <parent-id> <child-b-id>
hermes kanban dispatch

Workspaces are critical. Every task that produces files to keep must use --workspace dir:/absolute/path. The default scratch workspace is garbage-collected when a task is archived; a dir: workspace persists.
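Because a missing or relative workspace only shows up as lost files after archival, it can help to validate the workspace string before a task is created. The sketch below is a pure string check under that assumption. is_persistent_workspace is a hypothetical helper, and it does not verify that the directory exists.

```shell
#!/bin/bash
# Hypothetical pre-flight check: accept only "dir:" workspaces with an absolute path.
# Returns 0 for a spec such as dir:/home/user/research/x-market, and 1 for
# anything else (empty string, relative path, or a path without the dir: prefix).
is_persistent_workspace() {
  case "$1" in
    dir:/?*) return 0 ;;
    *)       return 1 ;;
  esac
}
```

A wrapper script might call it before each hermes kanban create and abort when it returns non-zero, so that a task never falls back to the scratch workspace by accident.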

Hermes can also decompose a task automatically: a coarse task placed in the triage column is fanned out by hermes kanban decompose (on by default per dispatcher tick) into child tasks routed to specialist profiles. Keep individual tasks small enough to complete in a single agent run: a task needing hundreds of turns will hit the turn cap, crash, and be re-queued.


Chapter 5: The Operator Loop

Running Hermes continuously requires an operational model. The principle: an operator is someone who sets up loops, verifies outputs, and intervenes only when a human decision is required, not someone who watches the agent work.

Every operator session moves through four states:

Active ──> Waiting ──> Recovering ──> Done ──> (back to Active)
  • Active: Hermes is working: running commands, searching, reading or writing files, generating output.
  • Waiting: Hermes hit a blocking call (an LLM request, a tool waiting on an external resource, a rate limit). It resumes on its own.
  • Recovering: something failed. Hermes retries, summarizes context, or restarts a step, or else flags the operator and waits.
  • Done: the task is complete, output delivered, the session saved.

A one-shot query passes through Active → Waiting → Done in seconds. A complex project cycles through these states over hours, bounded by the per-task turn cap (90 by default; Chapter 1), so a runaway loop terminates rather than silently consuming credits.

Long-running sessions

For tasks lasting more than a few minutes, use an interactive session, which keeps context alive across turns. For tasks lasting hours, run them in the background: /background <prompt> (aliases /bg, /btw) runs the prompt in a separate session and reports results when finished.

Context management

Every LLM has a finite context window. As a conversation grows, the model degrades (it repeats itself, loses earlier detail, stops noticing the obvious) without announcing it.

  • Under roughly two hours of active conversation: built-in compression triggers automatically near the limit. It is conservative, so do not rely on it immediately before a critical decision.
  • Over roughly two hours: run /compress deliberately, at a natural stopping point. Compression summarizes the history and replaces it; the original detail is lost, so compress at a milestone, not at a crisis. /compress <focus topic> narrows what is preserved. /usage shows where the session stands.
  • Multi-day projects: do not keep one session alive for days. End each session by writing a checkpoint file (/background Write current progress and next steps to ~/checkpoint.md) and read it back next session. A human-readable checkpoint survives any crash; a compressed summary of hundreds of exchanges is still lossy.
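For multi-day projects, one option is to keep dated checkpoint files instead of overwriting a single one, so an earlier state can still be read back. The sketch below assumes a naming convention of <dir>/<project>-YYYYMMDD.md. Both the convention and the helper name latest_checkpoint are illustrative, not Hermes features.

```shell
#!/bin/bash
# Hypothetical convention: checkpoints are saved as <dir>/<project>-YYYYMMDD.md.
# The date is zero-padded, so lexical order equals chronological order and the
# last glob match is the newest checkpoint.
# Prints the newest checkpoint path; returns 1 if none exists.
latest_checkpoint() {
  local dir="$1" project="$2" newest="" f
  for f in "$dir/$project"-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].md; do
    [ -e "$f" ] && newest="$f"
  done
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}
```

The next session can then start from it, for example hermes chat -q "Read $(latest_checkpoint ~/checkpoints kiln) and continue from the next steps listed there."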

Filesystem checkpoints and rollback

For file-editing work, starting a session with --checkpoints causes Hermes to snapshot files before destructive changes. In-session, /rollback lists and restores those snapshots; hermes checkpoints manages the store. This is an undo mechanism for autonomous edits, independent of git.

Verifying output without supervising continuously

hermes chat -q "Check ~/logs/test-results.md and tell me if all tests passed"
hermes chat -q "What's the current status of the content pipeline? Any failures?"
hermes --continue   # resume a session left mid-task

When to intervene

Intervene when Hermes asks a question it genuinely cannot resolve; output is visibly wrong and the agent is not self-correcting; the decision is creative or strategic; or a task has been stuck in Recovering for more than about five minutes.

Do not intervene when the agent is processing (let the turn finish), is Waiting (an API call or rate limit is in flight), or is recovering and making progress (allow one full attempt). Reading output as it streams, rather than waiting for completion, is supervision in name only.


Chapter 6: Common Failure Modes

A failure-mode index. Each entry gives cause, fix, and prevention.

1. Context overflow. Symptom: the agent drifts, repeats itself, loses earlier context. Cause: the context window is near its limit. Fix: /compress, or /new and reload context from a file or skill. Prevention: avoid multi-hour sessions; use file-based checkpoints.

2. Toolset mismatch. Symptom: a toolset was enabled but the agent says it cannot use the capability. Cause: toolsets load at session start. Fix and prevention: run /new after any toolset change.

3. Configuration drift. Symptom: a config.yaml edit is ignored, or Hermes crashes on startup. Cause: configuration is read once and cached. Fix: exit and relaunch (or /restart the gateway). Prevention: use hermes config set or hermes config edit.

4. Cron delivery surprises. Symptom: a cron job reports success but no message arrives. Cause: built-in delivery can fail quietly for jobs emitting structured output, and has no destination if no home channel is set. Fix: set a home channel with /sethome; for script-backed jobs, post to the platform API directly. Prevention: use a home channel and direct API delivery.

5. Scratch workspace data loss. Symptom: a Kanban task completed but its output files are gone. Cause: the default scratch workspace is garbage-collected when a task is archived. Fix: none; the work must be recreated. Prevention: always create output-producing tasks with --workspace dir:/absolute/path.

6. Delegation or provider mismatch. Symptom: sub-agents fail immediately with “model not supported”. Cause: the delegation model in config.yaml does not match what the current provider can serve. Fix: align them, then restart. Prevention: after changing the main provider, check the delegation config.

7. WSL gateway stops on terminal close. Cause: WSL’s systemd support is unreliable. Fix and prevention: run the gateway in the foreground inside tmux (tmux new -s hermes 'hermes gateway run').

8. Profile name mismatch. Symptom: a Kanban task assigned to a profile is never picked up. Cause: hermes kanban assign can fail to apply if the profile name does not exactly match. Fix: verify with hermes kanban show <task-id>. Prevention: copy the exact name from hermes profile list.

9. Shared bot token across profiles. Symptom: messaging breaks when multiple profiles are connected. Cause: a messaging platform allows only one connection per bot token. Fix and prevention: give every profile its own bot and token (Chapter 7).

10. Credentials stored in plaintext. Acceptable for a hobby setup, a liability for production. Fix: use hermes auth credential pools and the 1password skill rather than scattering keys across .env files.

11. The agent overwrites a hand-tuned skill with a worse version. Cause: the same self-evolution mechanism that improves skills can degrade a manually customized one. Fix and prevention: pin important hand-authored skills with hermes curator pin, review what the Curator changes, and use GEPA (Chapter 7) when you want trace-driven, test-gated improvement rather than the agent’s own judgment. See also Chapter 8.


Chapter 7: Advanced Configuration

The capabilities below become relevant once the basics are running.

Multi-agent orchestration

Kanban (Chapter 4) is the orchestration layer: decompose a goal into specialist roles, run them in parallel, synthesize. The two constraints that matter (a dedicated --workspace dir: per task, and tasks small enough to finish in one run) are covered in Workflow D and are not repeated here.

Running multiple agents: profiles

Profiles allow multiple fully independent Hermes instances, each with its own config, memory, skills, sessions, and SOUL.md, sharing nothing by default. Each profile lives at ~/.hermes/profiles/<name>/.

hermes profile create designer --clone     # --clone copies the default profile's config and .env
hermes profile create programmer --clone
hermes profile create researcher --clone
hermes profile use <name>                  # set the sticky default
hermes -p <name> chat -q "..."             # one-off override
hermes profile list / show / rename / export / import

To run several agents on messaging platforms at once, give each profile its own bot: a platform allows only one connection per token, so profiles that share a token break each other’s messaging. Create one bot per profile and run the gateway wizard once per profile:

hermes -p designer gateway setup
hermes -p programmer gateway setup
hermes -p researcher gateway setup
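Because --clone copies the default profile’s .env, a cloned profile may start out carrying the same bot token as its source. The sketch below audits a profiles directory for tokens that appear in more than one profile. It assumes each profile stores TELEGRAM_BOT_TOKEN=... in <profiles_dir>/<name>/.env. The helper name find_shared_tokens is hypothetical, and the check does not cover the default profile’s own ~/.hermes/.env.

```shell
#!/bin/bash
# Hypothetical audit: report profiles whose .env files share a Telegram bot token.
# Prints one line per shared token, naming the profiles (never the token itself).
# Returns 1 if any token is shared, 0 otherwise.
find_shared_tokens() {
  local profiles_dir="$1" report
  report=$(
    for env_file in "$profiles_dir"/*/.env; do
      [ -f "$env_file" ] || continue
      name=$(basename "$(dirname "$env_file")")
      token=$(grep -E '^TELEGRAM_BOT_TOKEN=' "$env_file" | tail -n 1 | cut -d= -f2-)
      [ -n "$token" ] && printf '%s\t%s\n' "$token" "$name"
    done | LC_ALL=C sort | awk -F'\t' '
      # Input is sorted, so lines with the same token are adjacent.
      $1 == prev { names = names " " $2; count++; next }
      { if (count > 1) print "shared token: " names; prev = $1; names = $2; count = 1 }
      END { if (count > 1) print "shared token: " names }'
  )
  [ -z "$report" ] && return 0
  printf '%s\n' "$report"
  return 1
}
```

Running find_shared_tokens ~/.hermes/profiles before starting the gateways gives an early warning; an empty result means every profile that defines a token has its own.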

The agents become genuinely distinct through their SOUL.md files: for example, a designer profile written for hand-drawn technical illustration, a programmer profile written as a terse staff engineer, a researcher profile written to produce a daily digest. Edit each at ~/.hermes/profiles/<name>/SOUL.md.

Delegating execution to Claude Code

A programmer profile is more powerful if it does not write code directly but delegates execution to the Claude Code CLI: Hermes orchestrates and decides what is next, while Claude Code does the file edits, runs commands, and manages git. This also lets execution run on a Claude subscription rather than a separate API key.

Ensure the claude binary is on PATH (which claude should print a real path), then start a session with the programmer profile and send a single activation prompt instructing it to act as a staff engineer that uses Claude Code for all execution and to set itself up accordingly. The profile installs the claude-code skill on its own, verifies the binary, and from then on routes anything coding-related through Claude Code, choosing between Claude Code’s one-shot print mode and its interactive mode based on the task. The same delegation pattern works for other external CLIs.

Teaching a profile a style by example

The self-evolution loop can be used as a setup mechanism. Rather than hand-writing a skill, feed a profile reference examples (illustrations, newsletter intros, code-review comments) and ask it to study them and create a skill (via skill_manage) that reproduces the pattern, including any script the skill needs. The agent encodes the pattern itself and verifies the result. From then on, requests in that domain trigger the skill. This works for anything where consistency matters.

GEPA: optimizing skills offline

The in-agent learning loop (skill creation plus the Curator) has a known weakness: the agent tends toward self-congratulation (it usually believes it performed well, even when it did not), and the same mechanism that auto-generates skills can overwrite manual customizations with worse versions. The agent is, in effect, grading its own work.

GEPA addresses this. GEPA (Genetic-Pareto Prompt Evolution) is not part of the Hermes runtime. It lives in a companion repository, NousResearch/hermes-agent-self-evolution, is MIT-licensed, and is published as an ICLR 2026 Oral paper. It is an offline optimization pipeline: instead of asking the agent “did you do well?”, GEPA reads execution traces to understand why things failed, then proposes targeted improvements through reflective evolutionary search. It uses DSPy + GEPA, needs no GPU (everything runs through API calls), and costs roughly $2–10 per optimization run.

The pipeline:

  1. Read the current skill, prompt, or tool description from the Hermes repo.
  2. Generate an evaluation dataset: synthetic test cases, real session history from state.db, or a hand-curated golden set.
  3. Run the GEPA optimizer: read execution traces, diagnose failure points, generate candidate variants.
  4. Evaluate candidates with LLM-as-judge scoring against rubrics (graded, not binary pass/fail).
  5. Apply constraint gates: the full test suite must pass, skills must stay under a size limit, prompt-caching compatibility is preserved, and semantic purpose must not drift.
  6. The best valid variant goes out as a pull request against the Hermes repo, never a direct commit, for human review and merge.

To run the optimizer against one skill from a fresh clone:

git clone https://github.com/NousResearch/hermes-agent-self-evolution.git
cd hermes-agent-self-evolution
pip install -e ".[dev]"
export HERMES_AGENT_REPO=~/.hermes/hermes-agent
python -m evolution.skills.evolve_skill --skill <skill-name> --iterations 10 --eval-source synthetic

GEPA can be skipped initially. It earns its keep when you hit a wall with a skill and want trace-driven, test-gated improvement without the cost of fine-tuning or reinforcement learning. It is also still maturing: treat it as an advanced, somewhat experimental companion tool, and review every pull request it produces. In one line: the runtime loop captures experience, the Curator keeps the library clean, and GEPA verifies that what is in the library actually works.

Credential management

Running multiple agents across projects makes credential management a real concern. hermes auth provides credential pools that hold multiple keys per provider and rotate them automatically when one hits a rate limit or cooldown.

hermes auth                                   # interactive credential wizard
hermes auth list / status
hermes auth add openrouter --api-key sk-or-... # add an API key
hermes auth add anthropic --type oauth          # add an OAuth credential

For secrets that should not be stored in Hermes at all, the official 1password skill fetches credentials from 1Password at runtime (hermes skills install official/security/1password). Plain keys in .env are acceptable for a hobby setup; for production β€” multiple agents, key rotation, an audit trail β€” use hermes auth.

MCP server integrations

The Model Context Protocol connects Hermes to external systems: a database, GitHub, anything with an API.

hermes mcp serve                  # run Hermes itself as an MCP server
hermes mcp add github --command "npx @modelcontextprotocol/server-github"
hermes mcp add <name> --url https://remote-mcp-endpoint
hermes mcp list / test / configure / remove

--command runs a local MCP server process; --url connects to a remote endpoint. hermes mcp configure filters which of a server’s tools Hermes exposes. MCP servers are configured per-profile by design.

Extending Hermes: plugins and event hooks

Two mechanisms let you extend Hermes without modifying its core.

Plugins add custom tools, hooks, and integrations. There are three plugin types: general plugins (which contribute tools or hooks), memory providers (the external memory backends of Chapter 3’s Tier 3), and context engines (alternative context-management strategies). Plugins are managed through the interactive hermes plugins UI and live under ~/.hermes/plugins/.

Event hooks run custom code at lifecycle points. They come in two kinds. Gateway hooks fire around messaging activity and are the right place for logging, alerting, and outbound webhooks. Plugin hooks fire around the agent’s tool calls and are the right place for tool interception, metrics, and guardrails, for example blocking or auditing a class of command before it runs. Hooks live under ~/.hermes/hooks/. Hooks plus webhooks are also how inbound automation is wired: a webhook can trigger a Hermes run, which is the basis for patterns such as an automated GitHub pull-request reviewer.

Provider routing and fallback

Beyond choosing one model, Hermes gives fine-grained control over which provider serves a request. Provider routing supports sorting, whitelists, blacklists, and priority ordering so requests can be optimized for cost, speed, or quality. Fallback providers add automatic failover: when the primary model errors or is rate-limited, Hermes fails over to a backup, with independent fallback for auxiliary tasks such as vision and context compression. Configure a chain with hermes fallback add so an unattended job does not stall on a single provider’s outage. (Prompt caching, discussed in Chapter 8, is a separate, always-on built-in: a cross-session one-hour prefix cache for Claude on the native Anthropic, OpenRouter, and Nous Portal providers.)

Using Hermes elsewhere: API server and IDE integration

Hermes is not confined to its own CLI and gateway.

API server. Hermes can expose itself as an OpenAI-compatible HTTP endpoint, so any frontend that speaks the OpenAI format (Open WebUI, LobeChat, LibreChat, and others) can drive the full agent, tools and memory included. This is the cleanest way to put a custom or shared UI in front of Hermes.

IDE integration (ACP). Through the Agent Client Protocol, Hermes runs inside ACP-compatible editors including VS Code, Zed, and JetBrains IDEs. Chat, tool activity, file diffs, and terminal commands render inside the editor, which makes Hermes usable as a coding agent without leaving the development environment. This is complementary to the Claude Code delegation pattern above.

Voice mode and the web dashboard

/voice toggles real-time spoken interaction in the CLI, Telegram, and Discord, including Discord voice-channel mode. hermes dashboard launches a browser-based UI for managing configuration, keys, and sessions (requires pip install hermes-agent[web]); it binds to localhost by default, and the --insecure flag should be used only behind trusted network controls.

Migrating from OpenClaw

A setup can be migrated from OpenClaw rather than rebuilt. hermes claw migrate imports persona, memory, skills, providers, messaging tokens, and agent settings: over 30 categories in all. The setup wizard also detects ~/.openclaw on first run.

hermes claw migrate --dry-run                          # preview, write nothing
hermes claw migrate --preset full                      # all compatible settings, no secrets
hermes claw migrate --preset full --migrate-secrets    # include API keys

Secrets are migrated only with --migrate-secrets, and a restore-point snapshot is written before anything is applied.

Batch processing and research use

Hermes is built by a model-training lab and doubles as a research platform. Batch processing runs the agent across hundreds or thousands of prompts in parallel and emits structured, ShareGPT-format trajectory data, which is useful for generating training data or for large-scale evaluation. The same trajectory export feeds reinforcement-learning training via Nous Research’s Atropos framework. GEPA, above, is the prompt-and-skill-level counterpart that needs no weight training. Most operators will not use the RL path directly, but batch processing is a practical tool any time the same task must run over a large set of inputs, and the research lineage explains why the harness is engineered as carefully as it is.


Chapter 8: Operational Lessons

The points below are drawn from the official Tips and Best Practices documentation and from independent reviews of production deployments. None is in a feature list; each changes how effectively Hermes performs.

Prompt-cache economics

Most LLM providers cache the system-prompt prefix. When the system prompt stays stable across a session (same model, same context files, same memory), every message after the first benefits from a cache hit, which is substantially cheaper than a cold read. The corollary is the lesson: do not change the model mid-session, and do not churn context files, because either invalidates the cache. (This is also why Tier 1 memory is a frozen snapshot: a mid-session memory write would otherwise break the cache.) Switch models between sessions. /usage reports spend within a session; /insights gives a 30-day view.
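The size of the saving can be estimated with simple arithmetic. The sketch below models only the system-prompt prefix and ignores conversation tokens. Every input is an assumption supplied by the caller: the price and the cache-write and cache-read multipliers differ by provider and model, so take them from the provider’s pricing page rather than from this example. prefix_cost is a hypothetical helper name.

```shell
#!/bin/bash
# Rough cost model for the system-prompt prefix only.
# All five inputs are caller-supplied assumptions; none is a Hermes default.
#   $1 messages in the session
#   $2 prefix size in tokens
#   $3 input price per million tokens
#   $4 cache-write multiplier (paid once, on the first message)
#   $5 cache-read multiplier (paid on every later message)
# Prints: "<cost without caching> <cost with caching>"
prefix_cost() {
  LC_ALL=C awk -v n="$1" -v p="$2" -v price="$3" -v w="$4" -v r="$5" 'BEGIN {
    base = p * price / 1000000          # one cold read of the prefix
    uncached = n * base                 # every message pays full price
    cached = base * (w + (n - 1) * r)   # one cache write, then cache reads
    printf "%.4f %.4f\n", uncached, cached
  }'
}
```

With purely illustrative inputs (20 messages, a 10,000-token prefix, a price of 3 per million tokens, a write multiplier of 1.25, a read multiplier of 0.1), prefix_cost 20 10000 3 1.25 0.1 prints 0.6000 0.0945. Changing the model or a context file mid-session pays the write cost again, which is the arithmetic behind the advice above.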

Specify the goal, then delegate the steps

Two opposite failure modes occur with prompting. The vague prompt (“fix the code”) produces a vague fix and several rounds of clarification; front-load detail and paste tracebacks directly. The micromanaged prompt, which dictates each step, wastes the agent’s actual strength; “find and fix the failing test” lets it search, run, and iterate. Be specific about the goal; let the agent determine the steps.

Skills are created but not always used

Hermes generates skills, but the agent decides when to load them. It may judge a skill unnecessary and skip it, or load it and use only part of it. A large collection of auto-generated skills is therefore not equivalent to a faster agent. Two habits address this: invoke skills that genuinely matter explicitly with /<skill-name> rather than relying on the agent to reach for them, and audit created skills periodically with hermes skills list and hermes curator run --dry-run. The compounding benefit is real (agents with a substantial set of self-created skills complete similar tasks markedly faster), but only when the skills are sound and actually used.

Self-improvement has no inherent ground truth

A self-improving agent improves toward whatever feedback signal it receives. In domains with clear feedback (code that compiles or fails, tests that pass or fail) the loop works. In ambiguous domains, or where the operator cannot judge correctness, there is no reliable ground truth, and the agent can become faster and more confident at the wrong thing. The agent also tends to rate its own performance generously. Defenses: review the skills the Curator creates and keeps; pin sound hand-authored skills (hermes curator pin) so they are not silently degraded; and, for skills that matter, prefer GEPA’s trace-driven, test-gated optimization (Chapter 7) over the agent’s self-assessment. Do not assume “it learned” means “it learned the correct thing.”

Choose a deliberate loop position

A useful frame distinguishes three positions: in the loop (each step is approved), on the loop (the operator supervises and intervenes), and out of the loop (the agent runs unattended). Hermes’s defaults place the operator on the loop for outputs and out of the loop for the learning, and the path of least resistance pulls toward fully out-of-the-loop. That is acceptable where feedback is crisp and a real risk where it is not. Decide deliberately which position each workflow warrants.

Security for an agent with shell access

An agent that runs shell commands unattended needs a deliberate security posture.

  • Keep dangerous-command approval enabled. Hermes checks every command against a curated list of dangerous patterns. When it prompts, four choices appear: once, session, always, deny. Choose always with caution: it permanently allowlists the pattern. Begin with session.
  • Container backends skip those checks. With Docker, Singularity, Modal, or Daytona, dangerous-command checks are disabled because the container is the security boundary, so the container image must itself be locked down.
  • Sandbox untrusted code. When working with an unfamiliar repository, set TERMINAL_BACKEND=docker so a harmful command cannot reach the host.
  • Never set GATEWAY_ALLOW_ALL_USERS=true on a bot with terminal access. Use per-platform allowlists (TELEGRAM_ALLOWED_USERS, DISCORD_ALLOWED_USERS) or DM pairing (hermes pairing approve).
  • Account for the skill and MCP supply chain. Auto-created skills, community skills, and MCP servers all execute with the agent’s privileges. Inspect skills before installing (hermes skills inspect), and do not point an unsandboxed Hermes instance at a payment or otherwise regulated codebase until its provenance, signing, and audit-trail story is mature.
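The allowlist rule above lends itself to a pre-start check. The sketch below inspects a gateway .env file for the variables named in this chapter. check_gateway_env is a hypothetical helper, and it assumes plain KEY=value lines. DM pairing is not visible in .env, so a missing allowlist produces only a warning.

```shell
#!/bin/bash
# Hypothetical pre-start check for a gateway .env file.
# Returns 1 (and prints FAIL) if GATEWAY_ALLOW_ALL_USERS is set to true.
# Prints WARN, but still returns 0, if neither per-platform allowlist is set,
# because access may instead be controlled through DM pairing.
check_gateway_env() {
  local file="$1"
  if grep -Eiq '^GATEWAY_ALLOW_ALL_USERS="?true"?[[:space:]]*$' "$file"; then
    echo "FAIL: GATEWAY_ALLOW_ALL_USERS=true"
    return 1
  fi
  if ! grep -Eq '^(TELEGRAM|DISCORD)_ALLOWED_USERS=.+' "$file"; then
    echo "WARN: no TELEGRAM_ALLOWED_USERS or DISCORD_ALLOWED_USERS set"
  fi
  return 0
}
```

A gateway launch script could run check_gateway_env ~/.hermes/.env first and refuse to start on a non-zero exit.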

The consensus from independent reviews is that Hermes is a strong always-on personal agent for individual developers, indie builders, and researchers, but is not yet suited to regulated backend engineering. Match the deployment to the stakes.

Choosing a model for the harness

Hermes is designed so a strong harness makes open or budget models perform at operator grade, and in practice this largely holds. The practical pattern: a frontier model (Claude Sonnet/Opus class, GPT class) for architecture and difficult multi-step reasoning, a fast inexpensive model (Claude Haiku, DeepSeek) for formatting and boilerplate. Switching is trivial, but the prompt-cache lesson applies: switch between sessions. Configure a fallback chain with hermes fallback add so a rate-limited primary does not stall an unattended job.

CLI reflexes worth building

Ctrl+C pressed once interrupts the agent so it can be redirected mid-thought. Ctrl+V pastes a clipboard image directly for vision analysis. Alt+Enter or Ctrl+J inserts a newline without sending. Typing / then Tab autocompletes every command and installed skill. /title on every session worth finding again prevents an indistinguishable pile of unnamed sessions.


Appendix: Quick Reference

Install and setup

# Install (Linux / macOS / WSL2 / Android-Termux)
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# Install (native Windows, PowerShell; early beta)
iex (irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1)

hermes setup            # configure (sections: model, terminal, gateway, tools, agent)
hermes doctor           # health check (add --fix to attempt repairs)
hermes status           # visual status overview
hermes dump             # plain-text setup summary for support requests
hermes update           # update (add --backup for a pre-update snapshot)

~/.hermes/ layout (key paths)

config.yaml             non-secret configuration
.env                    API keys and secrets
SOUL.md                 agent identity (system-prompt slot #1)
memories/MEMORY.md      Tier 1 memory: agent facts (~2,200 chars)
memories/USER.md        Tier 1 memory: user model (~1,375 chars)
skills/                 all skills; .archive/ holds Curator-archived skills
state.db                SQLite session store, FTS5 (Tier 2 memory / search)
cron/jobs.json          scheduled jobs
profiles/<name>/        isolated profiles, each a full Hermes home
logs/                   agent.log, gateway.log, errors.log

Toolset and configuration rules

  1. Toolset changes take effect only in a new session (/new).
  2. Configuration changes require a restart (or /restart for the gateway).
  3. Enable only the toolsets a task needs.

Session commands

hermes chat -q "one-shot query"     # one-shot; shows tool output
hermes -z "scripted one-shot"       # final answer only, for scripts, cron, CI
hermes chat                         # interactive session (or just: hermes)
hermes --continue                   # resume the most recent session
hermes --resume <id-or-title>       # resume a specific session
hermes sessions list / browse / rename / prune / export / delete

Key in-session slash commands

/new  (alias /reset)     start a fresh session
/compress [focus]        compress context manually
/background <prompt>     run a prompt in a separate background session
/rollback [n]            list or restore filesystem checkpoints
/model [name]            switch among already-configured models
/skills                  search, install, and manage skills
/<skill-name>            load an installed skill (e.g. /python-testing)
/cron                    manage scheduled tasks (see cron forms below)
/sethome                 set the current chat as the home channel for deliveries
/title <name>            name the current session
/voice [on|off|status]   toggle voice mode
/usage                   token usage and cost for the session
/verbose                 cycle tool-output display modes
/help                    full command list

There is no /skill command. Load a skill with /<skill-name>; manage skills with /skills.

CLI keyboard shortcuts

Ctrl+C (once)        interrupt the agent, then type to redirect
Ctrl+C (twice/2s)    force exit
Alt+Enter / Ctrl+J   newline without sending (works in every terminal)
Ctrl+V               paste a clipboard image
/  then  Tab         autocomplete commands and installed skills

Context files and references

~/.hermes/SOUL.md    instance-wide identity (system-prompt slot #1)
AGENTS.md            project root: rules and conventions, auto-loaded each session
.hermes.md CLAUDE.md also recognized as project context files
.cursorrules         read automatically if present in the working directory
@<path|folder|url>   inject a file, folder, git diff, or URL into one message

Skill and Curator commands

hermes skills browse / search                  # explore registries
hermes skills install <id>                     # install a skill
hermes skills inspect <id>                     # preview without installing
hermes skills list / publish <path>
hermes skills tap add <user>/<repo>             # add a GitHub repo as a custom tap
hermes bundles create <name> --skill <id> ...   # group skills under one command
hermes curator run --dry-run                    # preview a Curator pass
hermes curator pin <skill>                      # protect a skill from archival

Cron

# In-session (/cron add):
/cron add 30m "..."                # one-shot in 30 minutes
/cron add "every 2h" "..."         # recurring interval
/cron add "0 9 * * 1-5" "..."      # standard cron expression
/cron add "every 1h" "..." --skill <name>   # attach a skill

# From a shell:
hermes cron list / create / edit / pause / resume / run / remove / status
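
The third in-session form above is a standard five-field cron expression. As an illustrative sketch only (describe_cron is a hypothetical helper, not a Hermes command), the shell function below labels each field so an expression can be sanity-checked before it is scheduled:

```shell
#!/bin/sh
# describe_cron EXPR: label the five fields of a standard cron expression.
# Hypothetical helper for illustration; it does not call Hermes.
describe_cron() {
  # Disable globbing so "*" fields are not expanded to filenames while splitting.
  set -f
  # Split the expression on whitespace into positional parameters.
  set -- $1
  set +f
  if [ "$#" -ne 5 ]; then
    echo "expected exactly 5 fields, got $#" >&2
    return 1
  fi
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

describe_cron "0 9 * * 1-5"
```

For "0 9 * * 1-5" the fields read as minute 0, hour 9, any day of the month, any month, and weekdays 1 through 5, that is, 09:00 Monday to Friday. The helper rejects the interval forms ("30m", "every 2h") because they are not five-field expressions.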

Reliable messaging delivery (cron script pattern)

#!/bin/bash
# Read credentials from the Hermes env file; -f2- keeps values that contain "=".
TELEGRAM_BOT_TOKEN=$(grep -m1 '^TELEGRAM_BOT_TOKEN=' ~/.hermes/.env | cut -d= -f2-)
TELEGRAM_CHAT_ID=$(grep -m1 '^TELEGRAM_CHAT_ID=' ~/.hermes/.env | cut -d= -f2-)
RESULT=$(hermes -z "Your query here")
# --data-urlencode keeps "&", "+", and "%" in the reply intact.
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
  -d "chat_id=${TELEGRAM_CHAT_ID}" --data-urlencode "text=${RESULT}" -d "parse_mode=Markdown"

The script above posts to Telegram directly. Built-in delivery instead requires a home channel, set with /sethome.
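
The credential lookup in the delivery script depends on how ~/.hermes/.env is written. This manual does not specify that format, so the sketch below assumes plain KEY=value lines, optionally quoted, with "#" comment lines. read_env_var is a hypothetical helper, not part of Hermes. It ignores commented-out entries, keeps "=" inside values, strips one pair of surrounding quotes, and returns the last matching line if the key appears more than once:

```shell
#!/bin/sh
# read_env_var KEY FILE: print the value of KEY from a dotenv-style file.
# Hypothetical helper; assumes KEY=value lines and simple alphanumeric keys.
read_env_var() {
  # Keep only lines that start with "KEY=" and drop that prefix.
  # Commented lines ("# KEY=...") and longer keys ("KEY_OLD=...") do not match.
  sed -n "s/^$1=//p" "$2" |
    # If the key is defined more than once, use the last definition.
    tail -n 1 |
    # Strip one pair of surrounding double or single quotes, if present.
    sed -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'\$/\1/"
}

# Usage in the delivery script (paths as in the script above):
#   TELEGRAM_BOT_TOKEN=$(read_env_var TELEGRAM_BOT_TOKEN ~/.hermes/.env)
#   TELEGRAM_CHAT_ID=$(read_env_var TELEGRAM_CHAT_ID ~/.hermes/.env)
```

An empty result means the key was not found, so a calling script can test for that and stop before it contacts the Telegram API.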

Gateway and profile commands

hermes gateway setup / run / install / start / stop / restart / status / list
hermes profile list / create <name> [--clone] / use <name> / show / rename
hermes -p <name> <command>            # run any command under a specific profile

Credentials, memory, MCP

hermes auth                           # credential pool wizard
hermes auth add <provider> --api-key <key> | --type oauth
hermes memory setup / status          # configure an external memory provider
hermes mcp serve / add / list / test / configure / remove

Extensibility and integration

hermes plugins                        # manage plugins (tools, memory providers, context engines)
hermes fallback add <provider>        # add a fallback provider for failover
# Event hooks live under ~/.hermes/hooks/ (gateway hooks and plugin hooks)
# API server: expose Hermes as an OpenAI-compatible HTTP endpoint
# IDE (ACP): use Hermes inside VS Code, Zed, or JetBrains editors

GEPA: offline skill optimization (companion repo)

git clone https://github.com/NousResearch/hermes-agent-self-evolution.git
cd hermes-agent-self-evolution && pip install -e ".[dev]"
export HERMES_AGENT_REPO=~/.hermes/hermes-agent
python -m evolution.skills.evolve_skill --skill <skill-name> --iterations 10 --eval-source synthetic
# Output: a pull request against the hermes-agent repo. Review before merging.

Emergency recovery

Hermes will not start:        hermes doctor --fix ; check ~/.hermes/logs/
Tool unavailable after enable: /new
Config change has no effect:   exit and relaunch (or /restart the gateway)
Cron job not firing:           hermes cron status ; hermes cron list
Gateway not responding:        hermes logs gateway -f ; check the bot token ;
                               on WSL run `hermes gateway run` inside tmux
A file edit went wrong:        /rollback
Kanban scratch files gone:     unrecoverable; always use --workspace dir:/abs/path
A skill was degraded:          hermes curator pin <skill> ; restore from .archive/

Official resources

Documentation     hermes-agent.nousresearch.com/docs
Source            github.com/NousResearch/hermes-agent
Self-evolution    github.com/NousResearch/hermes-agent-self-evolution
Skills hub        agentskills.io  /  skills.sh
LLM-readable docs /docs/llms.txt  and  /docs/llms-full.txt

Built around Hermes Agent by Nous Research (MIT License). Verified against official documentation and source repositories. Command flags, defaults, and counts change between releases; confirm details against the current docs.