Shared Memory for Agent Fleets: hermes-memory-pgvector
When you run more than one AI agent — a marketing minion, a trading minion, an incident-response minion — they each need memory. The built-in memory tool gives each agent its own. That is fine until you want them to share, or until you want to recall what the agent learned six weeks ago without paying for an LLM round-trip just to look it up.
hermes-memory-pgvector is a small Postgres + pgvector plugin that mirrors agent memory writes into a shared, queryable store with vector embeddings. It is published on PyPI at v0.3.0.
What it actually does
In one line: "a storage layer that gives the built-in memory model durable, multi-tenant, semantically-searchable backing, with no LLM in the hot path."
Two tables, both with HNSW vector indexes. memory_entries mirrors writes to the agent's MEMORY.md/USER.md files. conversations stores substantive chat turns (≥40 chars, boilerplate filtered out). Embeddings are 768-dim, computed by an external endpoint of your choice (Ollama or anything OpenAI-compatible). The agent never blocks on embedding: writes return in microseconds and the embedding worker handles the rest on a background queue.
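To make the non-blocking write path concrete, here is a minimal sketch of the pattern, not the plugin's actual code. The class EmbeddingQueue, the method write_turn, and the fake_embed stand-in are all hypothetical names invented for illustration; only the 768 dimensions and the 40-character threshold come from the description above. A plain list stands in for Postgres.

```python
import queue
import threading

EMBED_DIM = 768  # embedding size stated in the post
MIN_CHARS = 40   # conversation turns shorter than this are skipped


def fake_embed(text):
    """Hypothetical stand-in for the external embedding endpoint."""
    # Placeholder vector; a real endpoint would return a learned embedding.
    return [float(len(text) % 7)] * EMBED_DIM


class EmbeddingQueue:
    """Accepts writes immediately and embeds them on a background thread."""

    def __init__(self, embed_fn, store):
        self._embed_fn = embed_fn
        self._store = store  # assumption: a list here, a Postgres table in practice
        self._queue = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def write_turn(self, agent_identity, text):
        """Enqueue a chat turn and return at once; False means it was filtered out."""
        if len(text.strip()) < MIN_CHARS:
            return False
        self._queue.put((agent_identity, text))
        return True

    def _run(self):
        while True:
            agent_identity, text = self._queue.get()
            try:
                vector = self._embed_fn(text)
                self._store.append((agent_identity, text, vector))
            except Exception:
                # A real worker would log or retry; this sketch just drops the item.
                pass
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until every queued item has been processed (handy in tests)."""
        self._queue.join()
```

The caller only pays for a length check and a queue put; the slow call to the embedding endpoint happens on the worker thread.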
Per-agent themes by default
Every request carries an X-Hermes-Session-Key header that scopes data by agent_identity. Marketing's notes do not pollute trading's recall. When you actually want cross-theme search, pass scope='all' explicitly. The default is the safe one.
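As a rough illustration of what scoping could look like at the query level, the sketch below builds a nearest-neighbour query that filters on agent_identity unless scope='all' is passed. This is an assumption about the shape of the query, not the plugin's real SQL: the function build_recall_query, the column names content and embedding, and the default scope value 'agent' are hypothetical. The <=> operator is pgvector's cosine-distance operator, and the vector is passed in pgvector's text form, for example '[0.1,0.2]'.

```python
def build_recall_query(agent_identity, query_vec, scope="agent", limit=5):
    """Return (sql, params) for a recall query; per-agent unless scope='all'."""
    if scope not in ("agent", "all"):
        raise ValueError(f"unknown scope: {scope!r}")

    # pgvector accepts vectors as text literals such as '[0.1,0.2]'.
    vec_literal = "[" + ",".join(repr(float(x)) for x in query_vec) + "]"
    params = {"query_vec": vec_literal, "limit": limit}

    where = ""
    if scope == "agent":
        # Safe default: only rows written under this agent's identity.
        where = "WHERE agent_identity = %(agent_identity)s "
        params["agent_identity"] = agent_identity

    sql = (
        "SELECT content, embedding <=> %(query_vec)s::vector AS distance "
        "FROM memory_entries "
        f"{where}"
        "ORDER BY distance LIMIT %(limit)s"
    )
    return sql, params
```

The returned pair uses named placeholders in the style accepted by psycopg, so user-supplied values never get interpolated into the SQL string itself.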
Why standalone, not a fork
hermes-agent has closed its list of built-in memory providers as a matter of policy, so this ships as a standalone plugin picked up by the /plugins directory scan rather than as an upstream fork. Drop it next to the agent, set a few config keys, restart. Rollback is symmetric: disable the provider, optionally drop the tables. No long-lived state to migrate, no kernel patches to maintain.
What it is not
Not a Honcho replacement. Not a knowledge graph. Not a RAG framework. It is intentionally a thin layer that does one thing — turn the built-in memory tool into a shared, durable, vector-searchable store — and gets out of the way. No LLM deriver, no dialectic loop, no opinion about how you should chunk or rerank. Just vector math.
Get it
pip install hermes-memory-pgvector
Source, migration, and config on GitHub:
