$ memista

When you don't need a vector DB

2026-05-12 · architecture · rag · embeddable

Every few months a new vector database announces itself with a benchmark that peaks at a billion vectors and a sub-millisecond p99. The benchmarks are genuine. The implication — that you need that machine — usually isn’t.

Most retrieval workloads fit on one box. A surprising number fit in one process. This post is about how to tell, and about what removing the database actually changes.

What “embeddable” buys you

The first thing to notice about a vector database is that it is a database. That means: a network hop, a serialization format, a process to supervise, a config file, a backup strategy, a migration story, a security boundary, and a Helm chart someone has to keep current. None of those are part of your business logic; they exist to make the database operable from outside your process.

An embeddable index — a crate you link into your binary, or a library running as part of the same process — removes all of that. The vector you computed in RAM goes into an index that lives in RAM, on a thread you already had. The metadata goes into a SQLite file that sits next to your other state. There is no wire format because there is no wire.
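To make "no wire" concrete, here is a minimal sketch of an index that lives entirely in process memory. It is a brute-force cosine scan rather than an HNSW graph, and FlatIndex and its methods are hypothetical names invented for this illustration; they are not the API of memista, usearch, or any other crate named in this post.

```rust
/// A deliberately tiny in-process index: vectors live in one Vec and search
/// is a linear scan. Hypothetical illustration only.
pub struct FlatIndex {
    dims: usize,
    keys: Vec<u64>,
    data: Vec<f32>, // row-major: keys.len() * dims values
}

impl FlatIndex {
    pub fn new(dims: usize) -> Self {
        Self { dims, keys: Vec::new(), data: Vec::new() }
    }

    /// Stores `vector` under `key`; rejects vectors of the wrong length.
    pub fn add(&mut self, key: u64, vector: &[f32]) -> Result<(), String> {
        if vector.len() != self.dims {
            return Err(format!("expected {} dims, got {}", self.dims, vector.len()));
        }
        self.keys.push(key);
        self.data.extend_from_slice(vector);
        Ok(())
    }

    /// Returns up to `k` (key, cosine similarity) pairs, most similar first.
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(u64, f32)> {
        if self.dims == 0 || query.len() != self.dims {
            return Vec::new();
        }
        let mut scored: Vec<(u64, f32)> = self
            .keys
            .iter()
            .zip(self.data.chunks_exact(self.dims))
            .map(|(&key, row)| (key, cosine(query, row)))
            .collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}

/// Cosine similarity; defined as 0.0 when either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

A call is an ordinary function call on a struct you own: `idx.add(1, &[1.0, 0.0])` followed by `idx.search(&[1.0, 0.1], 2)`. A real library swaps the linear scan for an approximate structure, but the calling shape stays the same.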

This matters most in three shapes of workload:

  1. Desktop and CLI tools. A user runs your binary. There is no infrastructure team. There is a ~/.config/yourapp/ directory. Anything that requires a separate daemon is a non-starter; anything that requires Docker is hostile.
  2. Edge and on-device. A model and an index ship to a device with finite storage and no reliable network. The fewer moving parts, the fewer ways the device can be wrong.
  3. Agents and ephemeral workers. A short-lived process needs retrieval over a corpus it just built or fetched. Spinning up a database for thirty seconds of work is comedy.

memista is built for those shapes. So is usearch directly, so is hnsw_rs, so is instant-distance. The category exists.

Knowing your workload fits

The question is not really “does my data fit on one box” — it is “does my data fit in the working set of one process at the latency I need.”

A few rough heuristics, all of which you should measure rather than trust:

If those four roughly hold, you do not need a vector database. You need a library and a disk.
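One input to the "fits in one process" question is plain arithmetic: the raw size of the vectors. The sketch below assumes f32 components and counts only the vector payload. Graph links, keys, and allocator overhead come on top and vary by library, so treat the result as a floor, not a measurement. The function names are made up for this example.

```rust
/// Raw payload size in bytes for `count` vectors of `dims` f32 components.
/// Excludes index structure and metadata, so the real footprint is larger.
pub fn raw_vector_bytes(count: u64, dims: u64) -> u64 {
    count * dims * std::mem::size_of::<f32>() as u64
}

/// Converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
pub fn as_mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}

/// True if the raw payload fits within `budget_bytes` of memory.
pub fn fits_in_budget(count: u64, dims: u64, budget_bytes: u64) -> bool {
    raw_vector_bytes(count, dims) <= budget_bytes
}
```

For example, 100,000 vectors at 1,024 dimensions come to 409,600,000 bytes, or 390.625 MiB of raw payload.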

What you give up

Honesty section. Embeddable means:

The trade is: complexity at the boundary of your process for complexity inside it. For small and medium workloads, inside is cheaper.

Where memista fits on this map

memista is the explicit “library plus optional binary” shape. The crate exports AppState, create_app, and the request/response types; you can embed those in your own Actix app, or you can cargo install memista and run the server it ships with. The persistence is a SQLite file plus one <database_id>.usearch file per logical partition. That’s the unit of backup, the unit of move, the unit of delete.
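If the state really is one SQLite file plus a set of .usearch files, a backup can be a file copy. The sketch below assumes all of those files sit in a single directory and that the process is stopped or idle while copying, since copying a SQLite file during a write can produce a corrupt copy. The function name, the sqlite_name parameter, and the directory layout are assumptions for illustration, not memista's documented layout.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Copies the metadata file and every `*.usearch` file from `data_dir`
/// into `backup_dir`, returning the destination paths in sorted order.
/// Assumes nothing is writing to these files during the copy.
pub fn backup_state(
    data_dir: &Path,
    sqlite_name: &str, // hypothetical: the name of the SQLite metadata file
    backup_dir: &Path,
) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(backup_dir)?;
    let mut copied = Vec::new();

    // The metadata file is required; a missing file is an error.
    let sqlite_dst = backup_dir.join(sqlite_name);
    fs::copy(data_dir.join(sqlite_name), &sqlite_dst)?;
    copied.push(sqlite_dst);

    // One index file per logical partition, matched by extension.
    for entry in fs::read_dir(data_dir)? {
        let path = entry?.path();
        let is_index = path.extension().and_then(|e| e.to_str()) == Some("usearch");
        if path.is_file() && is_index {
            if let Some(name) = path.file_name() {
                let dst = backup_dir.join(name);
                fs::copy(&path, &dst)?;
                copied.push(dst);
            }
        }
    }

    copied.sort();
    Ok(copied)
}
```

Moving or deleting a partition follows the same pattern: the files are the whole story, so ordinary filesystem tools apply.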

A few honest notes, because the project is explicit about them in its own README: the current crate hardcodes embedding dimensions to two, which is a demo default; you will be editing IndexOptions::dimensions before real use. It has not been tested past about a hundred thousand vectors. It does not authenticate. These are not surprises if you read the source; they are also not things you’d accept from a database. From a library, they’re configuration you own.

When to migrate up

The signal is not size; it is contention. You’ll know it’s time when:

At that point the database earns its complexity. Until then, it doesn’t. Most of the projects that announce themselves with “we use a vector DB” are paying that cost for the same reason they are paying a Kubernetes cost: they did it the same way someone else did, before they measured whether they had to.

Measure first. If a single process and a 400 MiB index file can serve your traffic, ship that. You can always pull the cluster out of the closet later.
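"Measure first" can be as small as timing a closure. The sketch below runs a query repeatedly and reports a nearest-rank percentile. It is a rough harness, not a benchmark framework: it ignores warm-up, concurrency, and cache effects. Both function names are invented for this example, and the closure stands in for whatever search call your index exposes.

```rust
use std::time::{Duration, Instant};

/// Runs `query` `runs` times and returns per-call latencies, sorted ascending.
pub fn time_queries<F: FnMut()>(runs: usize, mut query: F) -> Vec<Duration> {
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        query();
        samples.push(start.elapsed());
    }
    samples.sort();
    samples
}

/// Nearest-rank percentile over already sorted samples.
/// `p` must be in (0, 100]; returns None for empty input or invalid `p`.
pub fn percentile(sorted: &[Duration], p: f64) -> Option<Duration> {
    if sorted.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    // Nearest rank: the smallest rank r with r / n >= p / 100.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}
```

Run it against your real corpus and real queries, then compare `percentile(&samples, 99.0)` with the latency you actually need.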

The operational dividend

There’s one more thing that goes away when you remove the database: the operational surface area that comes with it. Embeddable retrieval means you don’t need a separate monitoring story for the vector store. You don’t need a separate on-call rotation for it. You don’t need a separate access-control model. You don’t need to learn another query language. You don’t need to keep its client library version pinned in lockstep with your application.

For a small team — say, fewer than five engineers — these costs are not theoretical. Each new operational surface eats time that does not ship product. Removing the vector database is removing a paging target, a quarterly upgrade cycle, and a “who knows that system” bus factor.

The flip side: when something is broken with the retrieval inside your binary, the debugging surface is also smaller. Logs come from one process. State lives in two files you can ls. You can attach a profiler and see exactly where time is going. There is no protocol to sniff, no remote shell to open, no foreign log format to parse.

For shapes of work where the team is small, the data is bounded, and the recall budget is reasonable, this dividend is the actual reason to go embeddable. Speed is nice. Cost is nice. But the deeper win is that you have fewer systems, and the systems you have are ones you wrote.
