Semantic Memory with pgvector
In Lesson 7, you learned to verify high-stakes outputs using independent paths. Your budget tracker is now correct, deployed, and trustworthy for exact queries. Now the boss walks in with a different kind of question. She is about to approve an expense and asks: "Find all the past expenses that look similar to this one." She does not remember the exact words used. She does not have an expense ID. She just has a sense of the kind of thing she is looking for.
You write a WHERE description LIKE '%dinner%' query. It returns three rows. But you know there are more. "Client dinner," "team meal," "business lunch," and "offsite food" all describe the same kind of expense, yet share almost no keywords. Your exact-match memory cannot help here. This is the moment relational memory hits its ceiling and a second kind of memory takes over.
- Vector / embedding: A list of numbers that represents the meaning of a piece of text. Similar meanings produce similar numbers. The agent generates these for you; you never write them by hand.
- Semantic similarity: A score that says how close two pieces of text are in meaning, regardless of which exact words they use.
- Cosine distance: The math that compares two vectors by direction, not length. Smaller distance means closer in meaning.
- pgvector: A PostgreSQL extension that adds a vector data type and similarity operators to your existing database. It runs inside Neon, no separate service needed.
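You can see "direction, not length" directly with two tiny test vectors once pgvector is enabled. A minimal sketch (the `<=>` operator returns cosine distance, which ranges from 0 for identical direction to 2 for opposite direction):

```sql
-- Cosine distance compares direction only, so scaling a vector changes nothing.
SELECT '[1, 0]'::vector <=> '[2, 0]'::vector AS same_direction; -- 0: same direction, length ignored
SELECT '[1, 0]'::vector <=> '[0, 1]'::vector AS orthogonal;     -- 1: unrelated directions
SELECT '[1, 0]'::vector <=> '[-1, 0]'::vector AS opposite;      -- 2: opposite directions
```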
Two Kinds of Memory
Every agent in this book needs both kinds of memory. Neither replaces the other; they answer different questions.
| Question | Memory Type | Tool |
|---|---|---|
| "Find expense #47" | Exact-match | SQL WHERE id = 47 |
| "Show all expenses over $500 last month" | Exact-match | SQL WHERE and aggregations |
| "Find expenses similar to 'team lunch'" | Semantic | pgvector <-> operator |
| "What past tickets describe a problem like this one?" | Semantic | pgvector similarity ordering |
Exact-match memory is what you have built across Lessons 1 through 7. It excels at filters, joins, sums, and reports. Semantic memory is new. It answers questions about meaning when the user does not know the exact words. The boss does not say "find expense #47." She says "find expenses like this one." Two different questions, two different tools, both living in the same Neon database.
The point is coherence. Many teams reach for a separate vector database the moment they need semantic search. They install Pinecone or Weaviate, set up a second connection, sync data between systems, and double their operational surface area. pgvector eliminates that fork. Your relational data and your vector data live side by side in one database, queried through one connection.
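To make "one database, two memories" concrete, here is a sketch of both query styles against the same expenses table. The `:query_embedding` parameter is a placeholder for the embedding of the search phrase, which the agent generates and binds at query time:

```sql
-- Exact-match memory: keyword filter, misses synonyms like "meal" and "lunch".
SELECT id, description
FROM expenses
WHERE description LIKE '%dinner%';

-- Semantic memory: rank every row by cosine distance to the query embedding.
SELECT id, description, embedding <=> :query_embedding AS distance
FROM expenses
ORDER BY distance
LIMIT 3;
```

The second query has no WHERE clause at all: every row is a candidate, and closeness of meaning decides the order.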
PRIMM-AI+ Practice: Add Semantic Memory
Predict [AI-FREE]
Before you generate any embeddings, write down:
- Two expense descriptions from a real workplace that you think are semantically similar but share no keywords.
- Which of the five sample expenses should be the closest match to "team meal with the boss."
- What would count as a surprising or wrong ranking.
- Your confidence score from 1 to 5.
Do not ask the agent until the prediction is written.
Run
The first run adds the vector capability to the same Neon database.
What you tell the agent
The agent will enable the extension, alter the table, generate embeddings using a standard embeddings API, and insert five sample rows. You direct; the agent implements.
$ claude
> Enable pgvector extension on my Neon database. Add an `embedding vector(1536)` column to the expenses table. Generate embeddings for these 5 sample expenses using the embeddings API and insert them: "team lunch with client", "office supplies order", "software subscription renewal", "airport taxi to conference", "dinner with project stakeholders".
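Behind that prompt, the agent's setup typically comes down to two statements (a sketch; the table and column names follow the prompt above):

```sql
-- Enable the pgvector extension once per database.
CREATE EXTENSION IF NOT EXISTS vector;

-- Add a 1536-dimension embedding column to the existing table.
ALTER TABLE expenses ADD COLUMN embedding vector(1536);
```

The dimension (1536 here) must match the embedding model's output exactly; every vector stored in the column has to be that length.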
What you verify
Once the embeddings are inserted, ask the agent to run a similarity query. The query asks: given the phrase "team meal with the boss," which of the five rows are closest in meaning? The <=> operator measures cosine distance; smaller numbers mean closer meaning. You order by distance and take the top three.
Query: find the 3 expenses most similar in meaning to "team meal with the boss"
Result:
rank | description | distance
-----+------------------------------------------+----------
1 | team lunch with client | 0.18
2 | dinner with project stakeholders | 0.27
3 | airport taxi to conference | 0.61
Top match: "team lunch with client" (distance 0.18)
Status: verified, semantic neighbors returned in order
The first two results both describe shared meals at work, even though "team meal with the boss" shares no keywords with "dinner with project stakeholders." Exact-match SQL would return zero rows for this query. Semantic search returns the right answer ranked by closeness of meaning.
Compare the top three results to your prediction. If the top result surprises you, write down why.
Investigate
First, write your own explanation of why "team meal with the boss" matched "team lunch with client" even though the words differ. Then ask the agent why it used <=> (cosine distance) rather than <-> (L2 distance). After that, ask what breaks if you compare embeddings generated by two different models. The answer to that last question is the most important thing to remember about embeddings in production.
Modify
Extend the query: find semantically similar expenses but only for the current user. The query needs both a user_id filter and vector ordering. Predict whether the top result should change after filtering. Then direct the agent to combine them and verify the ranked output.
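The combined query the agent should produce looks roughly like this (a sketch; `user_id = 5` is an example value and `:query_embedding` stands in for the phrase's embedding):

```sql
-- Relational filter first, then vector ordering on the surviving rows.
SELECT description,
       embedding <=> :query_embedding AS distance
FROM expenses
WHERE user_id = 5
ORDER BY distance
LIMIT 3;
```

One statement expresses both kinds of memory: the WHERE clause is exact-match, the ORDER BY is semantic.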
Make [Mastery Gate]
Choose one free-text field from your own domain where exact matching misses meaning. Define the vector column, seed five realistic examples, write one semantic query, and predict the top result before running it. The gate passes when the top result matches your predicted semantic neighbor or you can explain why it did not.
For more than 10,000 rows, add an HNSW index. Tell the agent: CREATE INDEX ON expenses USING hnsw (embedding vector_cosine_ops). The agent handles the implementation; you decide when latency justifies adding it. Below 10,000 rows, the default sequential scan is fast enough and the index is not worth the storage cost.
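Written out, the index statement and a quick check that the planner actually uses it (a sketch; `vector_cosine_ops` matches the `<=>` operator used in the queries above):

```sql
-- Approximate nearest-neighbor index for cosine-distance queries.
CREATE INDEX ON expenses USING hnsw (embedding vector_cosine_ops);

-- Confirm the plan uses the index rather than a sequential scan.
EXPLAIN SELECT description
FROM expenses
ORDER BY embedding <=> :query_embedding
LIMIT 3;
```

Note that HNSW is approximate: results are returned fast but are not guaranteed to be the exact nearest neighbors.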
Both runtimes send the same prompt text to the same model. The only difference is the session wrapper: Claude Code uses the > prefix in a claude session; OpenCode uses the same prefix in an opencode session. The pgvector extension, embedding column, and verification query are identical; they live in the database, not the runtime.
Try With AI
Prompt 1: Add Semantic Memory to the Tracker
Enable pgvector on my Neon database and add an `embedding vector(1536)` column to the expenses table. Generate embeddings for these 5 expenses and insert them: "team lunch with client", "office supplies order", "software subscription renewal", "airport taxi to conference", "dinner with project stakeholders". Then run a similarity query that finds the top 3 expenses closest in meaning to "team meal with the boss". Show me the results ranked by cosine distance.
What you're learning: You are directing the full pgvector workflow end to end: extension, schema change, data ingestion, and verification. The output ranks rows by meaning, not keywords. This is your first hands-on proof that the same Neon database can serve both exact-match and semantic queries.
Prompt 2: Combine User Filter with Vector Order
Show me the top 3 expenses semantically similar to "client dinner" but only for user_id = 5. The query must filter by user_id first, then order by cosine distance. Explain why the order of operations matters for performance.
What you're learning: Real semantic queries are almost never pure vector searches. They combine relational filters (user, date range, category) with vector ordering. This is exactly why pgvector belongs in the same database as your relational tables: one query expresses both kinds of memory at once.
Prompt 3: Apply to Your Domain
I work in [your domain]. What data in my domain would benefit from semantic similarity search? Design the vector column and one retrieval query for my use case. Explain what exact-match would miss that semantic search would catch.
What you're learning: Semantic memory is a universal pattern. Whether your domain is legal contracts, medical notes, customer support tickets, or research papers, any free-text field where users describe things in different words benefits from semantic search. Identifying these fields in your own work is the most valuable skill from this lesson.
Checkpoint
- I can name two kinds of memory and give one example query for each.
- I directed the agent to enable pgvector and add an embedding column to the expenses table.
- I read the similarity output and confirmed the top result matched my predicted semantic neighbor.
- I can explain why pgvector belongs in the same Neon database as the relational tables, not a separate service.
- I can name one field in my own domain where semantic search would catch what exact-match misses.