
Qdrant Vector Store with LangChain

You have documents chunked and ready for indexing. You have an embedding model that converts text into 1536-dimensional vectors. Now you need somewhere to store those vectors and search them efficiently.

That somewhere is a vector database. For production RAG systems, Qdrant has become a go-to choice for Python developers: it is open source, runs anywhere from a local Docker container to the cloud, and integrates seamlessly with LangChain.

By the end of this lesson, you will have Qdrant running locally, your documents indexed, and semantic search working. Your Task API from Chapter 40 is about to get much smarter.


Why Qdrant?

Before we deploy anything, you should understand why we chose Qdrant over alternatives like Pinecone, Weaviate, or Chroma:

| Feature | Qdrant | Why It Matters |
|---|---|---|
| Open source | Apache 2.0 | No vendor lock-in; inspect and modify |
| Docker-first | Single command | Runs identically on laptop and production |
| Hybrid search | Dense + sparse vectors | Best of semantic AND keyword search |
| LangChain native | langchain-qdrant package | First-class integration, not a wrapper |
| Filtering | Payload-based | Combine vector similarity with metadata |

For your Task API, this means you can find tasks by meaning ("deployment-related work") while also filtering by status or priority—something pure keyword search cannot do.


Step 1: Deploy Qdrant with Docker

Open your terminal and run:

docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

Output:

...
[INFO] Qdrant gRPC listening on 0.0.0.0:6334
[INFO] Qdrant REST API listening on 0.0.0.0:6333
[INFO] Qdrant is ready to accept connections

Qdrant exposes two ports:

  • 6333: REST API (what LangChain uses by default)
  • 6334: gRPC API (higher performance for production)

Verify it is running by opening your browser to http://localhost:6333/dashboard. You should see the Qdrant web interface with an empty collections list.

Note: For production, add volume persistence: docker run -p 6333:6333 -v $(pwd)/qdrant_data:/qdrant/storage qdrant/qdrant. This ensures your vectors survive container restarts.
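
You can also verify the connection from Python. A minimal sketch using qdrant-client (on a fresh instance the collection list is empty):

from qdrant_client import QdrantClient

# Connect over the REST port and list collections as a quick health check
client = QdrantClient(url="http://localhost:6333")
print(client.get_collections())

Output:

collections=[]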


Step 2: Install Dependencies

You need three packages beyond what you installed in Lesson 3:

pip install langchain-qdrant qdrant-client fastembed

| Package | Purpose |
|---|---|
| langchain-qdrant | QdrantVectorStore integration |
| qdrant-client | Low-level Qdrant operations |
| fastembed | Fast sparse embeddings for hybrid search |

Step 3: Initialize QdrantVectorStore

LangChain provides three initialization patterns depending on your use case.

Pattern A: In-Memory (Testing)

For unit tests and experimentation, you do not need Docker at all:

from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

# Create embeddings model
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# In-memory Qdrant client
client = QdrantClient(":memory:")

# Create collection with vector configuration
client.create_collection(
    collection_name="task_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Initialize vector store
vector_store = QdrantVectorStore(
    client=client,
    collection_name="task_docs",
    embedding=embeddings,
)

print(f"Vector store ready: {vector_store.collection_name}")

Output:

Vector store ready: task_docs

When to use: Tests, prototyping, learning. Data disappears when Python exits.
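
In test suites this pattern drops neatly into a fixture. A minimal pytest sketch (pytest and the fixture name are assumptions of this example, not requirements of Qdrant):

import pytest
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

@pytest.fixture
def vector_store():
    # Each test gets a fresh in-memory store; nothing persists between tests
    client = QdrantClient(":memory:")
    client.create_collection(
        collection_name="test_docs",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )
    return QdrantVectorStore(
        client=client,
        collection_name="test_docs",
        embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    )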

Pattern B: Docker/Server (Development & Production)

For persistent storage, connect to your running Qdrant instance:

from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Connect to Qdrant and create collection from documents
vector_store = QdrantVectorStore.from_documents(
    documents=splits,  # Your chunked documents from Lesson 3
    embedding=embeddings,
    url="http://localhost:6333",
    collection_name="task_docs",
)

print(f"Indexed {len(splits)} documents")

Output:

Indexed 42 documents

When to use: Development on your machine, staging environments, production with Docker Compose or Kubernetes.

Pattern C: From Existing Collection

When your collection already exists (after a restart or in a shared environment):

from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Connect to existing collection (no documents needed)
vector_store = QdrantVectorStore.from_existing_collection(
    embedding=embeddings,
    collection_name="task_docs",
    url="http://localhost:6333",
)

print(f"Connected to existing collection: {vector_store.collection_name}")

Output:

Connected to existing collection: task_docs

When to use: Application restarts, multiple services sharing one Qdrant instance, production deployments.
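
In startup code you often want Pattern B or Pattern C depending on whether the collection already exists. One possible sketch, assuming the splits list from Lesson 3 and the embeddings model from above:

from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# Reuse the collection if it exists; otherwise build it from the chunks
if client.collection_exists("task_docs"):
    vector_store = QdrantVectorStore.from_existing_collection(
        embedding=embeddings,
        collection_name="task_docs",
        url="http://localhost:6333",
    )
else:
    vector_store = QdrantVectorStore.from_documents(
        documents=splits,
        embedding=embeddings,
        url="http://localhost:6333",
        collection_name="task_docs",
    )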


Step 4: Add Documents

Once your vector store is initialized, you can add documents in batches:

from langchain_core.documents import Document
from uuid import uuid4

# Sample task documents (in production, these come from your Task API database)
documents = [
    Document(
        page_content="Set up Docker containers for the FastAPI application. Include multi-stage builds for smaller images.",
        metadata={"task_id": 1, "title": "Docker Setup", "priority": "high"},
    ),
    Document(
        page_content="Deploy the Task API to Kubernetes cluster. Configure horizontal pod autoscaling for traffic spikes.",
        metadata={"task_id": 2, "title": "K8s Deployment", "priority": "high"},
    ),
    Document(
        page_content="Write unit tests for the task CRUD endpoints. Aim for 80% code coverage.",
        metadata={"task_id": 3, "title": "Unit Tests", "priority": "medium"},
    ),
    Document(
        page_content="Implement user authentication using OAuth2 with JWT tokens. Support refresh token rotation.",
        metadata={"task_id": 4, "title": "Auth System", "priority": "high"},
    ),
    Document(
        page_content="Set up CI/CD pipeline with GitHub Actions. Include linting, testing, and automatic deployment.",
        metadata={"task_id": 5, "title": "CI/CD Pipeline", "priority": "medium"},
    ),
]

# Generate unique IDs for each document
uuids = [str(uuid4()) for _ in documents]

# Add to vector store
vector_store.add_documents(documents=documents, ids=uuids)

print(f"Added {len(documents)} documents to vector store")

Output:

Added 5 documents to vector store

Why UUIDs matter: Qdrant uses these IDs for updates and deletions. If you add a document with the same ID, it replaces the existing one—useful for keeping your vector store in sync with your database.
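
To take advantage of that replace-on-same-ID behavior, derive deterministic IDs from your database primary keys instead of random UUIDs. A sketch (stable_id is a hypothetical helper, not a library function):

from uuid import NAMESPACE_URL, uuid5

def stable_id(task_id: int) -> str:
    # The same task_id always maps to the same UUID, so re-indexing
    # a task updates its existing vector instead of duplicating it
    return str(uuid5(NAMESPACE_URL, f"task-{task_id}"))

ids = [stable_id(doc.metadata["task_id"]) for doc in documents]
vector_store.add_documents(documents=documents, ids=ids)

# Deletions stay in sync the same way
vector_store.delete(ids=[stable_id(3)])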


Step 5: Semantic Search

Now for the payoff. Search by meaning, not keywords:

# Semantic search - finds documents by meaning
query = "container orchestration"
results = vector_store.similarity_search(query, k=3)

print(f"Query: '{query}'")
print(f"Found {len(results)} relevant documents:\n")

for i, doc in enumerate(results, 1):
    print(f"{i}. Task: {doc.metadata.get('title')}")
    print(f"   Priority: {doc.metadata.get('priority')}")
    print(f"   Content: {doc.page_content[:100]}...")
    print()

Output:

Query: 'container orchestration'
Found 3 relevant documents:

1. Task: K8s Deployment
Priority: high
Content: Deploy the Task API to Kubernetes cluster. Configure horizontal pod autoscaling for traffic s...

2. Task: Docker Setup
Priority: high
Content: Set up Docker containers for the FastAPI application. Include multi-stage builds for smaller...

3. Task: CI/CD Pipeline
Priority: medium
Content: Set up CI/CD pipeline with GitHub Actions. Include linting, testing, and automatic deploymen...

Notice: the query "container orchestration" matched Kubernetes and Docker tasks even though neither document contains those exact words. That is semantic search in action.

Search with Scores

To understand how relevant each result is:

results_with_scores = vector_store.similarity_search_with_score(query, k=3)

print(f"Query: '{query}'\n")
for doc, score in results_with_scores:
    print(f"Score: {score:.3f} | {doc.metadata.get('title')}")

Output:

Query: 'container orchestration'

Score: 0.847 | K8s Deployment
Score: 0.792 | Docker Setup
Score: 0.634 | CI/CD Pipeline

Higher scores mean stronger semantic similarity. You can use this to filter out weak matches:

# Only return results above a relevance threshold
MIN_SCORE = 0.7
strong_matches = [
    (doc, score) for doc, score in results_with_scores
    if score >= MIN_SCORE
]
print(f"Strong matches (score >= {MIN_SCORE}): {len(strong_matches)}")

Output:

Strong matches (score >= 0.7): 2
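
If you later wire this store into a LangChain chain, the retriever interface can apply a threshold for you. A sketch using the generic VectorStore.as_retriever API (the threshold and k values are illustrative, and the score here is LangChain's normalized relevance score):

retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.7, "k": 3},
)
docs = retriever.invoke("container orchestration")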

Step 6: Retrieval Modes

Qdrant supports three retrieval modes, each with different tradeoffs:

Dense Retrieval (Default)

What you have been using. Embeddings capture semantic meaning:

# This is the default mode
vector_store = QdrantVectorStore(
    client=client,
    collection_name="task_docs",
    embedding=embeddings,
    # retrieval_mode defaults to RetrievalMode.DENSE
)

Strengths: Finds semantically similar documents even with different words ("car" matches "automobile").

Weaknesses: May miss exact keyword matches; struggles with proper nouns and technical terms.

Sparse Retrieval (BM25)

Traditional keyword-based search using sparse vectors:

from langchain_qdrant import FastEmbedSparse, RetrievalMode

sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

# Sparse-only retrieval (note: the collection must already exist with a
# matching sparse vector configuration; see the hybrid example below for
# how to create one)
vector_store_sparse = QdrantVectorStore(
    client=client,
    collection_name="task_docs_sparse",
    sparse_embedding=sparse_embeddings,
    retrieval_mode=RetrievalMode.SPARSE,
)

Strengths: Precise keyword matching; great for proper nouns, error codes, technical identifiers.

Weaknesses: No semantic understanding ("deploy" does not match "deployment").

Hybrid Retrieval (Best of Both)

Combines dense semantic search with sparse keyword matching:

from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode
from qdrant_client import QdrantClient, models
from qdrant_client.http.models import Distance, SparseVectorParams, VectorParams

# Initialize both embedding types
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

# Create client and collection with both vector types
client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="task_docs_hybrid",
    vectors_config={
        "dense": VectorParams(size=1536, distance=Distance.COSINE)
    },
    sparse_vectors_config={
        "sparse": SparseVectorParams(
            index=models.SparseIndexParams(on_disk=False)
        )
    },
)

# Initialize hybrid vector store
vector_store_hybrid = QdrantVectorStore(
    client=client,
    collection_name="task_docs_hybrid",
    embedding=embeddings,
    sparse_embedding=sparse_embeddings,
    retrieval_mode=RetrievalMode.HYBRID,
    vector_name="dense",
    sparse_vector_name="sparse",
)

print("Hybrid vector store ready")

Output:

Hybrid vector store ready

When to use hybrid: Production RAG systems where users might search with exact terms ("OAuth2") or conceptual queries ("authentication system"). Hybrid mode handles both.
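To see it work, index the Step 4 documents into the hybrid collection and query it. A quick sketch reusing the documents and uuids lists from Step 4:

# Each document is embedded twice: dense (OpenAI) and sparse (BM25)
vector_store_hybrid.add_documents(documents=documents, ids=uuids)

# The query is embedded both ways and the two result lists are fused
results = vector_store_hybrid.similarity_search("OAuth2 token refresh", k=3)
for doc in results:
    print(doc.metadata.get("title"))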


Retrieval Mode Decision Framework

| Query Type | Best Mode | Example |
|---|---|---|
| Conceptual questions | Dense | "How do I scale my application?" |
| Exact term lookups | Sparse | "ERROR_CODE_12345" |
| Mixed intent (production) | Hybrid | User queries you cannot predict |
| Development/testing | Dense | Simplest to set up |

For your Task API semantic search, start with dense retrieval. Move to hybrid when you observe users searching for exact task IDs or technical terms that semantic search misses.


Metadata Filtering

Vector similarity alone is not enough. You often need to filter by metadata (status, priority, assignee):

from qdrant_client import models

# Find high-priority tasks related to deployment
results = vector_store.similarity_search(
    query="deployment automation",
    k=5,
    filter=models.Filter(
        must=[
            models.FieldCondition(
                key="metadata.priority",
                match=models.MatchValue(value="high"),
            ),
        ]
    ),
)

print("High-priority deployment tasks:")
for doc in results:
    print(f" - {doc.metadata.get('title')}")

Output:

High-priority deployment tasks:
- K8s Deployment
- Docker Setup

The filter narrows results to high-priority tasks BEFORE vector similarity ranking. This is more efficient than filtering after retrieval.
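
Filters compose: every condition in must is ANDed, and you can mix match with range conditions. A sketch with two conditions (the task_id range is illustrative):

# High-priority tasks among the first three task IDs
combined_filter = models.Filter(
    must=[
        models.FieldCondition(
            key="metadata.priority",
            match=models.MatchValue(value="high"),
        ),
        models.FieldCondition(
            key="metadata.task_id",
            range=models.Range(lte=3),
        ),
    ]
)
results = vector_store.similarity_search("deployment", k=5, filter=combined_filter)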


Common Pitfalls

| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong vector size | "Vector size mismatch" error | Match VectorParams(size=...) to your embedding model (1536 for text-embedding-3-small) |
| Forgetting persistence | Data lost on restart | Add a -v volume mount to the Docker command |
| Inconsistent IDs | Duplicate documents | Generate stable IDs from a content hash or database primary key |
| Over-fetching | Slow searches | Use an appropriate k value; 4-10 is typical for RAG |
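
The first pitfall is the most common, and you can catch it with a quick check. A sketch for an unnamed-vector collection like task_docs (for named vectors, config.params.vectors is a dict keyed by vector name):

# Compare the collection's configured size with your embedding dimension
info = client.get_collection("task_docs")
print(info.config.params.vectors.size)  # expect 1536 for text-embedding-3-small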

Safety Note

Your Qdrant instance stores indexed content. For production:

  • Do not expose port 6333 publicly without authentication
  • Use Qdrant Cloud or configure API keys for multi-user access (see the sketch after this list)
  • Never index sensitive data (PII, credentials) unless you have proper access controls
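
Connecting with an API key looks like this. A sketch with a placeholder cluster URL and an assumed environment variable name:

import os
from qdrant_client import QdrantClient

client = QdrantClient(
    url="https://your-cluster.example.qdrant.io",  # placeholder URL
    api_key=os.environ["QDRANT_API_KEY"],  # assumed env var name
)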

Reflect on Your Skill

You built a rag-deployment skill in Lesson 0. Does it know about Qdrant?

Test Your Skill

Open your skill and ask:

How do I set up Qdrant with LangChain for a production RAG system?

Check if it mentions:

  • Docker deployment command
  • The three initialization patterns (in-memory, Docker, existing collection)
  • Hybrid retrieval mode

Identify Gaps

  • Does your skill know about metadata filtering?
  • Does it explain when to use hybrid vs dense retrieval?
  • Does it warn about vector size mismatches?

Improve Your Skill

If gaps exist, update your skill with the patterns from this lesson:

Update my rag-deployment skill to include:
1. Qdrant Docker deployment (docker run -p 6333:6333 qdrant/qdrant)
2. Three QdrantVectorStore initialization patterns
3. Retrieval mode selection guidance (dense vs sparse vs hybrid)
4. Metadata filtering examples

Try With AI

Set up your AI companion (Claude Code, Cursor, or similar) and work through these challenges.

Prompt 1: Deploy and Verify

I've started Qdrant with `docker run -p 6333:6333 qdrant/qdrant`.
Write Python code that:
1. Connects to Qdrant at localhost:6333
2. Creates a collection called "test_collection" with 1536-dimension vectors
3. Adds 3 sample documents about Python programming
4. Runs a similarity search for "functions and methods"
5. Prints the results with relevance scores

Include error handling for connection failures.

What you are learning: The complete workflow from deployment to search. You will see how the pieces connect: client creation, collection setup, document indexing, and retrieval.

Prompt 2: Hybrid Search Comparison

Create a script that compares dense vs hybrid retrieval on the same
document set. Use these 5 documents about software development:
- "Implement REST API endpoints using FastAPI"
- "Write unit tests with pytest for API validation"
- "Configure Docker containers for microservices deployment"
- "Set up PostgreSQL database with SQLModel ORM"
- "Deploy to Kubernetes with helm charts"

Run both search modes on query "k8s deployment" and show me:
- Which documents each mode returns
- The relevance scores for each
- Which mode found the helm chart document (exact keyword match)

What you are learning: The practical difference between retrieval modes. You will observe that "k8s" (abbreviation) may not match semantically with "Kubernetes", but sparse/hybrid search can catch it.

Prompt 3: Production-Ready Setup

I need a production-ready Qdrant setup for my Task API. Help me create:
1. A Docker Compose file that persists Qdrant data
2. A Python module that handles connection, collection creation, and
graceful reconnection if Qdrant restarts
3. A function to sync tasks from my database to the vector store
(handle updates and deletes, not just inserts)
4. Appropriate error handling and logging

The Task model has: id, title, description, status, priority, created_at

What you are learning: Real-world deployment concerns: persistence, reconnection, data synchronization. This prompt bridges from tutorial code to production patterns.