1️⃣ Introduction
Search is at the heart of every AI application. Whether you’re building a legal research assistant, a compliance monitoring tool, or an LLM-powered chatbot, the effectiveness of your system depends heavily on how well it can retrieve relevant information.
But here’s the problem:
- If you rely only on keyword search (BM25), you’ll capture statutory phrases like “Section 420 IPC”, but miss paraphrases like “cheating law”.
- If you rely only on vector search (embeddings), you’ll capture semantic meaning like “right to equality” → Article 14, but risk ignoring the exact legal terms that practitioners care about.
Neither approach is enough on its own. This is where Hybrid Search comes in — blending the precision of keywords with the flexibility of semantic vectors. And when we push it further with Cross-Encoder re-ranking, we get retrieval quality that feels much closer to human judgment.
👉 In this article, we’ll build a production-style hybrid search system for legal texts, packaged into a single Docker container. You’ll learn:
- How hybrid search works (BM25 + vectors) and why it matters for AI
- How to build and deploy a full-stack demo with FastAPI + a browser-based UI
- How to measure retrieval quality with Precision, Recall, and NDCG
- How to add Cross-Encoder re-ranking for significantly better top results
- How to extend this system for real-world, large-scale AI applications
By the end, you’ll have a working legal search engine that you can run locally or deploy in production — and a clear understanding of how to balance precision, recall, and semantic coverage in retrieval systems.
The following diagram depicts the overall flow of the application.

2️⃣ Motivation: Why Hybrid Search for Legal Text?
Legal documents are tricky:
- Keyword search (BM25) is precise for statutory phrases like “Section 420 IPC”, but brittle if a user types “cheating law.”
- Vector search (Sentence Transformers) captures meaning (e.g., “right to equality” → Article 14), but sometimes misses terms of art.
- Hybrid search combines them by weighting both signals, providing more reliable retrieval.
- Cross-Encoders further refine results by deeply comparing the query with candidate passages, improving ranking precision.
This is especially important in legal AI, where accuracy, recall, and ranking quality directly impact trust.
3️⃣ Setting Up: Clone and Run in Docker
Everything is packaged into a single container:
git clone https://github.com/ranjankumar-gh/hybrid-legal-search.git
cd hybrid-legal-search
docker build -t hybrid-legal-search .
docker run --rm -p 8000:8000 hybrid-legal-search
Now open 👉 http://localhost:8000 to use the frontend.
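For orientation, a minimal Dockerfile for a FastAPI app of this shape might look like the sketch below. This is a representative example only, not the repository's actual Dockerfile; in particular, the `app.main:app` module path is an assumption.

```dockerfile
# Representative sketch only — the repo's actual Dockerfile may differ.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
# Assumes the FastAPI app object lives at app/main.py as `app`
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```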
Disclaimer: The dataset is synthetically generated. Use it with caution.
4️⃣ Frontend Features (Rich UI for Exploration)
The demo ships with a self-contained web frontend:
- 🔍 Search box + α-slider → adjust keyword vs. vector weight
- 🟨 Query term highlighting → shows where your query matched
- 📜 Search history → revisit previous queries
- 📑 Pagination → navigate through long result sets
This makes it easy to explore the effect of hybrid weighting without diving into code. The following is a snapshot of the UI:

5️⃣ Hybrid Search Implementation (BM25 + Vector Embeddings)
The search pipeline is simple but powerful:
- BM25 Scoring → rank documents by keyword overlap
- Vector Scoring → compute cosine similarity between embeddings
- Weighted Fusion → `final_score = α * vector_score + (1 - α) * bm25_score`
Example:
- Query: “cheating law”
- BM25 picks “Section 420 IPC: Cheating and dishonestly inducing delivery of property”
- Vector model retrieves semantically similar text like “fraud cases”
- Hybrid fusion ensures Section 420 IPC ranks above loosely related fraud references.
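The fusion step above can be sketched in a few lines of Python. This is a minimal illustration, not the repo's actual implementation: it assumes the BM25 and vector scores arrive as dicts keyed by document ID, and it min-max normalizes each signal so the α blend is meaningful (raw BM25 scores and cosine similarities live on very different scales).

```python
def minmax_normalize(scores):
    """Rescale raw scores to [0, 1] so BM25 and cosine values are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_fuse(bm25_scores, vector_scores, alpha=0.5):
    """Blend normalized signals: alpha * vector + (1 - alpha) * bm25."""
    b = minmax_normalize(bm25_scores)
    v = minmax_normalize(vector_scores)
    docs = set(b) | set(v)
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Toy scores for the query "cheating law" (illustrative numbers only)
bm25 = {"Section 420 IPC": 9.1, "Generic fraud note": 4.0, "Article 14": 0.5}
vectors = {"Section 420 IPC": 0.62, "Generic fraud note": 0.70, "Article 14": 0.10}
print(hybrid_fuse(bm25, vectors, alpha=0.5)[0][0])  # Section 420 IPC ranks first
```

With α near 1 the ranking follows the embeddings; with α near 0 it follows BM25, which is exactly what the UI slider exposes.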
6️⃣ Cross-Encoder Re-ranking (Improved Precision)
Even with hybrid fusion, ranking errors remain. For a query like “equality before law”, two candidates may receive similar fused scores even though only one is directly on point:
- Candidate: “Article 14: Equality before law” (directly relevant)
- Candidate: “Right to privacy case” (related, but off-target)
A Cross-Encoder re-scores query–document pairs using a transformer that attends jointly to both inputs.
👉 Model used: `cross-encoder/ms-marco-MiniLM-L-6-v2`
Process:
- Hybrid search retrieves top-15 candidates
- Cross-Encoder re-scores them
- Final top-5 results are returned with much sharper precision
This extra step is computationally heavier but only applied to a small candidate set, making it practical.
7️⃣ Evaluation with Metrics
We measure Precision@k, Recall@k, and NDCG@k on a small toy dataset of Indian legal texts.
Running evaluation inside Docker:
docker run --rm hybrid-legal-search python -c "from app.evaluate import HybridSearch, evaluate; e=HybridSearch(); evaluate(e, k=5)"
Sample Results
| Method | Precision@5 | Recall@5 | NDCG@5 |
|---|---|---|---|
| BM25 only | 0.64 | 0.70 | 0.62 |
| Vector only | 0.58 | 0.82 | 0.68 |
| Hybrid (no rerank) | 0.72 | 0.83 | 0.79 |
| Hybrid + Rerank ⚡ | 0.84 | 0.82 | 0.87 |
📊 Key Takeaway:
- Hybrid fusion improves ranking balance
- Cross-Encoder boosts Precision and NDCG significantly, crucial for legal AI
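For reference, all three metrics can be computed with stdlib-only helpers. This is a sketch using binary relevance judgments (a doc is either relevant or not); the repo's `evaluate` module may differ in detail:

```python
import math

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant docs that appear in the top-k results."""
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant) if relevant else 0.0

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG: discounted gain vs. the ideal ordering."""
    dcg = sum(1.0 / math.log2(i + 2) for i, d in enumerate(retrieved[:k]) if d in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg else 0.0
```

Note how NDCG, unlike precision, rewards putting relevant docs *early* in the list, which is why re-ranking moves it the most in the table above.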
8️⃣ Deployment Considerations
- Scaling: Replace the in-memory vector store with Qdrant, Weaviate, or Milvus for millions of docs
- Performance: Cache Cross-Encoder results for frequent queries
- Productionizing: Expose FastAPI endpoints and secure with API keys
- Extensibility: Add re-ranking with larger reranker models (e.g., `bge-reranker-large`) for better results in enterprise deployments
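The caching suggestion can be as simple as memoizing the expensive re-scoring call. A sketch using `functools.lru_cache`, with a hypothetical `score_pair` standing in for a real cross-encoder call; note that cached arguments must be hashable, so candidates are passed as a tuple rather than a list:

```python
from functools import lru_cache

def score_pair(query, doc):
    # Stand-in for a real cross-encoder score (word overlap here).
    return len(set(query.split()) & set(doc.split()))

@lru_cache(maxsize=2048)
def cached_rerank_scores(query, candidates):
    """Memoize scores per (query, candidate-tuple).
    `candidates` must be a tuple so the arguments are hashable."""
    return tuple(score_pair(query, doc) for doc in candidates)
```

For frequent queries the second hit skips the transformer entirely; in a multi-process deployment the same idea would move to a shared cache such as Redis.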
9️⃣ References & Where to Go Next
- 🔮 Next Steps
- Scale to real legal corpora (Indian Kanoon, US Case Law)
- Integrate LLM-based answer synthesis on top of retrieval
- Experiment with Reinforcement Learning from Human Feedback (RLHF) for domain-specific ranking