1️⃣ Introduction
Search is at the heart of every AI application. Whether you’re building a legal research assistant, a compliance monitoring tool, or an LLM-powered chatbot, the effectiveness of your system depends heavily on how well it can retrieve relevant information.
But here’s the problem:
- If you rely only on keyword search (BM25), you’ll capture statutory phrases like “Section 420 IPC”, but miss paraphrases like “cheating law”.
- If you rely only on vector search (embeddings), you’ll capture semantic meaning like “right to equality” → Article 14, but risk ignoring the exact legal terms that practitioners care about.
Neither approach is enough on its own. This is where Hybrid Search comes in — blending the precision of keywords with the flexibility of semantic vectors. And when we push it further with Cross-Encoder re-ranking, we get retrieval quality that feels much closer to human judgment.
👉 In this article, we’ll build a production-style hybrid search system for legal texts, packaged into a single Docker container. You’ll learn:
- How hybrid search works (BM25 + vectors) and why it matters for AI
- How to build and deploy a full-stack demo with FastAPI + a browser-based UI
- How to measure retrieval quality with Precision, Recall, and NDCG
- How to add Cross-Encoder re-ranking for significantly better top results
- How to extend this system for real-world, large-scale AI applications
By the end, you’ll have a working legal search engine that you can run locally or deploy in production — and a clear understanding of how to balance precision, recall, and semantic coverage in retrieval systems.
The following diagram depicts the overall flow of the application.

2️⃣ Motivation: Why Hybrid Search for Legal Text?
Legal documents are tricky:
- Keyword search (BM25) is precise for statutory phrases like “Section 420 IPC”, but brittle if a user types “cheating law.”
- Vector search (Sentence Transformers) captures meaning (e.g., “right to equality” → Article 14), but sometimes misses terms of art.
- Hybrid search combines them by weighting both signals, providing more reliable retrieval.
- Cross-Encoders further refine results by deeply comparing the query with candidate passages, improving ranking precision.
This is especially important in legal AI, where accuracy, recall, and ranking quality directly impact trust.
3️⃣ Setting Up: Clone and Run in Docker
Everything is packaged into a single container:
git clone https://github.com/ranjankumar-gh/hybrid-legal-search.git
cd hybrid-legal-search
docker build -t hybrid-legal-search .
docker run --rm -p 8000:8000 hybrid-legal-search
Now open 👉 http://localhost:8000 to use the frontend.
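For orientation, a minimal Dockerfile for a FastAPI app of this shape might look like the sketch below. This is a representative example only, not the repository's actual Dockerfile; in particular, the `app.main:app` module path is an assumption.

```dockerfile
# Representative sketch only — the repo's actual Dockerfile may differ.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
# Assumes the FastAPI app object lives at app/main.py as `app`
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```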
Disclaimer: The dataset is synthetically generated. Use it with caution.
4️⃣ Frontend Features (Rich UI for Exploration)
The demo ships with a self-contained web frontend:
- 🔍 Search box + α-slider → adjust keyword vs. vector weight
- 🟨 Query term highlighting → shows where your query matched
- 📜 Search history → revisit previous queries
- 📑 Pagination → navigate through long result sets
This makes it easy to explore the effect of hybrid weighting without diving into code. The following is a snapshot of the UI:

5️⃣ Hybrid Search Implementation (BM25 + Vector Embeddings)
The search pipeline is simple but powerful:
- BM25 Scoring → rank documents by keyword overlap
- Vector Scoring → compute cosine similarity between embeddings
- Weighted Fusion → `final_score = α * vector_score + (1 - α) * bm25_score`
Example:
- Query: “cheating law”
- BM25 picks “Section 420 IPC: Cheating and dishonestly inducing delivery of property”
- Vector model retrieves semantically similar text like “fraud cases”
- Hybrid fusion ensures Section 420 IPC ranks above loosely related fraud references.
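The fusion step above can be sketched in a few lines of Python. This is a minimal illustration, not the repo's actual implementation: it assumes the BM25 and vector scores arrive as dicts keyed by document ID, and it min-max normalizes each signal so the α blend is meaningful (raw BM25 scores and cosine similarities live on very different scales).

```python
def minmax_normalize(scores):
    """Rescale raw scores to [0, 1] so BM25 and cosine values are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_fuse(bm25_scores, vector_scores, alpha=0.5):
    """Blend normalized signals: alpha * vector + (1 - alpha) * bm25."""
    b = minmax_normalize(bm25_scores)
    v = minmax_normalize(vector_scores)
    docs = set(b) | set(v)
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Toy scores for the query "cheating law" (illustrative numbers only)
bm25 = {"Section 420 IPC": 9.1, "Generic fraud note": 4.0, "Article 14": 0.5}
vectors = {"Section 420 IPC": 0.62, "Generic fraud note": 0.70, "Article 14": 0.10}
print(hybrid_fuse(bm25, vectors, alpha=0.5)[0][0])  # Section 420 IPC ranks first
```

With α near 1 the ranking follows the embeddings; with α near 0 it follows BM25, which is exactly what the UI slider exposes.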
6️⃣ Cross-Encoder Re-ranking (Improved Precision)
Even with hybrid fusion, ranking errors remain. For a query like “equality before law”, two candidates may receive similar fused scores even though only one is directly on point:
- Candidate: “Article 14: Equality before law” (directly relevant)
- Candidate: “Right to privacy case” (related, but off-target)
A Cross-Encoder re-scores query–document pairs using a transformer that attends jointly to both inputs.
👉 Model used: `cross-encoder/ms-marco-MiniLM-L-6-v2`
Process:
- Hybrid search retrieves top-15 candidates
- Cross-Encoder re-scores them
- Final top-5 results are returned with much sharper precision
This extra step is computationally heavier but only applied to a small candidate set, making it practical.
7️⃣ Evaluation with Metrics
We measure Precision@k, Recall@k, and NDCG@k on a small toy dataset of Indian legal texts.
Running evaluation inside Docker:
docker run --rm hybrid-legal-search python -c "from app.evaluate import HybridSearch, evaluate; e=HybridSearch(); evaluate(e, k=5)"
Sample Results
| Method | Precision@5 | Recall@5 | NDCG@5 |
|---|---|---|---|
| BM25 only | 0.64 | 0.70 | 0.62 |
| Vector only | 0.58 | 0.82 | 0.68 |
| Hybrid (no rerank) | 0.72 | 0.83 | 0.79 |
| Hybrid + Rerank ⚡ | 0.84 | 0.82 | 0.87 |
📊 Key Takeaway:
- Hybrid fusion improves ranking balance
- Cross-Encoder boosts Precision and NDCG significantly, crucial for legal AI
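For reference, all three metrics can be computed with stdlib-only helpers. This is a sketch using binary relevance judgments (a doc is either relevant or not); the repo's `evaluate` module may differ in detail:

```python
import math

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant docs that appear in the top-k results."""
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant) if relevant else 0.0

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG: discounted gain vs. the ideal ordering."""
    dcg = sum(1.0 / math.log2(i + 2) for i, d in enumerate(retrieved[:k]) if d in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg else 0.0
```

Note how NDCG, unlike precision, rewards putting relevant docs *early* in the list, which is why re-ranking moves it the most in the table above.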
8️⃣ Deployment Considerations
- Scaling: Replace the in-memory vector store with Qdrant, Weaviate, or Milvus for millions of docs
- Performance: Cache Cross-Encoder results for frequent queries
- Productionizing: Expose FastAPI endpoints and secure with API keys
- Extensibility: Add re-ranking with larger reranker models (e.g., `bge-reranker-large`) for better results in enterprise deployments
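The caching suggestion can be as simple as memoizing the expensive re-scoring call. A sketch using `functools.lru_cache`, with a hypothetical `score_pair` standing in for a real cross-encoder call; note that cached arguments must be hashable, so candidates are passed as a tuple rather than a list:

```python
from functools import lru_cache

def score_pair(query, doc):
    # Stand-in for a real cross-encoder score (word overlap here).
    return len(set(query.split()) & set(doc.split()))

@lru_cache(maxsize=2048)
def cached_rerank_scores(query, candidates):
    """Memoize scores per (query, candidate-tuple).
    `candidates` must be a tuple so the arguments are hashable."""
    return tuple(score_pair(query, doc) for doc in candidates)
```

For frequent queries the second hit skips the transformer entirely; in a multi-process deployment the same idea would move to a shared cache such as Redis.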
9️⃣ References & Where to Go Next
- 🔮 Next Steps
- Scale to real legal corpora (Indian Kanoon, US Case Law)
- Integrate LLM-based answer synthesis on top of retrieval
- Experiment with Reinforcement Learning from Human Feedback (RLHF) for domain-specific ranking