Hybrid Reasoning Bot¶
Difficulty: Intermediate
Time: 60 minutes
Learning Focus: RAG, similarity thresholds, fallback logic
Module: RAG
Overview¶
Create a bot that intelligently chooses between using your indexed documents (RAG) or the model's general knowledge based on similarity scores. The bot tags responses with source indicators to clearly show where information is coming from.
Instructions¶
- Set up the project:
- Create a new Python script file
-
Import the necessary libraries from hands-on-ai - Define emoji constants for tagging sources (🧠 = from notes, 🌐 = fallback)
-
Build an index first:
-
Run
rag index path/to/your/docs/to create~/.hands-on-ai/index.npz -
Create the retrieval function:
-
Use
get_top_k(query, index_path, k, return_scores=True)to retrieve the top-K(chunk, source)pairs together with their similarity scores -
Build answer generation functions:
- Create a RAG-based answer function that uses retrieved chunks as context
-
Create a fallback function that uses only the LLM's general knowledge
-
Implement the decision logic:
- Use similarity threshold (e.g., 0.7) to decide between RAG or fallback
-
Tag responses appropriately with source indicators
-
Add command-line argument handling:
- Query input
- Index file path
- Similarity threshold
- Top-K value
-
Optional flags for comparing answers and showing scores
-
Test your implementation:
- Try queries with obvious matches in your index
- Try queries with no relevant information in your index
- Experiment with different threshold values
#!/usr/bin/env python3
"""
Hybrid Reasoning Bot
-------------------
A bot that chooses between RAG or general LLM answers based on similarity score.
It uses your indexed documents when it finds relevant information,
and falls back to general knowledge when necessary.
"""
import argparse
from hands_on_ai.chat import get_response
from hands_on_ai.rag.utils import get_top_k
# Emoji constants
RAG_TAG = "🧠" # Knowledge from your notes/documents
LLM_TAG = "🌐" # Knowledge from the model's training
def get_chunks_with_scores(query, index_path, top_k=3):
"""Retrieve the most relevant (chunk, source) pairs and their similarity scores."""
# get_top_k embeds the query, loads the index, and runs the search.
results, scores = get_top_k(query, index_path, k=top_k, return_scores=True)
return results, scores
def generate_rag_answer(query, chunks):
"""Generate an answer using RAG with the provided chunks as context"""
context = "\n\n---\n\n".join(chunks)
prompt = f"""Based on the following information, please answer the question.
If the information doesn't fully address the question, use only what's relevant.
Context:
{context}
Question: {query}
Answer:"""
return get_response(prompt)
def generate_llm_answer(query):
"""Generate an answer using only the LLM's general knowledge"""
prompt = f"""Please answer this question based on your general knowledge:
Question: {query}
Answer:"""
return get_response(prompt)
def main():
parser = argparse.ArgumentParser(description="Hybrid reasoning bot using RAG and general knowledge")
parser.add_argument("query", help="The question to ask")
parser.add_argument("--index", "-i", default="index.npz", help="Path to the .npz index file")
parser.add_argument("--threshold", "-t", type=float, default=0.7,
help="Similarity score threshold for using RAG (0.0-1.0)")
parser.add_argument("--top-k", "-k", type=int, default=3,
help="Number of chunks to use when generating RAG answers")
parser.add_argument("--no-fallback", "-n", action="store_true",
help="Disable fallback to general knowledge even with low scores")
parser.add_argument("--compare", "-c", action="store_true",
help="Show both RAG and general knowledge answers for comparison")
parser.add_argument("--show-scores", "-s", action="store_true",
help="Show similarity scores for retrieved chunks")
args = parser.parse_args()
# Retrieve relevant (chunk, source) pairs and their scores
print(f"Searching index: {args.index}...")
results, scores = get_chunks_with_scores(args.query, args.index, top_k=args.top_k)
top_chunks = [chunk for chunk, _ in results]
# Determine if we should use RAG based on the highest similarity score
best_score = max(scores) if scores else 0
use_rag = best_score >= args.threshold
# Show scores if requested
if args.show_scores:
print("\n=== Similarity Scores ===")
for i, ((chunk, source), score) in enumerate(zip(results, scores)):
print(f"[{i+1}] Score: {score:.4f} Source: {source}")
print(f"Preview: {chunk[:100]}...\n")
print(f"Best score: {best_score:.4f} (Threshold: {args.threshold})")
print(f"Decision: {'Using RAG' if use_rag else 'Using general knowledge'}\n")
# Generate answers
if args.compare:
# Generate both answers for comparison
print("\n=== Generating both answers for comparison ===")
print(f"\n{RAG_TAG} Generating answer from your documents...")
rag_answer = generate_rag_answer(args.query, top_chunks)
print(f"\n{LLM_TAG} Generating answer from general knowledge...")
llm_answer = generate_llm_answer(args.query)
# Display both answers
print(f"\n=== {RAG_TAG} Answer from your documents ===")
print(rag_answer)
print(f"\n=== {LLM_TAG} Answer from general knowledge ===")
print(llm_answer)
else:
# Generate only one answer based on the threshold
if use_rag or args.no_fallback:
tag = RAG_TAG
print(f"\n{tag} Generating answer from your documents...")
answer = generate_rag_answer(args.query, top_chunks)
else:
tag = LLM_TAG
print(f"\n{tag} No sufficiently relevant information found in your documents.")
print(f"{tag} Falling back to general knowledge...")
answer = generate_llm_answer(args.query)
# Display the answer
print(f"\n=== {tag} Answer ===")
print(answer)
# Show sources if using RAG
if use_rag or args.no_fallback or args.compare:
print("\n=== Sources ===")
for source in {source for _, source in results}:
print(f"- {source}")
if __name__ == "__main__":
main()
Extension Ideas¶
- Enhanced Source Attribution: Display specific document names and page numbers for RAG answers
- Confidence Visualization: Create a visual representation of similarity scores
- Hybrid Answers: Combine information from both sources when appropriate
- Interactive Mode: Create a chat-like interface that remembers context
- Custom Personalities: Allow different personality settings for different query types
- Performance Metrics: Track and display response times for both methods
- Web Interface: Create a simple web UI for the hybrid bot