Question Answer Chatbot using RAG, Llama and Qdrant

1. Introduction

I have created a teaching chatbot that can answer questions from Class IX, subject SST, on the topic "Democratic politics". It is built with RAG (Retrieval-Augmented Generation), a Llama model as the LLM (Large Language Model), Qdrant as the vector database, LangChain, and Streamlit.

2. How to run the code?

GitHub repository link: https://github.com/ranjankumar-gh/teaching-bot/

Steps to run the code

  1. git clone https://github.com/ranjankumar-gh/teaching-bot.git
  2. cd teaching-bot
  3. python -m venv env
  4. Activate the virtual environment: source env/bin/activate on Linux/macOS, or env\Scripts\activate on Windows.
  5. python -m pip install -r requirements.txt
  6. Before running the following command, Qdrant should be running and available on localhost (one way to start it is shown just after this list). If it is running on a different machine, make the corresponding URL change in the code.
    python data_ingestion.py
    After running this, http://localhost:6333/dashboard#/collections should look like Figure 1.
  7. Start the chatbot web application (powered by Streamlit) by running the following command.
    streamlit run app.py
    The interface of the chatbot appears as in Figure 2.
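
If Qdrant is not already running locally, one common way to start it (a general Qdrant command, not something specific to this repository) is with Docker:

    docker run -p 6333:6333 qdrant/qdrant

This exposes the Qdrant API and dashboard on http://localhost:6333.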

Figure 1: Screenshot of the Qdrant dashboard after running data_ingestion.py

Figure 2: Screenshot of the chatbot web application

3. Data Ingestion

Data: PDF files have been downloaded from the NCERT website for Class IX, subject SST, topic "Democratic politics". These files are stored in the directory ix-sst-ncert-democratic-politics. The following are the steps for data ingestion:

  1. PDF files are loaded from the directory.
  2. Text content is extracted from each PDF.
  3. The text content is divided into chunks.
  4. These chunks are transformed into vector embeddings.
  5. The vector embeddings are stored in the Qdrant vector database.
  6. The data is stored in Qdrant under the collection name "ix-sst-ncert-democratic-politics".

The following is the code snippet for data_ingestion.py.

###############################################################
# Data ingestion pipeline
# 1. Take the input PDF files
# 2. Extract the text content
# 3. Divide the content into chunks
# 4. Use an embedding model to convert each chunk into an embedding vector
# 5. Store the embedding vectors in Qdrant (vector database)
###############################################################
import os
from langchain_community.document_loaders import PDFMinerLoader
from langchain.text_splitter import CharacterTextSplitter
from qdrant_client import QdrantClient

path = "ix-sst-ncert-democratic-politics"
filenames = next(os.walk(path))[2]

for i, file_name in enumerate(filenames):
    print(f"Data ingestion for the chapter: {i}")

    # 1. Load the pdf document and extract text from it
    loader = PDFMinerLoader(path + "/" + file_name)
    pdf_content = loader.load()
    print(pdf_content)

    # 2. Split the text into small chunks
    CHUNK_SIZE = 1000 # chunk size not greater than 1000 chars
    CHUNK_OVERLAP = 30 # a bit of overlap is required for continued context

    text_splitter = CharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
    docs = text_splitter.split_documents(pdf_content)

    # Make a list of split docs
    documents = []
    for doc in docs:
        documents.append(doc.page_content)

    # 3. Create the vector database (Qdrant) client
    qdrant_client = QdrantClient(url="http://localhost:6333")

    # 4. Add the document chunks to the vector database
    qdrant_client.add(
        collection_name="ix-sst-ncert-democratic-politics",
        documents=documents,
        #metadata=metadata,
        #ids=ids
    )

    # 5. Run a test query against the vector database (Qdrant)
    search_results = qdrant_client.query(
        collection_name="ix-sst-ncert-democratic-politics",
        query_text="What is democracy?"
    )

    for search_result in search_results:
        print(search_result.document, search_result.score)
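
After ingestion, the collection can also be checked without the dashboard. The following is a small sketch of my own (not part of the repository) that uses the Qdrant client's count API to print how many chunks were stored:

from qdrant_client import QdrantClient

# Connect to the same local Qdrant instance used during ingestion
qdrant_client = QdrantClient(url="http://localhost:6333")

# Count the points stored by data_ingestion.py
result = qdrant_client.count(collection_name="ix-sst-ncert-democratic-politics")
print(f"Stored chunks: {result.count}")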

4. Chatbot Web Application

The web application is powered by Streamlit. The following are the steps:

  1. A connection to the Qdrant vector database is created.
  2. User questions are captured through the web interface.
  3. The question text is transformed into a vector embedding.
  4. This vector embedding is searched in the Qdrant vector database to find the most relevant content similar to the question.
  5. The text returned by Qdrant acts as the context for the LLM.
  6. I have used a Llama model as the LLM. The query, along with the context, is sent to Llama to generate an answer.
  7. The answer is displayed on the web interface as the answer from the bot.

The following is the code snippet for app.py.

# Imports and Qdrant connection needed by this snippet
import streamlit as st
from qdrant_client import QdrantClient
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import OllamaLLM

# Connect to the Qdrant vector database (same local instance used by data_ingestion.py)
qdrant_client = QdrantClient(url="http://localhost:6333")

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat messages from history on app rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# React to user input
if query := st.chat_input("What is up?"):
    # Display user message in chat message container
    st.chat_message("user").markdown(query)
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": query})

    # Query the vector db to retrieve context relevant to the question
    search_results = qdrant_client.query(
        collection_name="ix-sst-ncert-democratic-politics",
        query_text=query
    )
    context = ""
    no_of_docs = 2
    count = 1
    for search_result in search_results:
        if search_result.score >= 0.8:
            #print(f"Retrieved document: {search_result.document}, Similarity score: {search_result.score}")
            context = context + search_result.document
        if count >= no_of_docs:
            break
        count = count + 1

    # Using LLM for forming the answer
    template = """Instruction: {instruction}
    Context: {context}
    Query: {query}
    """
    prompt = ChatPromptTemplate.from_template(template)

    model = OllamaLLM(model="llama3.2") # Using llama3.2 as llm model

    chain = prompt | model

    bot_response = chain.invoke({"instruction": "Answer the question based on the context below. If you cannot answer the question with the given context, answer with \"I don't know.\"", 
            "context": context,
            "query": query
            })

    print(f'\nBot: {bot_response}')

    # Display assistant response in chat message container
    with st.chat_message("assistant"):
        st.markdown(bot_response)
    # Add assistant response to chat history
    st.session_state.messages.append({"role": "assistant", "content": bot_response})
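
Note that app.py uses OllamaLLM with the llama3.2 model, so a local Ollama installation with that model available is required. If it is not already set up, the model can be downloaded with the standard Ollama CLI (and the server started, if it is not already running as a background service):

    ollama pull llama3.2
    ollama serve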
