Making a talking bot using Llama3.2:1b running on Raspberry Pi 4 Model-B 4GB

Introduction

In this article, I describe my experiment making a talking bot with the Large Language Model Llama3.2:1b and running it successfully on the Raspberry Pi 4 Model-B with 4GB RAM. Llama3.2:1b is Meta's 1-billion-parameter Llama 3.2 model, which Ollama serves in a quantized form suitable for resource-constrained devices. I have kept this bot primarily in question-answering mode to keep things simple: it answers questions from the knowledge baked into the model during training. The objective is to run everything completely offline, without needing the Internet.

My Setup

The following picture shows my setup, which consists of a Raspberry Pi to host the LLM (llama3.2:1b), a microphone for asking questions, and a pair of speakers to play the answers from the bot. I used the Internet during installation, but once set up, the bot works fully offline.

Following is the overall design explaining the end-to-end flow.

The user asks a question into the external microphone connected to the Raspberry Pi. The audio captured by the microphone is converted to text by a speech-to-text library. The text is sent to the Llama model running on the Raspberry Pi, which answers the question with text that is passed to a text-to-speech library. The resulting audio is played for the user on the speakers.

Following are the steps of the setup:

  1. Install, run, and access Llama using the API
  2. Install and use a speech-to-text library
  3. Install and use a text-to-speech library
  4. Record, Save, and Play audio
  5. The complete code

1. Install, run, and access Llama using the API

The Llama model is the core of this bot, so before we move further, it should be installed and running. Please refer to my separate post on the topic, “Install, run, and access Llama using Ollama“. That post also describes how to access the running model using the API.
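
As a quick sanity check that the model is up and reachable, you can send a request to the Ollama API directly. The snippet below mirrors the call used in the complete code in section 5 (the test prompt is just an example):

import requests

# Ask the locally running llama3.2:1b model a test question via Ollama
payload = {
    "model": "llama3.2:1b",
    "prompt": "Say hello in one sentence.",
    "stream": False,
}
r = requests.post('http://127.0.0.1:11434/api/generate', json=payload)
print(r.json()['response'])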

2. Install and use a speech-to-text library

I tried many speech-to-text libraries and finally settled on “faster-whisper“. faster-whisper is a reimplementation of OpenAI’s Whisper model built on CTranslate2, a fast inference engine for Transformer models. Its performance on the Raspberry Pi was satisfactory, and it works offline.

Installation: pip install faster-whisper

Save the following code in a Python file, say "speech-to-text.py", and run python speech-to-text.py

Code Snippet:

from faster_whisper import WhisperModel

# Load the small English-only Whisper model on the CPU
# with int8 quantization to keep memory usage low
model_size = "small.en"
model = WhisperModel(model_size, device="cpu",
                     compute_type="int8")

# Transcribe; transcribe() returns (segments, info)
segments, info = model.transcribe(
    audio="basic_output1.wav",
    language="en",
)

# Concatenate the text of all segments
seg_text = ''
for segment in segments:
    seg_text += segment.text

print(seg_text)

Sample input audio file:

Output text: “Please ask me something. I’m listening now”
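
Note: the first time this code runs, faster-whisper downloads the small.en model weights and caches them locally, so run it once with Internet access before going fully offline.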

3. Install and use a text-to-speech library

A good offline text-to-speech library that works well on resource-constrained devices is “pyttsx3“. On Raspberry Pi OS it drives the eSpeak engine under the hood, which is why eSpeak voice variants such as 'english+f3' work in the code below.

Installation: pip install pyttsx3

Save the following code in a Python file, say "text-to-speech.py", and run python text-to-speech.py

Code Snippet:

import pyttsx3

# Initialize the TTS engine (eSpeak backend on Raspberry Pi OS)
engine = pyttsx3.init()
engine.setProperty('volume', 1)
engine.setProperty('rate', 130)
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)
# Override with the eSpeak female voice variant f3
engine.setProperty('voice', 'english+f3')
text_to_speak = "I got your question. Please bear " \
    "with me while I retrieve the answer."
engine.say(text_to_speak)
# Following line is optional: keep it if you also
# want to save the speech to an audio file
engine.save_to_file(text_to_speak, 'speech.wav')
engine.runAndWait()

Sample input text: “I got your question. Please bear with me while I retrieve the answer.”
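
Note that the available voices differ from system to system, so voices[1] may not exist or may not be an English voice on your machine. A quick way to inspect the installed voices before picking one (a minimal sketch):

import pyttsx3

# Print the index and id of every installed voice so you can
# choose a valid value for setProperty('voice', ...)
engine = pyttsx3.init()
for i, voice in enumerate(engine.getProperty('voices')):
    print(i, voice.id)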

4. Record, Save, and Play audio

For recording, saving, and playing audio, I have a separate post. Please refer to “How to Record, Save and Play audio in Python?“.
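
As a minimal illustration of the playback part (the linked post covers all three steps in detail), playing a saved WAV file with pyaudio looks roughly like this:

import pyaudio
import wave

# Open the saved recording and stream it to the default output device
wf = wave.open('pyaudio-output.wav', 'rb')
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

data = wf.readframes(1024)
while data:
    stream.write(data)
    data = wf.readframes(1024)

stream.stop_stream()
stream.close()
p.terminate()
wf.close()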

5. The complete code

Following is the complete code. Save it in a Python file, say “llama-bot.py”, and run python llama-bot.py

Code Snippet:

import requests
import pyaudio
import wave
import pyttsx3
from faster_whisper import WhisperModel

# Load the speech-to-text model once at startup
model_size = "small.en"
model = WhisperModel(model_size, device="cpu",
                     compute_type="int8")

def speak(text):
    # Re-initialize the engine on every call; a shared pyttsx3
    # engine can hang on repeated runAndWait() calls
    engine = pyttsx3.init()
    engine.setProperty('volume', 1)
    engine.setProperty('rate', 130)
    voices = engine.getProperty('voices')
    engine.setProperty('voice', voices[1].id)
    engine.setProperty('voice', 'english+f3')  # eSpeak female variant
    engine.say(text)
    engine.runAndWait()

CHUNK = 512 
FORMAT = pyaudio.paInt16 #paInt8
CHANNELS = 1
RATE = 44100 #sample rate
RECORD_SECONDS = 7
WAVE_OUTPUT_FILENAME = "pyaudio-output.wav"

speak("I am an AI bot. You can ask me questions.")

while True:
    speak("I am listening now. Please ask.")

    # Record RECORD_SECONDS of audio from the microphone
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)

    print("* recording")
    frames = []

    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        # exception_on_overflow=False prevents a crash if the Pi
        # occasionally falls behind the incoming audio
        data = stream.read(CHUNK, exception_on_overflow=False)
        frames.append(data)  # 2 bytes (16 bits) per channel

    stream.stop_stream()
    stream.close()
    sample_width = p.get_sample_size(FORMAT)
    p.terminate()

    # Save the recording as a WAV file
    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(sample_width)
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()
    
    speak("I got your question. Please bear with me "
          "while I retrieve the answer.")

    # Transcribe the recorded question
    segments, info = model.transcribe(
        audio=WAVE_OUTPUT_FILENAME,
        language="en",
    )
    # Concatenate the text of all segments
    seg_text = ''
    for segment in segments:
        seg_text += segment.text

    # Build the request for the locally running Ollama API
    data = {
        "model": "llama3.2:1b",
        "stream": False,
    }
    if seg_text == '':
        seg_text = 'Tell about yourself and how you can help.'
    data["prompt"] = seg_text + " Answer in one sentence."

    r = requests.post('http://127.0.0.1:11434/api/generate',
                      json=data)
    data1 = r.json()

    # Print the exchange and speak the answer
    print(f'\nUser: {seg_text}')
    bot_response = data1['response']
    print(f'\nBot: {bot_response}')

    speak(bot_response)
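
The loop keeps listening for the next question until you stop the script with Ctrl+C.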

Sample of the conversation with the bot:
