Making a talking bot using Llama3.2:1b running on a Raspberry Pi 4 Model B (4GB)

Introduction

In this article, I describe my experiment in building a talking bot with the Large Language Model Llama3.2:1b and running it successfully on a Raspberry Pi 4 Model B with 4GB RAM. Llama3.2:1b is the 1-billion-parameter version of the Llama model from Meta (formerly Facebook); Ollama serves it in quantized form, which makes it usable on resource-constrained devices. I have kept this bot primarily in question-answering mode to keep things simple: it answers questions from the knowledge baked into the llama3.2:1b model. The objective is to run everything completely offline, without needing the Internet.

My Setup

The following picture shows my setup, which consists of a Raspberry Pi hosting the LLM (llama3.2:1b), a microphone for asking questions, and a pair of speakers to play the bot's answers. I used the Internet during installation, but the bot itself works offline.

Following is the overall design explaining the end-to-end flow.

The user asks a question into the external microphone connected to the Raspberry Pi. The audio captured by the microphone is converted to text by a speech-to-text library. The text is sent to the Llama model running on the Raspberry Pi, which answers with text that is passed to a text-to-speech library. The resulting audio is then played on the speaker for the user.

Following are the steps of the setup:

  1. Install, run, and access Llama
  2. Install and use the speech-to-text library
  3. Install and use the text-to-speech library
  4. Record, save, and play audio
  5. Run the complete code

1. Install, run, and access Llama using the API

The Llama model is the core of this bot, so before we move further it should be installed and running. Please refer to the separate post on the topic, “Install, run, and access Llama using Ollama“, which is included later in this article. That post also describes how to access the running model using the API.
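As a quick sanity check that the Ollama server is reachable (it listens on port 11434 by default), you can query its root endpoint from Python; a running server replies with the text "Ollama is running".

import requests

# Ollama's default address; adjust if you changed host or port
r = requests.get('http://127.0.0.1:11434')
print(r.text)   # prints "Ollama is running" when the server is up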

2. Install and use the speech-to-text library

I tried many speech-to-text libraries and finally settled on “faster-whisper“. faster-whisper is a reimplementation of OpenAI’s Whisper model that uses CTranslate2, a fast inference engine for Transformer models. Its performance on the Raspberry Pi was satisfactory, and it works offline.

Installation: pip install faster-whisper

Save the following code in a Python file, say "speech-to-text.py", and run python speech-to-text.py

Code Snippet:

from faster_whisper import WhisperModel

model_size = "small.en"
model = WhisperModel(model_size, device="cpu", 
                     compute_type="int8")

# Transcribe
transcription = model.transcribe(
    audio="basic_output1.wav",
    language="en",
)

# transcribe() returns a (segments, info) tuple; concatenate
# the text of all segments to get the full transcription
seg_text = ''
for segment in transcription[0]:
    seg_text += segment.text

print(seg_text)

Sample input audio file: a short recording of me saying the sentence below.

Output text: “Please ask me something. I’m listening now”
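If you also need timing information, the same API exposes it: model.transcribe() returns a (segments, info) tuple, and each segment carries start and end timestamps.

# Print each segment with its start/end timestamps
segments, info = model.transcribe("basic_output1.wav", language="en")
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))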

3. Install and use the text-to-speech library

Among the offline text-to-speech libraries that work on resource-constrained devices, the one that worked best for me is “pyttsx3“.

Installation: pip install pyttsx3

Save the following code in a Python file, say "text-to-speech.py", and run python text-to-speech.py

Code Snippet:

import pyttsx3

engine = pyttsx3.init()
engine.setProperty('volume', 1)
engine.setProperty('rate', 130)
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)
# On Linux, pyttsx3 uses eSpeak; 'english+f3' selects an English
# voice with the f3 (female) variant, overriding the previous line
engine.setProperty('voice', 'english+f3')
text_to_speak = "I got your question. Please bear " \
    "with me while I retrieve the answer."
engine.say(text_to_speak)
# The following line is optional: use it if you also
# want to save the spoken audio to a file
engine.save_to_file(text_to_speak, 'speech.wav') 
engine.runAndWait()

Sample input text: “I got your question. Please bear with me while I retrieve the answer.”
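Note that voice indexes differ from system to system, so voices[1] may not exist or may not be the voice you expect. You can list the installed voices with getProperty('voices'), which is part of the pyttsx3 API:

import pyttsx3

# Print every voice the local TTS engine knows about
engine = pyttsx3.init()
for voice in engine.getProperty('voices'):
    print(voice.id, voice.name)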

4. Record, Save, and Play audio

For recording, saving, and playing the audio, I have a separate post. Please refer to “How to Record, Save and Play audio in Python?“, included later in this article.

5. The complete code

Following is the complete code. Save it in a file, say “llama-bot.py”, and run python llama-bot.py

Code Snippet:

import requests
import pyaudio
import wave
import pyttsx3
from faster_whisper import WhisperModel

# Load the speech-to-text model
model_size = "small.en"
model = WhisperModel(model_size, device="cpu", 
                     compute_type="int8")

CHUNK = 512               # frames per buffer read
FORMAT = pyaudio.paInt16  # 16-bit samples
CHANNELS = 1              # mono
RATE = 44100              # sample rate (frames per second)
RECORD_SECONDS = 7        # length of each recording
WAVE_OUTPUT_FILENAME = "pyaudio-output.wav"

# Announce that the bot is ready
engine = pyttsx3.init()
engine.setProperty('volume', 1)
engine.setProperty('rate', 130)
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)
engine.setProperty('voice', 'english+f3')
engine.say("I am an AI bot. You can ask me questions.")
engine.runAndWait()

while True:
    # Re-initialize the TTS engine each iteration; reusing a single
    # engine across runAndWait() calls can hang on some systems
    engine = pyttsx3.init()
    engine.setProperty('volume', 1)
    engine.setProperty('rate', 130)
    voices = engine.getProperty('voices')
    engine.setProperty('voice', voices[1].id)
    engine.setProperty('voice', 'english+f3')
    engine.say("I am listening now. Please ask.")
    engine.runAndWait()

    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK) #buffer

    print("* recording")
    frames = []

    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data) # 2 bytes(16 bits) per channel

    stream.stop_stream()
    stream.close()
    p.terminate()

    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()
    
    engine = pyttsx3.init()
    engine.setProperty('volume', 1)
    engine.setProperty('rate', 130)
    voices = engine.getProperty('voices')
    engine.setProperty('voice', voices[1].id)
    engine.setProperty('voice', 'english+f3')
    text_to_say = "I got your question. Please bear with me " \
        + "while I retrieve the answer."
    engine.say(text_to_say)
    engine.runAndWait()

    # Transcribe the recorded question
    transcription = model.transcribe(
        audio=WAVE_OUTPUT_FILENAME,
        language="en",
    )
    # Concatenate the text of all segments
    seg_text = ''
    for segment in transcription[0]:
        seg_text += segment.text
    #print(seg_text)

    # Call llama through the Ollama HTTP API
    data = {}
    data["model"] = "llama3.2:1b"
    data["stream"] = False
    if seg_text == '':
        seg_text = 'Tell me about yourself and how you can help.'
    data["prompt"] = seg_text + " Answer in one sentence."

    r = requests.post('http://127.0.0.1:11434/api/generate', 
                      json=data)
    data1 = r.json()

    # Print User and Bot Message
    print(f'\nUser: {seg_text}')
    bot_response = data1['response']
    print(f'\nBot: {bot_response}')
 
    # Text to Speech
    engine = pyttsx3.init()
    engine.setProperty('volume', 1)
    engine.setProperty('rate', 130)
    voices = engine.getProperty('voices')
    engine.setProperty('voice', voices[1].id)
    engine.setProperty('voice', 'english+f3')
    engine.say(bot_response)
    engine.runAndWait()

Sample of the conversation with the bot:

Install, run, and access Llama using Ollama

What is Ollama?

Ollama is an open-source tool/framework that lets users run large language models (LLMs) on their local machines, from PCs to edge devices like the Raspberry Pi.

How to install it?

Downloads and installations are available for Mac, Linux, and Windows. Visit https://ollama.com/download for instructions.

Running Llama 3.2

Four versions of Llama 3.2 models are available: 1B, 3B, 11B, and 90B. ‘B’ stands for billion: for example, 1B means that the model has 1 billion parameters. 1B and 3B are text-only models, whereas 11B and 90B are multimodal (text and images).

Run 1B model: ollama run llama3.2:1b

Run 3B model: ollama run llama3.2

Once a model is running this way, we can interact with it directly from the terminal.

Access the deployed models using a web browser

Page Assist is an open-source browser extension that provides a sidebar and web UI for your local AI model. It allows you to interact with your model from any webpage.

Access the Llama model using the HTTP API in Python

import requests

data = {}
data["model"] = "llama3.2:1b"
data["stream"] = False
data["prompt"] = "What is Newton's law of motion?" + " Answer in short."

# Send the prompt to the model
r = requests.post('http://127.0.0.1:11434/api/generate', json=data)
response_data = r.json()

# Print User and Bot Message
print(f'\nUser: {data["prompt"]}')
bot_response = response_data['response']
print(f'\nBot: {bot_response}')
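With "stream" set to False, the whole answer arrives in a single JSON object. The Ollama API can also stream the answer as it is generated: with "stream" set to True, the server sends newline-delimited JSON objects, each carrying a fragment of the response and a "done" flag. A minimal sketch of consuming such a stream:

import json
import requests

data = {"model": "llama3.2:1b", "stream": True,
        "prompt": "What is Newton's law of motion? Answer in short."}

# Read the newline-delimited JSON stream and print the answer
# fragment by fragment as it arrives
with requests.post('http://127.0.0.1:11434/api/generate',
                   json=data, stream=True) as r:
    for line in r.iter_lines():
        if line:
            chunk = json.loads(line)
            print(chunk['response'], end='', flush=True)
            if chunk.get('done'):
                print()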

How to Record, Save and Play audio in Python?

Libraries

The following libraries are required.

  • PortAudio is a free, cross-platform, open-source audio I/O library. It lets you write simple audio programs in C or C++ that compile and run on many platforms, including Windows, Macintosh OS X, and Unix (OSS/ALSA).
  • PyAudio provides Python bindings for PortAudio. Following is the pip command:
    pip install pyaudio
  • The wave module from the Python 3 standard library.
  • simpleaudio, to play the saved wave audio file. Following is the pip command:
    pip install simpleaudio

Check the mic

Check if you have a working microphone on your system. Following is the code snippet you can use.

import pyaudio
import pprint 

p = pyaudio.PyAudio()
pp = pprint.PrettyPrinter(indent=4)

try:
    # Query the default input device (raises IOError if none exists)
    pp.pprint(p.get_default_input_device_info())
except IOError:
    print("No mics available")

Example output: a dictionary describing the default input device (name, index, default sample rate, channel counts, and so on).
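If the default device is not the microphone you want, you can list every device PortAudio sees and pick one by index; get_device_count() and get_device_info_by_index() are standard PyAudio calls:

import pyaudio

# List every audio device PortAudio can see, with its
# index and number of input channels
p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    print(i, info['name'], info['maxInputChannels'])
p.terminate()

The chosen index can then be passed to p.open() through its input_device_index parameter.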

Record the audio

Following is the code snippet to record the audio.

import pyaudio
import wave

# Recording parameters (defined here so the snippet runs standalone)
CHUNK = 512               # frames per buffer read
FORMAT = pyaudio.paInt16  # 16-bit samples
CHANNELS = 1              # mono
RATE = 44100              # sample rate (frames per second)
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "myaudio.wav"

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK) #buffer

print("* recording")
frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data) # 2 bytes(16 bits) per channel

stream.stop_stream()
stream.close()
p.terminate()

pyaudio.PyAudio() acquires system resources for PortAudio. pyaudio.PyAudio.open() sets up a pyaudio.PyAudio.Stream to play or record audio, and pyaudio.PyAudio.Stream.read() reads audio data from the stream. In the above code, all the audio frames are collected in the frames list; they are used to save the audio file in a later part of the code.

Meaning of the parameters to the open function:

  1. FORMAT: PortAudio provides samples in raw PCM (Pulse-Code Modulation) format, meaning each sample is an amplitude to be given to the DAC (digital-to-analog converter) in your sound card. For paInt16 this is a value from -32768 to 32767; for paFloat32 it is a floating-point value from -1.0 to 1.0. The sound card converts these values to a proportional voltage that drives your audio equipment. The available formats are paFloat32, paInt32, paInt24, paInt16, paInt8, paUInt8, and paCustomFormat.
  2. CHANNELS is the number of samples in each frame (1 for mono, 2 for stereo).
  3. RATE is the sampling rate, i.e. the number of frames per second.
  4. CHUNK is the number of frames read from the stream at a time; 512 is arbitrarily chosen.

Finally, the stream should be closed and all acquired resources released.
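As a quick sanity check of these parameters, the arithmetic below works out how much audio the recording loop actually captures, using the same values as the snippet above.

# How much audio does the loop capture for the values above?
RATE, CHUNK, RECORD_SECONDS = 44100, 512, 5
CHANNELS, SAMPLE_WIDTH = 1, 2    # mono, 2 bytes per paInt16 sample

reads = int(RATE / CHUNK * RECORD_SECONDS)   # 430 reads
frames = reads * CHUNK                       # 220160 frames
seconds = frames / RATE                      # ~4.99 s (just under 5 s due to truncation)
size = frames * CHANNELS * SAMPLE_WIDTH      # 440320 bytes, ~430 KB
print(reads, frames, seconds, size)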

Saving the audio

Following is the code snippet to save the audio.

# Save the recorded audio file
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

The wave module of Python 3 is used for this purpose. The parameters are set to the same values that were used while recording the audio.

Play the audio

Following is the code snippet to play the audio. I used the simpleaudio library for this purpose; other libraries are available and can be tried, but I found simpleaudio simple enough.

import simpleaudio as sa

# Play the recorded audio file
wave_obj = sa.WaveObject.from_wave_file("pyaudio-output.wav")
play_obj = wave_obj.play()
play_obj.wait_done()

Complete Code

import pyaudio
import wave
import simpleaudio as sa

CHUNK = 512 
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "myaudio.wav"

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK) #buffer

print("* recording")
frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data) # 2 bytes(16 bits) per channel

stream.stop_stream()
stream.close()
p.terminate()

# Save the recorded audio file
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

# Play the recorded audio file
wave_obj = sa.WaveObject.from_wave_file("myaudio.wav")
play_obj = wave_obj.play()
play_obj.wait_done()
