Making a talking bot using Llama3.2:1b running on a Raspberry Pi 4 Model B (4GB)
Introduction
In this article, I describe my experiment of building a talking bot with the Large Language Model Llama3.2:1b and running it successfully on a Raspberry Pi 4 Model B with 4GB RAM. Llama3.2:1b is a quantized, 1-billion-parameter variant of Meta's Llama model, intended for resource-constrained devices. I have kept the bot in a simple question-answering mode: it answers questions from the knowledge learned by the model. The objective is to run it completely offline, without needing the Internet.
My Setup
The following picture shows my setup, which consists of a Raspberry Pi to host the LLM (llama3.2:1b), a microphone for asking questions, and a pair of speakers to play the bot's answers. I used the Internet during installation, but once set up the bot works fully offline.
Following is the overall design, explaining the end-to-end flow (sketched in code right after this list):
- The user asks a question into the external microphone connected to the Raspberry Pi.
- The audio signal captured by the microphone is converted to text by a speech-to-text library.
- The text is sent to the Llama model running on the Raspberry Pi.
- The Llama model answers in text form, which is passed to a text-to-speech library.
- The text-to-speech output is audio, which is played back to the user on the speakers.
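Conceptually, one pass through the pipeline looks like the following sketch. The function names (record_audio, speech_to_text, ask_llama, speak) are illustrative placeholders only; their real implementations are built up step by step below.
Code Snippet:
while True:
    wav_file = record_audio(seconds=7)    # microphone -> WAV file
    question = speech_to_text(wav_file)   # faster-whisper
    answer = ask_llama(question)          # Ollama REST API
    speak(answer)                         # pyttsx3 -> speakers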
Following are the steps of the setup:
- Install, run, and access Llama
- Installation and use of the speech-to-text library
- Installation and use of the text-to-speech library
- Record, Save, and Play audio
- Running the complete code
1. Install, run, and access Llama using the API
The Llama model is the core of this bot. So before we move further, it should be installed and running. Please refer to the separate post on the topic, “Install, run, and access Llama using Ollama”. That post also describes how to access the running model using its API.
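For quick reference, a question can be posted to the locally running model over Ollama's HTTP API. The following is a minimal sketch, assuming Ollama is serving llama3.2:1b on its default port 11434:
Code Snippet:
import requests

# Minimal query against the local Ollama generate endpoint
payload = {
    "model": "llama3.2:1b",
    "prompt": "What is a Raspberry Pi? Answer in one sentence.",
    "stream": False,
}
r = requests.post('http://127.0.0.1:11434/api/generate', json=payload)
print(r.json()['response'])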
2. Installation of the speech-to-text library and how to use it
I tried many speech-to-text libraries and finally settled on “faster-whisper”. faster-whisper is a reimplementation of OpenAI’s Whisper model built on CTranslate2, a fast inference engine for Transformer models. Its performance on the Raspberry Pi was satisfactory, and it works offline.
Installation: pip install faster-whisper
Save the following code in a Python file, say "speech-to-text.py",
and run: python speech-to-text.py
Code Snippet:
from faster_whisper import WhisperModel

# Load the small English-only model with int8 quantization
model_size = "small.en"
model = WhisperModel(model_size, device="cpu",
                     compute_type="int8")

# Transcribe
transcription = model.transcribe(
    audio="basic_output1.wav",
    language="en",
)

# transcription is a (segments, info) tuple; collect the text
seg_text = ''
for segment in transcription[0]:
    seg_text += segment.text
print(seg_text)
Sample input audio file:
Output text: “Please ask me something. I’m listening now”
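Note that model.transcribe() returns a (segments, info) tuple. Besides the iterable of text segments, the info object carries metadata such as the detected language, which is handy for debugging:
Code Snippet:
# Unpack the (segments, info) tuple instead of indexing it
segments, info = model.transcribe(audio="basic_output1.wav")
print("Detected language:", info.language,
      "with probability", info.language_probability)
for segment in segments:
    print("[%.2fs -> %.2fs] %s"
          % (segment.start, segment.end, segment.text))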
3. Installation of the text-to-speech library and how to use it
A good offline text-to-speech library that runs well on resource-constrained devices is “pyttsx3”.
Installation: pip install pyttsx3
Save the following code in a Python file, say "text-to-speech.py",
and run: python text-to-speech.py
Code Snippet:
import pyttsx3

engine = pyttsx3.init()
engine.setProperty('volume', 1)
engine.setProperty('rate', 130)
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)
# On Linux/espeak, a voice variant can also be selected by name
engine.setProperty('voice', 'english+f3')
text_to_speak = "I got your question. Please bear " \
                "with me while I retrieve the answer."
engine.say(text_to_speak)
# The following line is optional: use it if you
# also want to save the audio to a file
engine.save_to_file(text_to_speak, 'speech.wav')
engine.runAndWait()
Sample input text: “I got your question. Please bear with me while I retrieve the answer.”
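The voice index used above (voices[1]) depends on which voices are installed on your system and may differ on yours. You can list the available voices first and pick a valid index or id:
Code Snippet:
import pyttsx3

# Print every installed voice so a valid id/index can be chosen
engine = pyttsx3.init()
for i, voice in enumerate(engine.getProperty('voices')):
    print(i, voice.id, voice.name)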
4. Record, Save, and Play audio
For recording, saving, and playing audio, I have a separate post. Please refer to “How to Record, Save and Play audio in Python?”.
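For completeness, the following is a minimal sketch of playing back a saved WAV file with pyaudio (recording is shown in the complete code in the next section):
Code Snippet:
import pyaudio
import wave

# Open the recorded file and stream it to the default output device
wf = wave.open('pyaudio-output.wav', 'rb')
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)
data = wf.readframes(1024)
while data:
    stream.write(data)
    data = wf.readframes(1024)
stream.stop_stream()
stream.close()
p.terminate()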
5. The complete code
Following is the complete code. Save it in a file, say “llama-bot.py”, and run: python llama-bot.py
Code Snippet:
import requests
import pyaudio
import wave
import pyttsx3
from faster_whisper import WhisperModel

# Load the speech-to-text model
model_size = "small.en"
model = WhisperModel(model_size, device="cpu",
                     compute_type="int8")

# Recording parameters
CHUNK = 512
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100  # sample rate
RECORD_SECONDS = 7
WAVE_OUTPUT_FILENAME = "pyaudio-output.wav"


def init_tts_engine():
    """Create and configure a fresh pyttsx3 engine."""
    engine = pyttsx3.init()
    engine.setProperty('volume', 1)
    engine.setProperty('rate', 130)
    voices = engine.getProperty('voices')
    engine.setProperty('voice', voices[1].id)
    engine.setProperty('voice', 'english+f3')
    return engine


engine = init_tts_engine()
engine.say("I am an AI bot. You can ask me questions.")
engine.runAndWait()

while True:
    engine = init_tts_engine()
    engine.say("I am listening now. Please ask.")
    engine.runAndWait()

    # Record the question from the microphone
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    print("* recording")
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)  # 2 bytes (16 bits) per channel
    stream.stop_stream()
    stream.close()
    p.terminate()

    # Save the recording as a WAV file
    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()

    engine = init_tts_engine()
    engine.say("I got your question. Please bear with me "
               "while I retrieve the answer.")
    engine.runAndWait()

    # Transcribe the recorded audio
    transcription = model.transcribe(
        audio=WAVE_OUTPUT_FILENAME,
        language="en",
    )
    seg_text = ''
    for segment in transcription[0]:
        seg_text += segment.text

    # Call llama via the local Ollama API
    data = {"model": "llama3.2:1b", "stream": False}
    if seg_text == '':
        seg_text = 'Tell about yourself and how you can help.'
    data["prompt"] = seg_text + " Answer in one sentence."
    r = requests.post('http://127.0.0.1:11434/api/generate',
                      json=data)
    data1 = r.json()

    # Print user and bot messages
    print(f'\nUser: {seg_text}')
    bot_response = data1['response']
    print(f'\nBot: {bot_response}')

    # Speak the answer
    engine = init_tts_engine()
    engine.say(bot_response)
    engine.runAndWait()
A sample conversation with the bot: