Install, run, and access Llama using Ollama

What is Ollama?

Ollama is an open-source tool for running large language models (LLMs) locally, on anything from a desktop PC to an edge device such as a Raspberry Pi.

How to install it?

Installers are available for macOS, Linux, and Windows. Visit https://ollama.com/download for instructions.

Running Llama 3.2

Four versions of Llama 3.2 models are available: 1B, 3B, 11B, and 90B. 'B' stands for billion and refers to the number of parameters. For example, the 1B model has roughly 1 billion parameters (not 1 billion training examples). The 1B and 3B models are text-only, whereas 11B and 90B are multimodal (text and images).
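To get a feel for what these parameter counts mean on a local machine, the weight size can be approximated as parameters times bytes per parameter. The sketch below uses the nominal, rounded sizes; the precisions listed are illustrative assumptions rather than Ollama's documented defaults, and the helper name weight_memory_gb is hypothetical. Actual memory use is higher because of the context cache and runtime overhead.

```python
# Rough memory needed just to hold the weights: parameters x bytes per parameter.
# Sizes are the nominal model names; precisions are illustrative assumptions.
PARAMS = {"1B": 1e9, "3B": 3e9, "11B": 11e9, "90B": 90e9}
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for size, count in PARAMS.items():
        row = ", ".join(
            f"{precision}: {weight_memory_gb(count, nbytes):.1f} GB"
            for precision, nbytes in BYTES_PER_PARAM.items()
        )
        print(f"{size} -> {row}")
```

By this estimate the 1B and 3B models fit comfortably on most laptops, while the 90B model needs tens of gigabytes even at low precision.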

Run 1B model: ollama run llama3.2:1b

Run 3B model: ollama run llama3.2

Once a model is running, you can chat with it directly in the terminal.
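Besides the terminal session, Ollama serves a local HTTP API. As a quick check that the server is up and the models have been pulled, the following sketch queries the /api/tags endpoint. It assumes the default address 127.0.0.1:11434 and uses only the Python standard library; the helper names model_names and list_local_models are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # assumed default local address


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from a decoded /api/tags response."""
    return [model["name"] for model in tags_payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Ask the local Ollama server which models have been pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))


if __name__ == "__main__":
    try:
        print(list_local_models())
    except OSError as err:  # URLError and timeouts are subclasses of OSError
        print(f"Could not reach Ollama at {OLLAMA_URL}: {err}")
```

If the models from the commands above were pulled, their names (for example llama3.2:1b) should appear in the printed list.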

Access the deployed models from a web browser

Page Assist is an open-source browser extension that provides a sidebar and web UI for your local AI model. It allows you to interact with your model from any webpage.

Access the Llama model using the HTTP API in Python

import requests

# Build the request payload for Ollama's /api/generate endpoint
data = {
    "model": "llama3.2:1b",
    "stream": False,  # return the full answer in a single JSON object
    "prompt": "What is Newton's law of motion?" + " Answer in short.",
}

# Send the prompt to the local Ollama server
r = requests.post('http://127.0.0.1:11434/api/generate', json=data)
r.raise_for_status()  # fail early on an HTTP error
response_data = r.json()

# Print the user and bot messages
print(f'\nUser: {data["prompt"]}')
bot_response = response_data['response']
print(f'\nBot: {bot_response}')
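
The request above sets "stream" to False, so the whole answer arrives at once. With "stream" set to True, the same endpoint returns newline-delimited JSON objects, each carrying a fragment of the answer in its "response" field and a "done" flag on the last one. A minimal sketch of reading that stream follows. It uses the standard library's urllib instead of requests so that it runs without extra packages, and the helper names iter_fragments and generate_streaming are hypothetical.

```python
import json
import urllib.request

GENERATE_URL = "http://127.0.0.1:11434/api/generate"  # same endpoint as above


def iter_fragments(lines):
    """Yield the "response" text from each newline-delimited JSON chunk."""
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        chunk = json.loads(raw)
        yield chunk.get("response", "")
        if chunk.get("done"):
            break  # the final chunk is marked with "done": true


def generate_streaming(prompt: str, model: str = "llama3.2:1b"):
    """POST a prompt with streaming enabled and yield fragments as they arrive."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode("utf-8")
    req = urllib.request.Request(
        GENERATE_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        yield from iter_fragments(resp)  # the response object iterates line by line


if __name__ == "__main__":
    try:
        for fragment in generate_streaming("What is Newton's law of motion? Answer in short."):
            print(fragment, end="", flush=True)
        print()
    except OSError as err:  # raised when the local server is not reachable
        print(f"Could not reach Ollama: {err}")
```

Printing fragments as they arrive makes the bot feel more responsive, since the first words appear before the full answer has been generated.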