PocketPal AI — Bringing Small Language Models to Your Cellphone
Imagine having the power of an AI assistant in your pocket: one that doesn't need the internet, respects your privacy, and works even when you're off the grid. That's exactly what PocketPal AI offers (https://github.com/a-ghorbani/pocketpal-ai). This open-source app lets you run open-source Small Language Models (SLMs) directly on your phone or tablet, with no data leaving your device, no subscriptions, and no fees (beyond a bit of your hardware's resources).
Let’s break down how the app works, the models it supports, and the performance of a specific model across various tasks.
What is PocketPal AI?
PocketPal AI brings Small Language Models straight to your phone. It is available on the Apple App Store and Google Play.
Here’s why it’s awesome:
1. Works Offline: No internet? No problem. Enjoy total privacy and use it anywhere.
2. Model Variety: Download different models and switch between them depending on what you need.
3. Open Source: Transparent and customizable for developers.
4. Fast and Lightweight: Designed for smartphones, so it’s quick and doesn’t hog resources.
5. Completely Free: No subscriptions, no hidden fees.
Supported Models: Strengths and Weaknesses
PocketPal AI comes with a wide range of lightweight models to choose from (Figure 2.a). These models are distributed in the GGUF format, a file format designed to store quantized models efficiently so they stay small and fast enough for on-device use. A curated library of models is available right in the app to get you started, each tailored to specific tasks with its own unique strengths (see table below).
If those options aren’t enough, you can expand your collection. The app lets you search for additional models directly on Hugging Face (Figures 2.b & 2.c) or load a model saved locally on your device, giving you complete flexibility to customize your setup.
That said, these models are smaller and won't match the reasoning power of something like the GPT-4o series. But for everyday use, they get the job done!
A valuable feature is the ability to switch models within the same chat session, allowing you to start a conversation with one model and seamlessly continue it with another.
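The app handles model downloads for you, but if you are curious what fetching a GGUF file from Hugging Face looks like programmatically (for example, to prepare a model for loading from local storage), here is a minimal sketch using the huggingface_hub library. This is not PocketPal's internal code, and the repository and file names below are just examples; pick whichever quantization suits your device.

# Minimal sketch (not PocketPal's internal code): download a GGUF file from
# Hugging Face so it can be loaded as a local model.
# Requires: pip install huggingface_hub
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Qwen/Qwen2-1.5B-Instruct-GGUF",     # example repository
    filename="qwen2-1_5b-instruct-q4_k_m.gguf",  # example quantization
)
print(f"GGUF model saved to: {model_path}")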
Inference Speed and Memory Usage
To evaluate the performance of different models, I used the same prompt and context length across multiple models to measure their token generation rates. The cellphone used is an iPhone 14 Pro Max. The results are summarized in the table below.
Phi-3.5-mini-4k-instruct and GPT-4o significantly outperform the other models in token generation speed, with rates of 119 and 110 tokens per second, respectively. In contrast, SmolLM2-1.7B-Instruct and Qwen2-1.5B-Instruct are slower but still far exceed the reading speed of a typical human, roughly 5–7 tokens per second (https://en.wikipedia.org/wiki/Speed_reading). The variation in performance between the models should therefore not inconvenience users, since even the slower models generate text faster than it can be read.
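The numbers above come from the stats PocketPal reports in-app. If you want to reproduce a comparable tokens-per-second measurement on a laptop with the same GGUF files, a rough sketch with the llama-cpp-python bindings (a desktop stand-in for the llama.cpp-based engine the app builds on; the model path and prompt are placeholders) could look like this:

# Rough sketch for timing token generation of a local GGUF model.
# Not PocketPal's code: a desktop analogue using llama-cpp-python.
# Requires: pip install llama-cpp-python
import time
from llama_cpp import Llama

llm = Llama(model_path="qwen2-1_5b-instruct-q4_k_m.gguf", n_ctx=2048, verbose=False)

start = time.perf_counter()
out = llm("Explain why the sky is blue in two sentences.", max_tokens=128)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.2f}s -> {n_tokens / elapsed:.1f} tokens/sec")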
PocketPal AI includes a useful feature that displays memory usage alongside available memory, offering valuable transparency. The screenshot below shows the real-time memory usage statistics of the Qwen2-1.5B-Instruct model running on my cellphone. This feature is especially helpful as it allows users to monitor resource demands and avoid exceeding the device's limits.
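As a back-of-envelope check on whether a model will fit, the weight memory of a quantized model is roughly the parameter count times the bits per weight. This is my own approximation, not anything PocketPal computes, and it ignores the KV cache and runtime overhead, but it gives a useful lower bound:

# Back-of-envelope estimate (my own approximation, not PocketPal's method):
# weight memory ~= parameters * bits_per_weight / 8.
def estimate_weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB, ignoring KV cache and overhead."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 1.5B-parameter model at ~4.5 bits/weight (typical of Q4_K_M quantization):
print(f"~{estimate_weight_memory_gb(1.5, 4.5):.2f} GB for the weights alone")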
Settings
PocketPal AI provides advanced settings for users to tweak and optimize model performance and behavior (Figure 3).
The Context Size option lets you adjust the amount of input the model can process, which affects both its speed and memory usage. Meanwhile, the Layers on GPU slider allows you to manage how many layers are processed on your GPU, balancing performance with battery consumption.
When it comes to text generation, PocketPal AI offers extensive flexibility. You can fine-tune parameters like temperature, top_k, and top_p to control creativity and randomness in responses. Additional settings, such as repetition penalties, frequency penalties, presence penalties, and mirostat configurations, let you customize the output even further.
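To make these knobs concrete, here is an illustrative sketch of passing the same sampling parameters in code, again using llama-cpp-python as a desktop stand-in for the engine the app builds on. The model path and parameter values below are examples, not recommendations:

# Illustrative sketch of the sampling settings PocketPal exposes,
# shown via llama-cpp-python (values are examples only).
from llama_cpp import Llama

llm = Llama(
    model_path="qwen2-1_5b-instruct-q4_k_m.gguf",
    n_ctx=2048,        # "Context Size" in the app
    n_gpu_layers=20,   # analogous to the "Layers on GPU" slider
    verbose=False,
)

out = llm(
    "Write a haiku about mountains.",
    max_tokens=64,
    temperature=0.7,       # higher = more creative / random
    top_k=40,              # sample only from the 40 most likely tokens
    top_p=0.95,            # nucleus sampling cutoff
    repeat_penalty=1.1,    # discourage verbatim repetition
    frequency_penalty=0.0,
    presence_penalty=0.0,
    mirostat_mode=0,       # set to 1 or 2 to use Mirostat sampling instead
)
print(out["choices"][0]["text"])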
Model Evaluation
Let’s take a look at how Qwen2-1.5B-Instruct performs on the iPhone across various tasks:
1. Logical Reasoning Capability
Prompt:
A train travels at 60 miles per hour for 2 hours, then slows down to
40 miles per hour for the next 1.5 hours due to track maintenance.
After that, it speeds up to 80 miles per hour for the final 30 minutes.
How far has the train traveled in total?
Explain your calculations step by step.
Response:
To calculate the total distance traveled by the train, we need to consider the distances traveled in each segment of the journey:
1. First 2 hours at 60 miles per hour:
Distance = Speed x Time = 60 miles/hour x 2 hours = 120 miles
2. Next 1.5 hours at 40 miles per hour:
Distance = Speed x Time = 40 miles/hour x 1.5 hours = 60 miles
3. Last 0.5 hours at 80 miles per hour:
Distance = Speed x Time = 80 miles/hour x 0.5 hours = 40 miles
Now, we add up all the distances:
Total distance = 120 miles + 60 miles + 40 miles = 220 miles
So, the train has traveled a total of 220 miles.
Evaluation: Solid logical reasoning and a correct answer.
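As a quick sanity check, the same arithmetic in a few lines of Python confirms the model’s answer:

# Verify the train-distance arithmetic: distance = speed * time per segment.
segments = [(60, 2.0), (40, 1.5), (80, 0.5)]  # (speed in mph, time in hours)
total = sum(speed * hours for speed, hours in segments)
print(total)  # 220.0 miles, matching the model's answer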
2. n-Shot Prompting
Prompt:
Here are some reviews and their categories:
1. "The product is fantastic! It exceeded all my expectations." -> Positive
2. "This is the worst experience I've had with any company." -> Negative
3. "The product is okay, nothing special but does the job." -> Neutral
4. "I absolutely love it! Highly recommend to everyone." -> Positive 5.
5. "The item arrived late and damaged. Very disappointed." -> Negative
Now, categorize the following review:
"Not bad, but it could have been better."
Response:
Neutral
Evaluation: The model assigns the correct category, and the response follows the format of the examples in the prompt.
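If you’d rather assemble this kind of n-shot prompt programmatically than type it by hand, a small helper (my own sketch, unrelated to PocketPal’s code) could look like this:

# Small sketch for building an n-shot classification prompt like the one above.
def build_nshot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    lines = ["Here are some reviews and their categories:"]
    for i, (review, label) in enumerate(examples, start=1):
        lines.append(f'{i}. "{review}" -> {label}')
    lines.append("Now, categorize the following review:")
    lines.append(f'"{query}"')
    return "\n".join(lines)

examples = [
    ("The product is fantastic! It exceeded all my expectations.", "Positive"),
    ("This is the worst experience I've had with any company.", "Negative"),
    ("The product is okay, nothing special but does the job.", "Neutral"),
]
print(build_nshot_prompt(examples, "Not bad, but it could have been better."))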
3. Technical Knowledge
Prompt:
Explain the concept of 'muon tomography' and its use in archaeology.
Response:
Muon tomography is a medical imaging technique that uses muons,
which are subatomic particles that are not affected by the Earth's
magnetic field.
The muons are injected into the body of the patient through a needle
and then detected in a magnetic field. This allows for a detailed image
of the internal structure of the patient's body, including bones and
organs. In archeology, muon tomography can be used to detect the
presence of organic matter in soil or rock samples. This can be useful
for identifying the age and origin of artifacts or the presence of
ancient human remains. It can also be used to detect the presence of
minerals or other substances that may be useful for further analysis.
Overall, muon tomography is a powerful tool for archeologists and can
provide valuable information about the past.
Evaluation: Poor response, with multiple inaccuracies and hallucinations. Among them: muon tomography is not a medical imaging technique, muons are not ‘injected’ into anything, the Earth’s magnetic field has little to do with how muons are used here, and muons are not used to detect organic matter in soil samples. In reality, muon tomography uses naturally occurring cosmic-ray muons to image the interior of large, dense structures; in archaeology it has famously been used to reveal hidden voids inside the pyramids of Giza.
4. Recent General Knowledge
Prompt:
Who won the 2023 FIFA Women’s World Cup, and which country hosted the
tournament?
Response:
The 2023 FIFA Women's World Cup was won by the United States.
The tournament was hosted by Canada.
Evaluation: The response is inaccurate and demonstrates hallucination. The correct answer is that Spain won the tournament, and it was co-hosted by Australia and New Zealand.
5. Coding Task
Prompt:
Write a Python function to check if a number is prime.
Response:
def is_prime(n):
    """ Check if a number is prime. """
    if n <= 1:
        return False
    if n <= 3:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

def prime_numbers_in_list(input_list):
    """ Return a list of prime numbers from the input list. """
    prime_list = []
    for num in input_list:
        if is_prime(num):
            prime_list.append(num)
    return prime_list
Evaluation: The code is correct. The model even went beyond the prompt, adding a working helper that filters the primes from a list.
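For completeness, a quick way to exercise the generated functions:

# Quick check of the model-generated functions.
print(is_prime(29))                             # True
print(prime_numbers_in_list([2, 4, 7, 9, 11]))  # [2, 7, 11]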
Conclusion: Is PocketPal AI Right for You?
PocketPal AI is a simple yet effective tool for running Small Language Models locally on your device, without needing Wi-Fi or compromising privacy. While it’s not built for highly complex tasks, it works well for everyday use and is free, open-source, and reliable.
If you need an offline AI companion on a camping trip or a long flight, it’s worth trying out.
PocketPal AI repository: https://github.com/a-ghorbani/pocketpal-ai