
Tuesday, February 25, 2025

Creating AI Agents with Small Language Models

[Image: a metallic blue humanoid robot acting as a secret agent]

It is an understatement to say that generative artificial intelligence applications have evolved rapidly since the launch of ChatGPT over two years ago. What started as applications that could answer questions in a text-based conversation has grown into rich applications that generate multimedia content, e.g., images, video, and audio, by interpreting user requirements expressed in natural language.

Now, the next evolution of these applications is the "AI agent," which mimics the human ability to reason about a request, gather external information, and weigh known facts before responding. These agents represent the next frontier, transforming static, single-step systems into dynamic, multi-step, autonomous systems capable of reasoning and gathering information beyond what their language models already know.

How AI agents reason and act

The agentic AI reasoning (thinking) process. Credit: HuggingFace

An AI agent's artificial reasoning process is commonly described as the "Think, Act and Observe" cycle.

  1. Think - The AI agent invokes a language model to consider the user's request in the context of what it has learned from the current conversation. The model decomposes the request against its current knowledge and available resources to decide how to act on it.
  2. Act - Once the language model returns its analysis, the agent acts on it. The action may be as simple as responding to the end user, or it may involve invoking an external tool on the model's advice and adding the tool's response to the conversation context.
  3. Observe - The agent returns the updated conversation context to the language model for consideration. The model then decides whether to run another reasoning cycle or whether the information already in the conversation is enough to meet the end user's request. In the latter case, the agent stops the cycle and responds to the end user.

This iterative process allows the agent to refine its understanding and provide more accurate, relevant responses. By continuously observing and analyzing, the agent can adapt to a wide range of user needs.
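To make the cycle concrete, here is a minimal Python sketch of a Think, Act and Observe loop running against a local Ollama server. The model tag, the JSON tool-calling convention, and the get_current_time tool are illustrative assumptions for this sketch, not code from my repository.

```python
import json
from datetime import datetime

import requests  # pip install requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's local chat endpoint
MODEL = "llama3.2:1b"  # assumed model tag; substitute whatever you pulled

def get_current_time() -> str:
    """Example tool the agent can invoke on the model's advice."""
    return datetime.now().isoformat()

TOOLS = {"get_current_time": get_current_time}

SYSTEM_PROMPT = (
    "You may call the tool get_current_time. To use it, reply with exactly "
    '{"tool": "get_current_time"}; otherwise answer the user directly.'
)

def run_agent(user_request: str, max_cycles: int = 5) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_request},
    ]
    for _ in range(max_cycles):
        # Think: ask the model to analyze the conversation so far.
        resp = requests.post(
            OLLAMA_URL,
            json={"model": MODEL, "messages": messages, "stream": False},
        )
        reply = resp.json()["message"]["content"]
        # Act: either invoke the requested tool or return the final answer.
        try:
            tool = TOOLS[json.loads(reply)["tool"]]
        except (ValueError, KeyError, TypeError):
            return reply  # a plain answer ends the reasoning cycle
        # Observe: feed the tool's result back into the conversation.
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": f"Tool result: {tool()}"})
    return "Reached the cycle limit without a final answer."

print(run_agent("What time is it right now?"))
```

The loop maps directly onto the cycle: each POST to the model is a Think step, the branch between answering and calling a tool is the Act step, and appending the tool's result to the conversation is the Observe step.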

Creating an AI agent on a laptop

There are various ways to create an AI agent. To build production agentic AI applications for enterprise solutions, you will likely use the environments provided by the large hyperscaler providers (Google, Amazon, and Microsoft). Those ecosystems offer a rich set of functions to support enterprise application requirements.

But what if you want to learn how to create agents and observe their artificial reasoning process without the prerequisite of learning a particular hyperscaler's ecosystem?

Recent small language models, such as Meta's Llama 3.2 and Mistral v0.3, support agentic AI reasoning and tool invocation. These models are small enough to run with the Ollama inference server locally on a laptop. In my case, a modest Windows 11 laptop with an Intel i3 CPU and 8 GB of RAM was sufficient to write and test my AI agent with Meta's Llama 3.2 1B model.
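To reproduce this setup, pull the model with Ollama's command-line tool and confirm the local server responds. Here is a minimal smoke test, assuming Ollama's default port and the llama3.2:1b model tag:

```python
# Run once in a terminal first:  ollama pull llama3.2:1b
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2:1b",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```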

[Screenshot: a straightforward AI agent implementation with a small language model, using a command-line interface]

If you are interested in the code, use this link to access my GitHub repository.

What I Learned from Creating an AI Agent

AI Agents Running on Small Devices Can Produce Valuable Outputs

Depending on your requirements, it is possible to deploy lightweight AI agents backed by small language models that understand end-user requests and return responses in natural-language chat conversations.

Of course, there are inherent performance and quality trade-offs between running small language models on local hardware and using large language models hosted by hyperscalers. Deploying AI agents that utilize large language models on higher-capacity hardware can yield richer responses, but it incurs higher costs paid to the hyperscalers providing those models.

AI Agents Running on Local Machines Ensure Data Privacy

Another trade-off between locally and remotely hosted language models is data privacy. As with non-agentic AI applications, using open-source language models running on local compute resources removes the risk of sending user information to a hyperscaler's remote data centers, ensuring a higher level of privacy in the chat conversation between end users and the AI application.

Prompt Engineering Is Important to Meet Requirements

Language models need guidance to understand which external tools are available to them, when those tools should be invoked, what information should be supplied to them, and what information they return. The AI agent can provide that guidance by sending a specially crafted system prompt to the language model at the start of the conversation.
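As an illustration, a system prompt for the time-telling tool from the earlier sketch might look like the following. This is a hypothetical example, not the exact prompt from my repository:

```python
# A hypothetical system prompt; the tool name and the JSON calling
# convention match the earlier sketch, not the repository's actual prompt.
SYSTEM_PROMPT = """\
You are an assistant with access to one external tool:

  get_current_time() -> returns the current local time as an ISO-8601 string.

Rules:
- If the user's request requires the current time, reply with exactly:
  {"tool": "get_current_time"}
- When a message starting with "Tool result:" appears, use its value to
  answer the user's original question.
- Otherwise, answer the user directly and concisely.
"""
```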

As with non-agentic AI applications, crafting detailed, specific prompts to guide AI agents and their language models is critical for meeting end-user expectations.

Conclusion

With the right technical skills and modest hardware, you can create functional AI agents that analyze complex queries, fetch external data, and synthesize responses in natural language. This democratization of AI empowers you to experiment without relying on hyperscaler infrastructure. As we push the boundaries of what AI can achieve, building local agentic applications unlocks new avenues for creativity, strengthens data privacy, and makes efficient use of resources.


About Chris Vitalos

I leverage decades of expertise in the wireless telecommunications industry to provide advisory services to ThreatSciences.com, a consulting agency providing cybersecurity services and leading security advisors.

Outside work, I enjoy hiking, writing, and spending time with my family.

