Ashpreet Bedi

Agent Engineering 101

hi@ashpreetbedi.com (Ashpreet Bedi) — Thu, 23 Oct 2025 00:00:00 GMT

✨ The intersection of software, systems and security engineering.

For a moment, stop debating what an Agent should be — deterministic or autonomous, a workflow or graph. Just pause for a sec and step back.

Our goal is to make use of this technology, which in my opinion, lends itself to 3 major use-cases:

Tools that improve productivity (chatgpt, claude, cursor).
Workflows that saves time (marketing research, report generation).
AI products that solve user problems (eg: Notion AI).

You can buy AI tools, and tools for building workflows, but building AI products is where the real engineering happens. Let's dive in.

What is Agent Engineering?

Agent Engineering is the practice of building, running and managing agentic systems. It sits at the intersection of software engineering, system design and security engineering.

In practice, if you're building an AI product, you'll need an AI backend — a system that connects to your frontend via an API. This backend is responsible for running agents (concurrently), managing memory, knowledge, state, and ensuring the security and privacy of your environment. This is Agent Engineering, which focuses on:

Runtime architecture: how agents are orchestrated, manage state, and handle execution loops.
Memory systems: how agents retain and manage context, session history, memory, knowledge and culture.
Tooling integration: how agents connect to APIs, databases, or internal functions (MCPs are popular here).
Safety & Security: how to ensure data, application and user-level security.
Evaluation & performance: measuring usefulness, latency, cost, and reliability of the agentic system.

Agent Engineers are responsible for answering questions like:

How do we serve our agents as an API that our frontend can call?
When should we use REST versus Websockets?
How do we handle request/response timeouts (29 seconds for aws api gateway)?
If tools are exposed via MCP, how should our AI backend establish and maintain a connection to the MCP server? Should it be initialized once using FastAPI lifecycle hooks, or re-established every time an agent runs (probably not)?
How should authentication and authorization be handled — once (probably not), per request, or through persistent sessions?
How do we manage concurrency and state when multiple users call the same agent? Are sessions properly isolated?
What is the security boundary of each request? Are agents only accessing data permitted by RBAC?
How do we log and monitor the agentic system? Tracing is popular, but it’s not enough. How do we capture events like “this request was made,” “this agent, via this request, accessed this data,” and the complete lifecycle of what happened during execution?

Agent Engineering is not just about building agents, it's about building the system that runs them (securely). Its 40% agent development, 40% system design and 20% security engineering.

How Agno helps with Agent Engineering?

Agno is a multi-agent framework, runtime, and control plane. It delivers a complete solution for building, deploying and managing multi-agent systems via 3 tightly coupled components:

Framework: for building Agents, Multi-Agent Teams and Workflows.
Pre-built FastAPI Runtime: for deploying multi-agent systems.
Control Plane: web interface for managing multi-agent systems.

One frustration I have with most frameworks is that they give you a way to build an agent, but almost no guidance on how to run it in production. Like, how do I serve this as an SSE compatible API that my frontend can call? How do I build a product out of this? This to me, is incomplete, because the real engineering happens after the agent is built. And no, logging (telemetry) and evals is not what makes a system production-grade. Since when did cloudwatch and unit-tests make a product? They're parts of it, sure, but stop selling them as the whole story.

While Agno gives you an incredibly feature-rich agent framework — it's the pre-built FastAPI application that really sets it apart. We call this the AgentOS. This is the real advantage of Agno, the advantage of working with people who've built these types of systems before.

A very simple example: along with the pre-build endpoints, the AgentOS initializes MCP connections in FastAPI lifecycle hooks, and gives you a security-key for authenticating every request.

Next, the control plane — our web interface for managing AgentOS — connects directly to your runtime via your browser, letting you test the real performance of your system. This architecture honestly only makes sense once you test it. So give it a try.

It's a novel architecture that makes your setup inherently secure, since your browser connects directly to the runtime, no data is sent to agno, or any external telemetry services or stored outside your cloud, you avoid unnecessary egress and retention costs.

Sending our AI app data to telemetry services is fundamentally broken. We don't send your app data, user data, or business data to a third-party logger — so why send our AI data? Why not just connect to the database directly to view it?

Minimal Example

Okay, let's demonstrate the power of Agno with a simple example. Here's a fully working Agent, with conversation history, access to tools via MCP, deployed as a FastAPI app - in 20 lines of code.

from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.models.anthropic import Claude
from agno.os import AgentOS
from agno.tools.mcp import MCPTools

# ************* Create Agent *************
agno_agent = Agent(
    name="Agno Agent",
    model=Claude(id="claude-sonnet-4-5"),
    db=SqliteDb(db_file="agno.db"),
    tools=[MCPTools(url="https://docs.agno.com/mcp", transport="streamable-http")],
    add_history_to_context=True,
    markdown=True,
)

# ************* Create AgentOS *************
agent_os = AgentOS(agents=[agno_agent])
app = agent_os.get_app()

Run your AgentOS using fastapi dev agno_agent.py and chat with it on the AgentOS UI.

Your browser does not support the video tag.

Deploy your FastAPI app to your cloud of choice, and voilà, you're live in production. It's impossible to move this quickly without Agno.

Summary: The Layers of Agent Engineering

Agent Engineering has three fundamental layers:

The Framework (Build)

This is where you define your Agents, Teams and Workflows — the schemas, memory, knowledge, and guardrails, the reasoning loop.

The Runtime (Run)

The runtime serves (via API), scales, and orchestrates Agents in production. It handles concurrency, async execution, error recovery, and communication between agents and tools.

The Control Plane (Manage)

The control plane provides visibility: dashboards, monitoring, debugging, and human-in-the-loop control. It's how you understand what your agents are doing — and why.

Agno combines all three. It's not just a framework. It's a complete runtime and control plane for multi-agent systems.

Designed for Agent Engineering

I'll end this article with a list of features of Agno:

Category	Feature	Description
Core Intelligence	Model Agnostic	Works with any model provider so you can use your favorite LLMs.
	Type Safe	Enforce structured I/O through `input_schema` and `output_schema` for predictable, composable behavior.
	Dynamic Context Engineering	Inject variables, state, and retrieved data on the fly into context. Perfect for dependency-driven agents.
Memory, Knowledge, and Persistence	Persistent Storage	Give your Agents, Teams, and Workflows a database to persist session history, state, and messages.
	User Memory	Built-in memory system that allows Agents to recall user-specific context across sessions.
	Agentic RAG	Connect to 20+ vector stores (called Knowledge in Agno) with hybrid search + reranking out of the box.
	Culture (Collective Memory)	Shared knowledge that compounds across agents and time.
Execution & Control	Human-in-the-Loop	Native support for confirmations, manual overrides, and external tool execution.
	Guardrails	Built-in safeguards for validation, security, and prompt protection.
	Agent Lifecycle Hooks	Pre- and post-hooks to validate or transform inputs and outputs.
	MCP Integration	First-class support for the Model Context Protocol (MCP) to connect Agents with external systems.
	Toolkits	100+ built-in toolkits with thousands of tools, ready for use across data, code, web, and enterprise APIs.
Runtime & Evaluation	Runtime	Pre-built FastAPI based runtime with SSE compatible endpoints, ready for production on day 1.
	Control Plane (UI)	Integrated interface to visualize, monitor, and debug agent activity in real time.
	Natively Multimodal	Agents can process and generate text, images, audio, video, and files.
	Evals	Measure your Agents' Accuracy, Performance, and Reliability.
Security & Privacy	Private by Design	Runs entirely in your cloud. The UI connects directly to your AgentOS from your browser, no data is ever sent externally.
	Data Governance	Your data lives securely in your Agent database, no external data sharing or vendor lock-in.
	Access Control	Role-based access (RBAC) and per-agent permissions to protect sensitive contexts and tools.

Want to build with Agno?

Agno documentation: agno.link/docs
Signup for the AgentOS: os.agno.com
Star Agno on Github: agno.link/gh

Agent Security 101

hi@ashpreetbedi.com (Ashpreet Bedi) — Tue, 28 Oct 2025 00:00:00 GMT

PSA: If you're serious about Agent Security, stop sending your transactional data to telemetry services. Here's how to do it right:

Give your agents a database.
Store all transactions in that database.
Keep your data inside your system.
Avoid duplication across multiple systems.
Stop paying for egress and retention.

Transactional data ≠ Telemetry

Somewhere along the way, people started treating conversational traces as logs (they're not), and started pushing everything (agent inputs, outputs, reasoning, memory) to telemetry vendors. It's not just bad security hygiene, it's inefficient, redundant, and expensive.

Transactional data is what's happening in your system: inputs, outputs, tool calls, memory updates, and internal reasoning. It's the source of truth for your system and should never leave it.

Telemetry data is system metrics and operational metadata (latency, token usage, error rates, throughput, uptime). That's the stuff you aggregate and throw in cold storage after 180 days.

In an agentic system, conversational traces are transactional data. They belong inside your infrastructure:

They often contain PII, proprietary logic, and sensitive data and should never be sent externally.
They need to be re-used by your application (by future runs, for debugging and optimization), so you'll store them internally anyway.

So how do you do it properly?

1. Give your agents a database.

Agents need structured storage. Sessions, runs, memory, knowledge — all of it should persist in your database. Just like any other application.

I personally use Postgres + PgVector in production, and Sqlite for demos.

Here's a minimal example:

# /// script
# dependencies = [
#   "agno",
#   "anthropic",
#   "yfinance",
#   "sqlalchemy",
# ]
# ///

from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.models.anthropic import Claude
from agno.tools.yfinance import YFinanceTools

# ************* Create Agent *************
agno_agent = Agent(
    name="Finance Agent",
    model=Claude(id="claude-sonnet-4-5"),
    db=SqliteDb(db_file="tmp/finance_agent.db"),
    tools=[YFinanceTools()],
    instructions="Use tables to display data.",
    add_history_to_context=True,
    add_datetime_to_context=True,
    num_history_runs=3,
    markdown=True,
)

# ************* Run Agent *************
agno_agent.print_response(input="What is the stock price of Apple?", stream=True, stream_intermediate_steps=True)
# Run #2 that continues the conversation
agno_agent.print_response(input="Can you write a report on it? Just give me the report, no other text.", stream=True, stream_intermediate_steps=True)

Save this to a file and run it with uv run finance_agent.py. You can see conversation history work flawlessly because it's stored in a local sqlite database.

Your browser does not support the video tag.

2. Store all transactions in that database.

When you run your agents, store all transactions in that database. Including: inputs, outputs, context, messages, tool calls, memory updates, knowledge updates, culture updates. Basically everything that happens in your agentic system should be stored in your database.

For enterprise workloads, this isn't just best practice, it's a requirement. You need to persist traces for compliance, auditing, debugging, and continuity.

Agno does this automatically for you.

External telemetry tools were never designed for this. They're built for metrics and logs, not for sensitive, replayable transactional data. You can make the case for running the data plane inside your VPC, you still have to deal with duplicated data (and pay enterprise data license costs).

3. Keep data within your system (and avoid duplication).

Every time you send LLM traces to an external service, you create redundant copies of sensitive data. This violates least-privilege principles and adds unnecessary complexity, you'll have to create "linking-ids" to connect your application usage to actual traces (solving problems that shouldn't exist in the first place).

Anyone who's built data pipelines knows: joining transactional data from app DBs with telemetry metrics is a nightmare. Skip the headache. Keep everything in one system.

4. Want a UI? No problem.

Once your data lives inside your infrastructure, it's easy to visualize. You could spin up a quick Streamlit dashboard or just use the AgentOS UI, which gives you a ready-to-use view of all your agent sessions, runs, memory, knowledge, etc.

Here's how:

# /// script
# dependencies = [
#   "agno",
#   "anthropic",
#   "yfinance",
#   "sqlalchemy",
#   "fastapi[standard]",
#   "mcp",
# ]
# ///

from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.models.anthropic import Claude
from agno.os import AgentOS
from agno.tools.mcp import MCPTools

# ************* Create Agent *************
agno_agent = Agent(
    name="Agno Agent",
    model=Claude(id="claude-sonnet-4-5"),
    db=SqliteDb(db_file="tmp/agno.db"),
    tools=[MCPTools(transport="streamable-http", url="https://docs.agno.com/mcp")],
    add_history_to_context=True,
    add_datetime_to_context=True,
    num_history_runs=3,
    markdown=True,
)

# ************* Create AgentOS *************
agent_os = AgentOS(agents=[agno_agent])
app = agent_os.get_app()

# ************* Run AgentOS *************
if __name__ == "__main__":
    agent_os.serve(app="basic_demo:app", reload=True)

Run this file using uv run basic_agentos.py and connect to it on the AgentOS UI.

Your browser does not support the video tag.

5. Finally, stop paying for egress and retention.

Shipping full traces to third parties is expensive. Text is ok but when it comes to images, audio, video, files, etc., you're looking at a lot of bandwidth that is leaving your VPC. Egress fees, retention costs, and redundant storage add up — fast. Keeping data in your own infrastructure saves both money and risk.

Own your data, control your costs.

Why Agno?

Agno was designed from the ground up for building private, secure, high-performance, agentic systems.

Every Agent comes with its own database.
All data stays within your system.
Private. Secure. Open Source.

Agno documentation: agno.link/docs

Signup for the AgentOS: os.agno.com

Star Agno on Github: agno.link/gh

I know mentioning Agno here seems like a plug, it's not. The architecture is simple: you should own your data. You don't have to use Agno for that. You can build it yourself. The difference is that with most telemetry providers, your data stays locked with them forever. With Agno, it stays with you.

Agentic Culture

hi@ashpreetbedi.com (Ashpreet Bedi) — Tue, 21 Oct 2025 00:00:00 GMT

Andrej Karpathy shared on the Dwarkesh Podcast that LLMs don't have the equivalent of "culture".

So we built the scaffolding for them to develop one.

Why Culture?

Every Agent learns from its own interactions — the tasks it runs, the conversations it has, the errors it fixes. But that knowledge is siloed. It disappears when the session ends or the user changes.

Humans solved this problem a long time ago. We call it culture — the consolidation of shared knowledge that compounds over time.

With Agno, you can now give your Agents the same ability to learn collectively.

Introducing Agentic Culture

Agentic Culture is an open-source experiment in collective memory and in-context cultural for multi-agent systems.

It provides a shared cultural database where Agents can store and retrieve knowledge that persists beyond individual sessions, users, or memories. Culture becomes a living, evolving layer of context that shapes Agent reasoning and behavior over time.

Agents can now create, read, explore, and learn from their collective experience. See the Agentic Culture cookbook for example code.

“Culture is how intelligence compounds”

How It Works

Culture acts as a shared database where Agents can save reusable knowledge that benefits all interactions.

While Memory captures user-specific details (e.g. "Sarah prefers email"), Culture captures universal principles that benefit all interactions (e.g. "Always provide actionable next steps").

You can use Agno’s CultureManager to create and manage cultural knowledge entries. These are stored in your chosen database and automatically retrieved by your Agents for contextual grounding.

"""Demonstrates how to create and persist shared cultural knowledge with Agno's `CultureManager`."""

from agno.culture.manager import CultureManager
from agno.db.sqlite import SqliteDb
from agno.models.anthropic import Claude
from rich.pretty import pprint

# Step 1. Initialize the database
db = SqliteDb(db_file="tmp/demo.db")

# Step 2. Create the Culture Manager
culture_manager = CultureManager(
    db=db,
    model=Claude(id="claude-sonnet-4-5"),
)

# Step 3. Create cultural knowledge from a message
message = (
    "All technical guidance should follow the 'Operational Thinking' principle:\n"
    "1. **State the Objective** — What outcome are we trying to achieve and why.\n"
    "2. **Show the Procedure** — List clear, reproducible steps (commands/configs).\n"
    "3. **Surface Pitfalls** — What usually fails and how to detect it early.\n"
    "4. **Define Validation** — How to confirm it’s working (logs, tests, metrics).\n"
    "5. **Close the Loop** — Suggest next iterations or improvements."
)

culture_manager.create_cultural_knowledge(message=message)

# Step 4. Retrieve and inspect stored knowledge
pprint(culture_manager.get_all_knowledge())

Now give your agents access to the shared culture by setting add_culture_to_context=True. That's it. Your Agents now learn from shared cultural knowledge.

"""Use cultural knowledge with your Agents."""

from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.models.anthropic import Claude

db = SqliteDb(db_file="tmp/demo.db")

agent = Agent(
    model=Claude(id="claude-sonnet-4-5"),
    db=db,
    add_culture_to_context=True,
    # optional: run culture manager after each run
    # update_cultural_knowledge=True,
)

agent.print_response(
    "How do I set up a FastAPI service using Docker?",
    stream=True,
    markdown=True,
)

What You Can Do With It

The current v0.1 release focuses on helping Agents stay consistent in tone, reasoning, and behavior. Over time, the goal is to transform isolated Agents into a living, evolving system of intelligence.

With Culture, you can:

Accumulate learnings and behavioral patterns from successful runs
Use that collective context to guide future decisions
Observe how "culture" evolves across teams, orgs, and domains

Examples

The Agentic Culture cookbook includes several runnable recipes:

File	Description
01_create_cultural_knowledge.py	Create cultural knowledge using a model.
02_use_cultural_knowledge_in_agent.py	Use cultural knowledge inside Agents.
03_automatic_cultural_management.py	Let Agents autonomously update culture over time.
04_manually_add_culture.py	Manually seed culture for tone guides or org-wide principles.
05_test_agent_with_cultural_knowledge.py	Freestyle testing — see culture in action.

Each builds on the previous one, so you can run them in sequence.

Agno is open-source, so you can contribute to the cookbook or build your own recipes. Here's the github repository: agno.link/gh

Future Work

This is early, but promising. We're exploring how to:

Integrate culture across multi-agent teams.
Sync or version cultural knowledge programmatically
Store culture in Postgres, Redis, or your own backend
Let Agents evolve shared norms collectively, like emergent civilizations

Karpathy describes a future where LLMs have a "giant scratchpad" — a shared space to think, write, and build on each other's ideas.

Agno is providing the scaffolding for developing that culture.

Explore & Build

Explore Agentic Culture: agno.link/agentic-culture
Agno on GitHub: agno.link/gh
Documentation: agno.link/docs
Agno Website: agno.com

Agentic Software Engineering

hi@ashpreetbedi.com (Ashpreet Bedi) — Sun, 01 Mar 2026 00:00:00 GMT

Note: this post is about building your own agents (agentic software engineering), not about using coding agents.

By now you've probably used a few agents, or at least heard of Claude Code, Codex, or OpenClaw. Ever wondered what it takes to build your own?

Most people think of agents as prompts + tools in a loop. That's a reasonable assumption, but it's not production architecture.

The moment your agent needs to know who it's talking to, maintain state, handle concurrent requests, take sensitive actions like refunds, and survive failing tool calls, it stops being an "LLM + tools in a loop" and becomes a distributed system.

Building agents is the easy part. There are 75 frameworks that help you do that. The hard part is the runtime: the harness around the agent that makes it work in the real world. That's what agentic software engineering is all about.

Build. Serve. Connect.

Here's how I think about shipping agentic software.

Build the agent. Define the model, tools, knowledge base, memory, storage, and guardrails. This is the layer that most frameworks give you.

Serve it as an API. User-scoped, session-scoped, horizontally scalable. Add persistent storage, streaming, background execution, retry semantics. This is where most agentic products stall. Not because the agent doesn't work, but because it doesn't have the infrastructure to work reliably at scale.

Connect it to where users live. Your product, Slack, Discord, MCP, wherever. An agent in a notebook is an experiment. An agent where your users are is a product.

The 6 Pillars of Agentic Software

Building an agent is AI engineering. Running it in production is software engineering. Together, they form agentic software engineering: the practice of building, running, and scaling agents as production services.

Here are the six pillars that hold it up:

Durability. Agents reason across multiple steps, call tools that time out, and fail halfway through. If your agent crashes on step 12 of 15, restarting might duplicate a side effect or lose critical context. Agentic software needs to pause, resume, checkpoint, and recover gracefully. Durability turns failure into resumption, not a full restart.

Isolation. Agentic software serves thousands of users simultaneously. Each user needs their own session, their own memory, their own context. Passing a user_id with each request is easy. Isolating every resource the agent touches is where the engineering comes in. Your database, your vector store, your model provider, all need to respect user boundaries. One missing filter becomes a data breach.

Governance. Agents that can act can also cause damage. Looking up a record is harmless. Deleting a record or issuing a refund needs approval. Agentic software needs layered authority: what runs automatically, what needs human approval, and what needs admin sign-off. Today, most agents auto-execute with minimal oversight. As they get more capable, governance becomes the product.

Persistence. An agent without persistent storage can't learn, can't build context, can't improve. We need to store sessions, memory, knowledge in a database. Persistent state is what turns a chatbot into a product. Every conversation makes the next one better.

Scale. A thousand users hit your agent at the same time. Requests queue, you hit model rate limits, and tool calls compete for resources. Traditional services call your own backends. Agentic software calls external model APIs and third-party tools, which means you inherit their rate limits, latency, and downtime. Scaling agentic software means scaling around dependencies you don't control.

Composability. When an agent is a service, other agents can call it. Your frontend can call it. Your Slack bot can call it. MCP clients can discover it. It becomes a building block in your architecture, and every new integration becomes a standard API call. That's how single-agent tools become multi-agent systems.

None of this is new. We've been building reliable distributed systems for decades. The AI industry just hasn't brought those lessons along yet, and we're feeling it in every failed deployment.

From Theory to Practice

As always, I come bearing code. Here's how you can start building your own agentic service today.

# 1. Clone the repo
git clone \
    https://github.com/agno-agi/agentos-docker-template.git \
    agentos

cd agentos

# 2. Set your model provider key
cp example.env .env
# Edit .env and add OPENAI_API_KEY

# 3. Start the application
docker compose up -d --build

# 4. Optional: Load documents for the knowledge agent
docker exec -it agentos-api python -m agents.knowledge_agent

This gives you a containerized service with persistent storage (Postgres), two starter agents (a knowledge agent using Agentic RAG and an MCP agent for external tool use), and a REST API you can connect to from anywhere.

I'm using Docker for this template because Docker runs everywhere: your laptop, AWS, GCP, Azure, Railway. The same container you develop locally is the one you deploy to production. The README covers everything you need to get started.

After running the service:

Open localhost:8000/docs to see your API.
Connect to the web UI at os.agno.com where you can chat with your agents, trace runs, manage knowledge, create schedules and approve sensitive tool calls. One UI for your agentic software.

Your browser does not support the video tag.

Adding your own agent is a few lines of Python and a restart. Swap models with a one-line change. Add tools from 100+ integrations. The template is a starting point. Read the Agno docs to learn more.

Governance & Elicitation

Most agents run tool calls with minimal oversight or auditability. In practice, we need layered authority:

Tools that run freely
Tools that need user approval
Tools that need admin approval

Agents also need to ask questions (often called elicitation). The Claude Code team shared a great article on the AskUserQuestion tool used by Claude.

This is available in Agno as UserFeedbackTools. Here's a support agent that can look up orders freely, ask the customer structured questions when it needs more information, and wait for admin approval before issuing a refund:

support = Agent(
    id="support",
    name="Support",
    model=OpenAIResponses(id="gpt-5.2"),
    db=agent_db,
    tools=[
        lookup_order,             # auto-execute
        search_help_docs,         # auto-execute
        issue_refund,             # requires user confirmation
        UserFeedbackTools(),      # structured questions
    ],
    instructions=instructions,
    enable_agentic_memory=True,
)

Watch what happens when a customer asks for a refund.

The agent looks up the order on its own, no permission needed.
Then it hits a decision point: why does the customer want the refund?
Instead of guessing, it presents a structured question with clear options: defective, wrong item, changed mind.
The customer picks one. Now the agent calls the refund tool, but because refunds carry real consequences, it pauses for user approval.
Once approved, the agent runs the refund tool.

Three levels of agency in one conversation. You can view the full code here.

Your browser does not support the video tag.

The agent knows when to act, when to ask, and when to wait. That's what governance looks like in practice. The runtime has to support all three modes, and the transitions between them have to feel natural.

Note: the approvals flow on the UI is actively being developed. The refund should wait for admin approval, not user approval. This is implemented on the SDK but not the UI yet. This is being fixed this week.

Agents are distributed systems

The 5 Levels describe how agentic software grows in capability (and complexity). The 7 Sins describe how they fail in production. The 6 Pillars describe what it takes to build them right.

The consistent message across all three: agentic software engineering is a discipline. The teams that internalize this early will ship great products. The teams that keep treating agents as scripts will continue to miss the mark.

Clone the repo. Build your first agent. Ship it where your users are.

Links:

Becoming AI-first

hi@ashpreetbedi.com (Ashpreet Bedi) — Sun, 19 Oct 2025 00:00:00 GMT

✨ Lessons from 100s of conversations on AI products and how teams are adopting AI.

Every tuesday and thursday, I take 3–5 calls with builders, CTOs, and CEOs of companies. One question on every CEO's mind is:

"How do I make my company AI-first?"

Common variations include: how can we use AI better? should we be building Agents? how do we add AI to our products?

Over time, I've identified patterns in how leading companies are approaching this question, and what separates the ones making real progress from those still in exploration mode.

What "AI-first" really means

Being AI-first doesn't mean using AI everywhere, or re-architecting you entire company or product around ChatGPT.

It means understanding where intelligence creates leverage for your team, your operations, and your product. If you can identify where AI genuinely moves the needle, you're already halfway there.

Broadly, I've found three high-leverage entry points:

Internal tools that improve productivity and decision-making.
Workflow automation that saves time and reduces operational load.
User-facing products that create revenue and differentiation.

Each represents a layer in your company's AI maturity. Let's dig in.

1. Internal Tools

These tools help your team save time, become more productive, and build intuition around AI. General-purpose agents (ChatGPT, Claude), coding assistants (Cursor, Claude Code), or vertical agents (legal, sales, marketing) all fit here. I'm yet to meet a team that's not all-in here.

These don't require a polished UX or commercial rollout — just curiosity and experimentation. The payoff is your team becoming AI-native faster than your competitors.

Most teams I speak to give everyone access to a multitude of AI tools. The cost is trivial compared to the learning dividend.

If you're not doing this already, get your team a ChatGPT subscription and cursor/CC for coding. Connect these tools to your company knowledge, databases, and documents. Let your team explore, learn, and build intuition.

2. Workflow Automation

Once your team sees what's possible, you'll start spotting repeatable patterns ripe for automation. This is where AI turns mundane tasks into automated processes that can run in the background.

Examples: invoice classification, market research, sales prep, support summarization, or daily reporting.

That said, the highest-ROI workflows are almost always specific to your team. They take effort to design — and while "no-code" tools like N8N or Zapier can help, most serious setups eventually involve code. Frameworks like Agno can help here if you have engineering resources.

Treat automation as part of your system design, not a side project. Its ok to invest in it, if only to learn and build intuition.

3. User-Facing AI Products

This is where AI creates compounding value — by improving the product your users already love. You can:

Buy off-the-shelf products that add AI-powered features to existing products (e.g., a support agent). I highly recommend this as a starting point, its easy to get started and you start seeing immediate value.
Build new AI features specific to your product. The goal here is to make your product smarter, faster, and more delightful.

Your goal here isn't to "add AI" — it's to make the experience better. The best AI features often don't look like AI at all.

Our most successful case studies are ones where users don't even realize AI is at work, they just notice things getting smarter, forms getting filled automatically, and buttons that automate what was previously a 10-step manual process.

So general recommendation is to start with off-the-shelf products that add AI-powered features to your product. But once you need to build AI-features that are specific to your product, here's how to do it.

Add small, reliable AI features - ideally as "magic buttons" or "magic interactions". Reliability is the keyword here.
Automate targeted, well-defined problems - solve one painful step at a time. Serve the AI application as a RestAPI, which your product can call when the user clicks the "magic button".
Avoid generic chatbots - they shift the cognitive load to the user and expose an incredibly vast surface area, which is bound to disappoint. Instead, build clear, purposeful interfaces that do the work for them. This will also force you to think about the user experience and how to make it more intuitive and delightful.

Each of these "magic moments" compounds. Over time, your product becomes AI-first not by branding, but by behavior.

Start simple, focus on clarity and reliability over complexity.

From exploration to execution

If you want to accelerate this journey, Agno is a starting point.

It will give you the right primitives for building AI features and a FastAPI application that you can deploy in your cloud (for privacy and security). Your product can easily integrate with this API and before you know it, you'll be serving AI features to your users.

Want to build with Agno?

Agno documentation: agno.link/docs
Signup for the AgentOS: os.agno.com
Star Agno on Github: agno.link/gh

Dash: The Data Agent Every Company Needs

hi@ashpreetbedi.com (Ashpreet Bedi) — Wed, 15 Apr 2026 00:00:00 GMT

Every company with 30+ people should have an internal data agent and today I'm making ours open-source: take Dash, run it in your cloud, and give your team access via Slack.

Most AI-forward companies have in-house data agents:

OpenAI: Inside OpenAI's in-house data agent
Vercel: d0, another post
Uber: QueryGPT (creative name)
LinkedIn: SQLBot (absolutely LinkedIn-coded name for the agent)
Salesforce: Horizon Agent
DoorDash: How to use every buzzword in a blog post

This post will show you how to build a best-in-class data system and make it available to your team over Slack. If you do this well, Dash should handle roughly 80% of routine data questions, send daily reports, and catch metric anomalies before anyone asks.

What is Dash?

Dash is a self-learning data system made of 3 agents: Dash (the team leader), a Data Analyst and a Data Engineer.

It uses a dual-tier knowledge and learning system to deliver an incredible work-with-your-data experience.

You can chat with it via Slack or the AgentOS UI.

It writes SQL, runs it, and tells you what the numbers mean. More importantly, when it makes a mistake or gets corrected, it learns from it. When your team keeps asking the same question, it builds infrastructure so the answer is faster next time.

A self-learning data system, not a data agent.

Dash uses its own PostgreSQL database. You don't point it at your production database. You progressively load the tables you want it to work with, along with the context it needs to be useful. This is the part most people skip. This is the part that makes it special.

Here's how it looks in Slack (8x speedup when waiting):

Your browser does not support the video tag.

And on the AgentOS UI:

Your browser does not support the video tag.

Using the AgentOS UI, you can chat with your agents, view sessions, traces, metrics, and schedules.

AgentOS is the agent platform you didn't know you needed.

How It Works

1. Context is everything

Most data agents get a schema dump and the impossible task of writing SQL from business logic that only lives in the data engineer's head. That's why they're bad. Column names and types tell you nothing about the data. They don't tell you that ended_at IS NULL means a subscription is active. That annual billing gets a 10% discount. That usage metrics are sampled 3-5 days per month, so summing them gives you garbage.

I wrote about this problem in detail in my Self-Improving Text-to-SQL Agent post. The core insight holds: the biggest improvement you can make to your data agent is giving it the same tribal knowledge that human engineers have.

Dash uses a carefully curated knowledge system backed by PgVector. It contains:

Table metadata. Table schema, column types, what they mean, what to use each table for, the gotchas. Every table ships with use cases and data quality notes. Example: status is 'active', 'churned', or 'trial'; always check against subscriptions for ground truth.

Validated queries (must have). Battle-tested SQL with the right JOINs, the right NULL handling, the right edge cases. When the Analyst gets your question, it searches knowledge first. Before it writes a line of SQL, it already knows the shape of the data and which traps to avoid.

Business rules. How MRR is calculated, what NRR means, that a customer can have multiple subscription records because upgrades close the old row and open a new one. This is the context that separates a correct answer from a plausible-looking wrong one.

This knowledge is curated by the user. What makes Dash special is its ability to learn on its own.

2. Self-learning loop

Separate from knowledge, Dash captures what it learns automatically (via tool calls). When the Analyst hits a type error and fixes it, the fix gets saved. When a user corrects a result, that correction is recorded. When the system discovers a data quirk, it notes it.

Next time anyone asks a similar question, the Analyst checks learnings before writing SQL. Dash gets better the more it's used.

I've been developing this pattern since December 2025, first as GPU Poor Continuous Learning and then refined through Dash v1. The approach is simple: the model stays frozen. The system gets smarter. Learning happens in retrieval, not in weights. It's auditable, reversible, and requires zero training compute.

3. Three agents, two schemas

Dash is three agents. Leader routes requests and synthesizes answers. Analyst writes and runs SQL. Engineer builds views, summary tables, and computed data. They work together, sharing knowledge and learnings.

The Leader has no SQL tools. It cannot touch the database.

The Analyst is read-only. Not "read-only because the prompt says so." Read-only because the PostgreSQL connection is configured with default_transaction_read_only=on. The database itself rejects writes. No prompt injection or clever jailbreak changes this. The database says no.

The Engineer can write, but only to the dash schema. A SQLAlchemy event listener intercepts every SQL statement before execution and blocks anything targeting the public schema. Your company data is untouchable.

This gives you two schemas with a hard boundary:

public schema: your company data. You load it. Agents read it.
dash schema: views, summary tables, computed data. The Engineer owns and maintains it.

There's also an ai schema where Dash stores its sessions, learnings, knowledge vectors, and other operational data. It powers the AgentOS UI and the self-improvement loop.

I covered the security model in depth in my Systems Engineering post. The key principle: security is a system property enforced by configuration, tested across layers.

The part nobody else has

When the Leader notices your team keeps asking the same expensive question (MRR by plan, churn by segment, revenue waterfall) it asks the Engineer to build a view.

The Engineer creates dash.monthly_mrr_by_plan. A SQL view joining the right tables, handling all edge cases, producing a clean result. Then it does the critical thing: it calls update_knowledge to record the view in the knowledge base. What it contains, what columns it has, example queries.

Next time someone asks about MRR by plan, the Analyst searches knowledge, finds the view, and queries it directly. No complex join. No risk of getting NULL handling wrong. Faster. Pre-validated. Consistent.

The agents build on each other's work. The Engineer creates infrastructure. The Analyst discovers and uses it. The Leader notices patterns and triggers the cycle. Over time, the dash schema fills with views and summary tables that nobody manually created. An analytics layer the system built for itself, shaped by what your team actually asks about.

The full loop

You ask a question. Leader delegates.
The Analyst searches knowledge, writes correct SQL, returns an insight.
Good queries get saved to knowledge. Errors become learnings.
Repeated patterns become views. Views get recorded to knowledge.
Next time, the Analyst uses the view. Faster, pre-validated, consistent.

Dash accumulates institutional knowledge about your data and compounds with use.

Build Your Own

Dash is free and open-source. Check out the GitHub repo and follow the README for in-depth instructions.

Quick Start

git clone https://github.com/agno-agi/dash && cd dash
cp example.env .env  # Add OPENAI_API_KEY

docker compose up -d --build

docker exec -it dash-api python scripts/generate_data.py
docker exec -it dash-api python scripts/load_knowledge.py

This starts Dash with a synthetic dataset (~900 customers, 6 tables) and loads the knowledge base (table metadata, validated queries, business rules). You can demo the entire system without connecting any real data.

Connect to the Web UI

Open os.agno.com
Add OS → Local → http://localhost:8000
Connect

Connect to Slack

Dash lives in Slack. You can DM it or mention it in a channel with @Dash. Each thread maps to one session, so every conversation gets its own context.

Run Dash and give it a public URL (use ngrok for local, or your deployed domain).
Follow instructions in docs/SLACK_CONNECT to create and install the Slack app from the manifest.
Set SLACK_TOKEN and SLACK_SIGNING_SECRET, then restart Dash.

Your browser does not support the video tag.

Adding Your Own Data

Once you have Dash running, making it your own is straightforward. Replace the sample dataset with your data and give Dash the context it needs.

1. Load your tables into the `public` schema

Use whatever pipeline you already have. pg_dump, a Python script, dbt, Airbyte. Dash reads from public and never writes to it. You can use your existing workflow orchestration tools (Airflow, Dagster), or use Dash's built-in scheduler.

2. Add table knowledge

For each table, create a JSON file in knowledge/tables/:

{
  "table_name": "customers",
  "table_description": "B2B SaaS customer accounts with company info and lifecycle status",
  "use_cases": ["Churn analysis", "Cohort segmentation", "Acquisition reporting"],
  "data_quality_notes": [
    "signup_date is DATE (not TIMESTAMP) — no time component",
    "status values: active, churned, trial",
    "company_size is self-reported"
  ],
  "table_columns": [
    {"name": "id", "type": "SERIAL", "description": "Primary key"},
    {"name": "company_name", "type": "TEXT", "description": "Company name"},
    {"name": "status", "type": "TEXT", "description": "Current status: active, churned, trial"}
  ]
}

This is the single highest-leverage thing you can do. The better your knowledge, the better Dash performs.

3. Add validated queries

For your most common questions, write the SQL that gives the correct answer and save it in knowledge/queries/:

-- 
-- Current total MRR from active subscriptions
-- 
SELECT
    SUM(mrr) AS total_mrr,
    COUNT(*) AS active_subscriptions
FROM subscriptions
WHERE status = 'active';
--

This is the easiest way to make sure Dash uses your internal semantics for answering routine questions. Your job is to deliver the best work-with-your-data experience for your team. This makes it possible.

4. Add business rules

Document your metrics, definitions, and gotchas in knowledge/business/:

{
  "metrics": [
    {
      "name": "MRR",
      "definition": "Sum of active subscriptions excluding trials",
      "calculation": "SUM(mrr) FROM subscriptions WHERE status = 'active'"
    }
  ],
  "common_gotchas": [
    {
      "issue": "Active subscription detection",
      "solution": "Filter on ended_at IS NULL, not status column"
    }
  ]
}

Helpful context for Dash. You can skip if it's too much work up front.

5. Load knowledge

python scripts/load_knowledge.py             # Upsert changes
python scripts/load_knowledge.py --recreate  # Fresh start

Scheduled Tasks

Dash ships with a built-in scheduler. You can schedule any type of task that your container can handle.

Out of the box, Dash comes with a pre-built schedule that re-indexes your knowledge base every night at 4am UTC:

mgr.create(
    name="knowledge-refresh",
    cron="0 4 * * *",
    endpoint="/knowledge/reload",
    payload={},
    timezone="UTC",
    description="Daily knowledge file re-index",
)

Same pattern for anything else: daily metric summaries posted to Slack, anomaly detection runs, weekly email digests, automated data quality checks. Register a schedule, point it at an endpoint, Dash handles the rest.

The best agents are proactive. Scheduled tasks are the first step in that direction.

Run Evals

Dash ships with five eval categories:

Accuracy: correct data and meaningful insights
Routing: team routes to the correct agent
Security: no credential or secret leaks
Governance: refuses destructive SQL operations
Boundaries: schema access boundaries respected

python -m evals                      # Run all
python -m evals --category accuracy  # Run one category
python -m evals --verbose            # Show response details

Deploy to Production

You can deploy Dash to Railway with one command:

cp example.env .env.production
# Edit .env.production — set OPENAI_API_KEY

railway login
./scripts/railway_up.sh

Railway is fine for getting started. Eventually you'd want it wherever your existing data infrastructure lives. Everything is containerized so deployment should be straightforward. Be mindful of egress costs.

Production requires a JWT_VERIFICATION_KEY from os.agno.com for RBAC. It would be insane to expose Dash on a public endpoint.

What's Next

Dash is built with systems engineering principles. Five layers: agent, data, security, interface, infrastructure. Each layer affects the others. Design them together and the system compounds.

If there's interest, I'll do deep dives on each layer:

Agent Engineering: The business logic. Model, instructions, tools, knowledge, and the self-learning loop.
Data Engineering: The context layer. Memory, knowledge, learnings, storage. Why the data layer is the most underinvested part of the stack.
Security Engineering: Auth, RBAC, governance, data isolation, and audit trails designed into the system as core primitives.
Interface Engineering: Turning an agent into a product. REST APIs, web UIs, Slack, MCP, and how one agent serves multiple surfaces.
Infrastructure Engineering: How to deploy and scale Dash. Containers, deployment, scheduling.

TLDR

Every company with 30+ people should have an internal data agent. Dash is a free, open-source, self-learning data system made of 3 agents. It uses curated knowledge and continuous learning to get better with every query. Three agents (Leader, Analyst, Engineer) share knowledge and build on each other's work. Security is enforced by the system: read-only connections, schema-level isolation, eval-tested boundaries. Runs in your cloud, lives in Slack. Clone it, run docker compose up, and have the entire system running in minutes.

Built with Agno.

Dash: Self-learning data agent

hi@ashpreetbedi.com (Ashpreet Bedi) — Mon, 02 Feb 2026 00:00:00 GMT

Here's a link to the GitHub repo if you want to dive right in.

OpenAI shared how they built their internal data agent. 6 layers of context, a self-learning memory system, and real lessons from running it in production. The best enterprise data agent out there.

I've been working on a similar agent and their architecture validates the gpu-poor continuous learning approach I've been testing.

Today I'm open-sourcing my version. It's called Dash.

Dash is a self-learning data agent that grounds its answers in 6 layers of context and improves with every run.

Table Usage: schema, columns, relationships
Human Annotations: metrics, definitions, gotchas
Query Patterns: SQL that's known to work
Institutional Knowledge: external docs, research
Memory: error patterns, discovered fixes
Runtime Context: live schema when things change

The 6 Layers of Context

OpenAI's insight: context is everything. Without it, even strong models hallucinate column names, miss type quirks, and ignore tribal knowledge.

Another problem is that most Text-to-SQL agents are stateless, they make mistakes, you fix them, then they make the same mistake again because every session starts fresh.

Dash fixes this by implementing 6 layers of context:

Layer	What it provides	Source
Table Usage	Schema, columns, relationships	`knowledge/tables/*.json`
Human Annotations	Metrics, definitions, gotchas	`knowledge/business/*.json`
Query Patterns	SQL that's known to work	`knowledge/queries/*.sql`
Institutional Knowledge	External docs, research	MCP (optional)
Memory	Error patterns, discovered fixes	`LearningMachine`
Runtime Context	Live schema when things change	`introspect_schema` tool

The agent retrieves relevant context at runtime via hybrid search, uses this to generate grounded SQL, then uses the results to deliver insights.

Your browser does not support the video tag.

OpenAI's post goes into more detail about each layer.

The Self-Learning Loop

Instead of fine-tuning or retraining, Dash learns through two complementary systems:

Static Knowledge: Validated queries, business context, table schemas, data quality notes, metric definitions, tribal knowledge and gotchas. These are curated by your team and maintained alongside Dash (it also updates successful queries as it comes across them).

Continuous Learning: Patterns that Dash discovers through trial and error. The more you use Dash, the better it gets. Eg: Columns named state in one table map to status in another. It also learns what your team is focused on: preparing for an IPO? Dash learns that S-1 metrics live in a separate dataset, that "revenue" means ARR not bookings, and that the board wants cohort retention broken out by enterprise vs SMB. Every learning becomes a data point that improves Dash.

I call this gpu-poor continuous learning (no GPUs are harmed in these experiments) and it's literally 5 lines of code:


learning=LearningMachine(
    knowledge=data_agent_learnings,
    user_profile=UserProfileConfig(mode=LearningMode.AGENTIC),
    user_memory=UserMemoryConfig(mode=LearningMode.AGENTIC),
    learned_knowledge=LearnedKnowledgeConfig(mode=LearningMode.AGENTIC),
)

Build your own

Follow the README for an in-depth guide. Here's a quick start:

# Clone the repo and export your OpenAI API key
git clone https://github.com/agno-agi/dash && cd dash
cp example.env .env  # Add OPENAI_API_KEY

# Start dash
docker compose up -d --build

# Load data and knowledge
docker exec -it dash-api python -m dash.scripts.load_data
docker exec -it dash-api python -m dash.scripts.load_knowledge

This loads sample data (F1 race data from 1950-2020) and the knowledge base (table metadata, validated queries, business rules).

Connect to the UI

Dash comes with a UI out of the box (via Agno). Use it to interact with Dash, view sessions and traces:

Open os.agno.com
Add OS → Local → http://localhost:8000
Connect

Your browser does not support the video tag.

Try these on the F1 dataset:

Who won the most F1 World Championships?
How many races has Lewis Hamilton won?
Compare Ferrari vs Mercedes points 2015-2020

Run evals

Dash ships with an extensive evaluation suite. String matching, LLM grading, and golden SQL comparison. Extend and add your own, this is one of those projects where evals work surprisingly well.

docker exec -it dash-api python -m dash.evals.run_evals         # string matching
docker exec -it dash-api python -m dash.evals.run_evals -g      # LLM grader
docker exec -it dash-api python -m dash.evals.run_evals -g -r   # both + golden SQL

Closing thoughts

Data agents are one of the best enterprise use cases for AI right now. Every company (over a certain size) should have one. Vercel has D0, OpenAI built one.

Dash is my attempt to make that accessible to everyone.

Learn More

Built with Agno. Give it a ⭐️

Dynamic Software

hi@ashpreetbedi.com (Ashpreet Bedi) — Thu, 30 Apr 2026 00:00:00 GMT

For fifty years, software has been static.

Every program you've ever used is a collection of functions run through a hard-coded control flow: If, else, while, for. The functions do the work. Reading from databases. Calling APIs. Transforming data.

Same Input = Same Output. This was the contract for fifty years.

Then 2024 happened. The control flow came alive and created a new category of software. Software that is alive, dynamic, on-demand.

Software is dead, long live Software

Static software is a recording. You press play and you get back exactly what was captured. Same notes, same order, every time. The performance happened once, in a devbox, and now it plays the same tune every time.

Dynamic Software is a live orchestra.

The score exists. The instruments exist. The musicians exist. But what happens in the room tonight depends on the maestro, the players, the moment. The model is the maestro. The tools are the instruments. The control flow is the performance, not the recording.

This is what people feel when they use a great agent and can't quite explain why it feels different. They've spent their whole lives interacting with buttons. Now they're in a room with a live performance for the first time. The software is responding to them, here, now, with judgment and presence. It's listening. It's adjusting. It's alive in a way software has never been alive before.

Recordings are perfect. Live performances aren't. A live orchestra makes choices. Sometimes it stumbles. Sometimes it surprises you. The reason we still pay to hear live music is that something different happens in the room.

The performance is the point.

Dynamic Software is alive. It's not deterministic. It's not perfect. And once you've felt the difference, recordings feel like what they always were. Frozen.

We're not building better recordings. We're building the first generation of software that performs.

Assumptions Dynamic Software breaks

When software comes alive, every assumption built on static software breaks.

Determinism breaks. Same input no longer means same output. The model considers context, memory, learnings. The software does something different on Tuesday afternoon than it did on Monday morning. While this can be (somewhat) controlled in text, we should note that the visual era is next. Charts, dashboards, entire screens generated on-demand. Instead of forcing determinism on non-deterministic software, give in, enjoy the ride.

State and time work differently. Static programs don't need to remember much. The control flow is the same every time, so state lives in a database and is CRUD only. In Dynamic Software, state is context. Memory of past sessions. History of what worked. Knowledge of the domain. The database stops being storage and becomes the context the software runs on.

Sessions follow from this. A static API endpoint is stateless by design. Each request is independent. Dynamic Software is the opposite. A session is a continuous context that spans minutes, days, sometimes weeks. The user comes back, the agent picks up where it left off. Sessions become first-class.

Time changes too. Static software returns in milliseconds, seconds if you don't believe in data co-location. Dynamic Software reasons. It calls tools. It waits for tools to return. It reasons again. A single request takes minutes sometimes. Streaming is the default. Background execution is a core primitive. The HTTP request/response model strains and breaks and so does the default 29s loadbalancer timeout.

The software needs to watch itself. With static software, you can read the code and know what it does. With Dynamic Software, you can't. The control flow is a model and the model is opaque. The only way to know what your software did is to record everything it did. Every reasoning step. Every tool call. Every retrieval. Tracing goes from a debugging tool to the only way to understand your software.

Watching isn't enough. Static programs don't make decisions, so there's nothing to approve. Dynamic Software makes decisions, and decisions have consequences. Some can be made freely. Some need the user. Some need an admin. Your software has to express which is which, and your runtime has to enforce it.

Every one of these is a real engineering problem. Every team building Dynamic Software hits them all. Most spend months solving these from scratch.

A new category needs a new runtime

Static software has a mature runtime. You write Django or Express, deploy to a managed platform, and don't think about HTTP, sessions, scaling, or recovery. The infrastructure is solved. The platform handles it.

Dynamic Software has no equivalent. You write an agent. Then you build six months of infrastructure around it, fixing every edge case manually. Edge cases you only learn after running agents at scale. SSE + websockets. Streaming + background execution. Sessions that survive restarts. Storage you can actually query, not five vendors stitched together. Approval gates that wait for admin sign-off, not just user confirmation. Per-resource, per-tool RBAC. Agents available on Slack, Telegram, WhatsApp, because no one wants to use a custom UI.

This is why 80% of agents don't work, there's a painful amount of grind in the last mile.

The last shift this big was going from desktop apps to web apps. Web software needed its own runtime, its own protocols, its own infrastructure, its own developer tools. We spent two decades building all of it.

Dynamic Software is here. Starting from scratch. Its own runtime. Its own protocols. Its own infrastructure. Its own developer tools.

The next decade

Static software took fifty years to mature. Operating systems, databases, web servers, deploy platforms, observability stacks, identity providers. We forget how recent most of it is. Heroku was 2007. Kubernetes was 2014. Vercel was 2015. The infrastructure we now take for granted is younger than most of the people building on it.

Dynamic Software is at year one.

Whoever builds the runtime, the protocols, the developer tools, the platforms, defines the next era of software. The work ahead is enormous. It is also the most interesting work I've done in the past fifteen years.

Come build with us at Agno.

Evals Don't Give You a Working Product

hi@ashpreetbedi.com (Ashpreet Bedi) — Sat, 10 Jan 2026 00:00:00 GMT

Evals are the holy grail of AI engineering. Or so we've been told.

Two years. Billions in VC funding. Thousands of blog posts about "production-ready agents." An entire industry built around evaluation frameworks, observability platforms, and benchmarks.

The result?

11% of organizations have agents in production. [Deloitte]
40%+ of agentic AI projects will be cancelled by 2027. [Gartner]
80%+ never reach meaningful production. [RAND]

If evals were the answer, these numbers would be different.

Here's what I've learned after two years of shipping agents: passing evals ≠ working product. You can have a green test suite and a broken product. You can hit 95% on your benchmark and watch your agent choke the moment a real user touches it.

Evals don't get you to production. A working product does.

The Pitch vs. The Reality

Here's what the eval-industrial complex told us:

"Evals are the key to production-ready agents" — Databricks

Here's what actually happens:

You build an agent in a Python script. It works. You run your eval suite. Green lights everywhere. You demo it to stakeholders. They love it. Then you try to ship it.

Everything falls apart.

What Evals Don't Test

Your eval suite said the agent was ready. Here's what it missed:

Your agent isn't a function — it's a process. A single response might take 30 seconds. Or 3 minutes. Or 10 minutes if it's doing research. Traditional servers handle stateless request-response cycles in milliseconds. Your agent thinks, waits, calls tools, thinks again. Try fitting that into a Lambda with a 15-second timeout.

State breaks at scale. Works great with 1 user on 1 container. Add more users? State bleeds across sessions. Add more containers? State disappears entirely. Store it in memory? Gone when the process dies. Store it in a database? Now you're building infrastructure you didn't plan for.

Streaming is harder than it looks. In your notebook, responses just appeared. In production, users stare at a blank screen for 8 seconds wondering if the app crashed. You try SSE. Then WebSockets. Then you realize you need durable streams that survive network hiccups, handle backpressure, and resume gracefully after disconnects.

The real world doesn't mock. Your agent calls an external API. In testing, mocks returned clean data every time. In production, the API times out. Returns malformed JSON. Hits rate limits. Requires re-authentication mid-session. Your agent chokes. Your eval suite never saw it coming.

Agents fail because of an inadequate runtime, not intelligence. Evals don't measure any of it.

We've been obsessing over the brain while ignoring the nervous system.

The Trap: Evals Too Early

Here's the thing that really kills projects: writing evals before you have a working product.

Every hour spent writing evals is an hour not spent learning what your product actually needs. You're locking yourself into test cases for a system that doesn't exist yet.

The agent you're building now? It's not the one that's going to ship. It's going to be the second iteration. Or the fifth. The eval suite you wrote for version one is useless for version three. Worse than useless — it's weight you're dragging around.

The eval-industrial complex sold you on this idea that evals-first is disciplined. It's not.

The right sequence:

Build something that runs
Get it in front of real users (internal users are fine)
Learn what breaks, what matters, what "good" actually looks like
Then write evals to lock in that understanding

You can't evaluate what you can't run.

What Evals Are Actually Good For

I'm not saying evals are useless. They're critical — for model providers shipping foundation models. If you're training GPT-5, you need benchmarks. Even for AI engineers building products on top of those models, evals help with:

Catching regressions after you change something
Comparing model versions
Compliance checkboxes

That's it. They won't help you ship. They won't help you scale. They won't help you handle the thousand edge cases that only appear in production.

What Actually Gets You to Production

The market says: Evals → Observability → Production.

This is backwards. Here's what actually works:

Runtime → Production → (Evals + Observability)

The foundation comes first. Everything else is a support layer.

The foundation:

A runtime that handles the weird stuff. Concurrent users. Failure recovery. Long-running stateful processes that survive container restarts. Your agent isn't a microservice — stop treating it like one.
State management that doesn't disappear. Sessions that survive crashes. Context that carries across conversations. Memory that doesn't evaporate when Kubernetes decides to reschedule your pod.
Storage that lives with the agent. The agent's data — sessions, memory, knowledge — stored where the agent runs. In your cloud. Under your control. Send it to a third-party service and you've lost control of your product's brain.
Infrastructure you own. Your environment. Your data. Your competitive advantage.

The support layer (after you're running):

Observability for real production behavior — not synthetic test traces.
Evals to catch regressions — run them in CI, keep them lean.
Tracing to debug when things go wrong.

The support layer matters. But without the foundation, you're just testing in a notebook.

The Questions That Actually Matter

You have a working agent in a Python script. Great. Now answer these:

Where will it run?
Can it handle 100 concurrent users? 1,000?
What happens when a container crashes mid-conversation?
Is streaming smooth or do users watch a loading spinner for 10 seconds?
Where does the agent's memory live? Who owns it?
How do you deploy updates without breaking active sessions?

Evals don't answer any of these questions. The runtime does.

The Path Forward

I built Agno because I got tired of watching good agents die in the gap between "works in a notebook" and "runs in production."

Agno is a runtime for agents. It handles the stuff evals can't test:

Concurrent execution — thousands of users, isolated state
Persistent storage — sessions survive crashes, memory persists across conversations
Streaming that works — SSE out of the box, handles disconnects gracefully
Your infrastructure — runs in your cloud, data never leaves your environment

The eval-industrial complex had their shot. Two years. Billions in funding. The production numbers haven't moved.

Maybe it's time to focus on actually shipping.

Want to build with Agno?

GitHub: agno.link/gh
Documentation: agno.link/docs
AgentOS: os.agno.com

Production means a working product deployed to your cloud — not a green eval suite running on your laptop.

GPU Poor Continuous Learning with Gemini 3

hi@ashpreetbedi.com (Ashpreet Bedi) — Thu, 18 Dec 2025 00:00:00 GMT

Here's a pattern I've been using to make my agents better without fine-tuning or retraining. We'll use a simple system-level learning loop that's surprisingly effective.

The problem with disconnected sessions
What is "gpu-poor continuous learning"
Why Gemini 3 Flash
The learning loop
Demo
What we store (and what we don't)
How to run your own Self-Learning Agent
Why this pattern works

1. The problem with disconnected sessions

Most agents run in independent sessions, disconnected from each other.

You ask a question. You get an answer. Tomorrow you ask a similar question and the agent starts from scratch. It doesn't remember what worked, what failed, or what it figured out along the way.

This is fine for simple tasks. But for anything complex—research, analysis, decision support—it means:

Repeating the same reasoning patterns
Re-discovering the same gotchas
Never building on past success

If your agent can't learn from its own experience, you're leaving performance on the table.

2. What is GPU Poor Continuous Learning

Let me be precise about terminology, because "continuous learning" has a specific meaning in ML.

Traditional continuous learning:

Model weights update over time
Requires compute (GPUs, TPUs)
Risk of catastrophic forgetting
Learning happens in parameters

What I'm doing (GPU Poor Continuous Learning):

Model stays completely frozen
Zero training compute
Learning happens in retrieval
Knowledge is auditable and reversible

The model doesn't get smarter. The system gets smarter.

I call it "GPU Poor" because you get continuous improvement without any of the infrastructure traditionally required for model updates. It's poor man's continuous learning—and it works surprisingly well.

3. Why Gemini 3 Flash

I built this with Gemini 3 Flash, which launched today. Here's why:

Factor	Gemini 3 Flash
Cost	$0.50/1M input, $3/1M output
Speed	3x faster than 2.5 Pro
Context	1M tokens input
Agentic coding	78% SWE-bench (beats Gemini 3 Pro)
Context caching	90% cost reduction for repeated tokens

For a self-learning agent, you want:

Low cost — You're making many calls per session
Fast inference — Tight feedback loops matter
Large context — Prior learnings need room alongside new data
Strong tool use — The agent needs to reliably call save/retrieve functions

Gemini 3 Flash hits all four. The 1M context window is especially useful—you can include substantial prior learnings without truncating.

4. The learning loop

Here's the core pattern:

                         Query
                           │
                           ▼
                   Search learnings
                           │
                           ▼
                       Research
                           │
                           ▼
                      Synthesize
                           │
                           ▼
                        Reflect
                           │
              ┌────── reusable? ──────┐
              │                       │
             Yes                      No
              │                       │
              ▼                       │
        Propose to user               │
              │                       │
       ┌── approved? ──┐              │
       │               │              │
      Yes              No             │
       │               │              │
       ▼               │              │
     Save              │              │
       │               │              │
       └───────────────┴──────────────┘
                       │
                       ▼
                    Answer

Key details:

Search first — The agent must explicitly search the knowledge base before doing anything else. This isn't automatic; it's enforced through instructions.
Most queries won't produce a learning — This is expected. Learnings should be rare and high-signal, not routine.
Human-in-the-loop gating — The agent proposes learnings, but only saves them with explicit approval. If the user declines, the agent moves on without re-proposing.

5. Demo

Here's a demo of the agent in action.

Your browser does not support the video tag.

6. What we store (and what we don't)

The biggest mistake is storing too much.

A learning is worth saving if it is:

Specific: "When comparing ETFs, check expense ratio AND tracking error" not "Look at ETF metrics"
Actionable: Can be directly applied in future similar queries
Generalizable: Useful beyond this specific question

Do not save: raw facts, one-off answers, summaries, speculation, or anything unlikely to recur.

Each learning is structured:

{
    "title": "ETF comparison checklist",
    "context": "When comparing similar ETFs for investment decisions",
    "learning": "Always check both expense ratio AND tracking error. Low expense ratio with high tracking error can cost more than a slightly more expensive fund with tight tracking.",
    "confidence": "high",
    "type": "heuristic",
    "created_at": "2025-12-17T10:30:00Z"
}

Most tasks will not produce a learning. That's expected.

7. How to run your own Self-Learning Agent

I'm providing cookbooks for running your own self-learning agent, built using:

FastAPI application for running the agent
Postgres database for storing sessions, memory, and knowledge

Here's the link to the code.

You can wrap this up in a container and deploy it to Railway. Here's a sample repository you can use.

Steps to run your own Self-Learning Agent

1. Clone the repo

git clone https://github.com/agno-agi/agno.git
cd agno

2. Create and activate a virtual environment

uv venv .gemini-agents --python 3.12
source .gemini-agents/bin/activate

3. Install dependencies

uv pip install -r cookbook/02_examples/04_gemini/requirements.txt

4. Set environment variables

# Required for Gemini models
export GOOGLE_API_KEY=your-google-api-key

# Required for agents using parallel search
export PARALLEL_API_KEY=your-parallel-api-key

5. Run Postgres with PgVector

Postgres stores agent sessions, memory, knowledge, and state. Install Docker Desktop and run:

./cookbook/scripts/run_pgvector.sh

Or run directly:

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql \
  -v pgvolume:/var/lib/postgresql \
  -p 5532:5432 \
  --name pgvector \
  agnohq/pgvector:18

6. Run the Agent OS

Agno provides a web interface for interacting with agents. Start the server:

python cookbook/02_examples/04_gemini/run.py

Then visit os.agno.com and add http://localhost:7777 as an endpoint.

8. Why this pattern works

This approach works because it separates concerns that are usually conflated:

Concern	Traditional	GPU Poor
Reasoning	Model	Model (unchanged)
Learning	Model weights	Knowledge base
Memory	Context window	Persistent storage

Benefits:

Auditable — You can see exactly what the agent "learned"
Reversible — Delete a bad learning, system improves
Fast feedback — No training cycles, immediate improvement
No forgetting — New learnings don't overwrite capabilities

The pattern generalizes beyond research. Use it for:

Market analysis
Competitive intelligence
Technical support
Decision logging
Policy tracking

Anywhere beliefs evolve, learnings beat stateless answers.

Thank you for reading! Feel free to reach out on X if you have questions or feedback.

Introducing Agno

hi@ashpreetbedi.com (Ashpreet Bedi) — Wed, 15 Oct 2025 00:00:00 GMT

✨ The Multi-Agent Framework, Runtime, and UI.

Over the past 3 years, I've been obsessed with building the perfect harness for multi-agent systems. A mission to deliver the best system for building, deploying and scaling agentic software.

Today, Agno is used by thousands of builders at the largest companies in the world, including 3 of the fortune 5. Let's dive in.

What is Agno?

Agno is a multi-agent framework, runtime, and UI. It takes a systems engineering approach to agent development by delivering 3 tightly coupled components:

Framework: for building multi-agent systems.
Runtime: for deploying multi-agent systems.
UI: for managing multi-agent systems.

These 3 components form the harness for the perfect agentic system.

Can you build these yourself? Absolutely. But Agno gives you speed, speed gives you momentum, and momentum is everything.

Enough talk, let's see some code.

Here's a fully working Agent, with conversation history, access to tools via MCP, deployed as a FastAPI app - in 20 lines of code.

from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.models.anthropic import Claude
from agno.os import AgentOS
from agno.tools.mcp import MCPTools

# ************* Create Agent *************
agno_agent = Agent(
    name="Agno Agent",
    model=Claude(id="claude-sonnet-4-5"),
    db=SqliteDb(db_file="agno.db"),
    tools=[MCPTools(url="https://docs.agno.com/mcp", transport="streamable-http")],
    add_history_to_context=True,
    markdown=True,
)

# ************* Create AgentOS *************
agent_os = AgentOS(agents=[agno_agent])
app = agent_os.get_app()

Run your AgentOS using fastapi dev agno_agent.py and chat with it on the AgentOS UI.

Your browser does not support the video tag.

Deploy your FastAPI app to your cloud of choice, and voilà, you're live in production. It's impossible to move this quickly without Agno.

✨ Part I: The Framework

Agent Engineering is an exercise in iteration. You can't iterate if you don't have a v0.1. A batteries included setup gets your agent in the hands of your internal team. Then you can edit in a loop.

[stolen from vtridvedy]

Agno delivers a full-featured, performance-optimized agent framework with every primitive you can think of. Session storage, memory, knowledge (RAG), context management, tools (pre-built and MCP), guardrails, dependency injection, human in the loop, and more. Every part of agent execution is customizable via pre-hooks, post-hooks, and state management, so you're never boxed into default behavior.

Agents are completely type-safe, you can use them as chatbots (string input, string output) or with structured inputs and outputs. Not only that, Agents can use separate parser-models to generate structured outputs, so reasoning is not compromised (only available on Agno).

✨ The Multi-Agent Paradox

The big debate in multi-agent systems is whether agents should execute other sub-agents (handoff-approach), or the developer should programmatically define the flow of execution (workflow-approach).

The answer: why not both?

With Agno, Agents can be executed by themselves, as part of a multi-agent Team (autonomous execution) or a step-based Workflow (controlled execution). Your use-case determines your approach.

Agent Teams have a shared state, agentic context management (i.e. the team leader manages the context for the team), shared memory and knowledge. Teams can also execute other teams, or workflows.

Workflows are deterministic, where each step can be an agent, team, workflow, or a plain old python function. Steps can be parallelized, branched, run via conditional logic or loops.

There's so much more I can cover here, but i'll save that for the docs. The gist is, when building Agents, my goal is to get to v0.1 within a few hours and iterate from there. Agno gives me that.

New agent engineers think that building the solution is the hard part - NO. Finding the right use-case is the hard part. To do that, you need to tackle 3, 5, or 10 different problems. Agno gets you to use-case #10, which is where the magic happens.

✨ Part II: The Runtime

Seasoned builders know that to build successful agentic products, you need to iterate on multiple variations before you hit gold. Also:

You're not going to build by yourself, you need to get it in the hands of your team quickly (especially the non-technical folks).
You need some sort of system to test, serve and integrate with your product as quickly as possible (to get user feedback).

This means you need to build an API to serve your agents, your product will integrate with this API via REST or WebSockets. You also need a UI to test, monitor, debug and manage your system.

✨ You need an AI backend.

This is where the AgentOS comes in. In the simplest terms, it's a FastAPI application with pre-built endpoints for serving your agents, teams and workflows. You can also manage knowledge bases, user memories, agent sessions, and evaluate your system in real-time.

The AgentOS is a high-performance runtime for multi-agent systems. It gives you a ready-to-use FastAPI app for deploying your agents, and an integrated UI for testing, monitoring and managing them.

Deploy your AgentOS to your cloud of choice. Session data, knowledge, memories, all live in your database. No data ever leaves your system.

In my experience, once you have a semblence of an Agent you like, you need to get it in the hands of your team and early users quickly. The pre-built api endpoints give you such an incredible headstart that its almost a no-brainer to use.

Here are the pre-built api endpoints, ready to use:

✨ Part III: The Control Plane

Wait, there's more?

The AgentOS comes with a web interface that connects directly to the AgentOS runtime (using the pre-built api endpoints). It's an novel architecture, where the web app (running in your browser) connects directly to the AgentOS runtime. You can test (chat and run) your agents, teams and workflows, manage knowledge bases, user memories, and evaluate your system in real-time. Here's how it looks:

If you're using a tracing service, this will change how you look at things. You're not sending any data out, you're not paying for retention costs, and you're not worrying about data privacy. The app pulls in sessions directly from the Agent's database and show's them:

The traces and runtime data is stored in your database, and the AgentOS UI connects from your browser to the AgentOS runtime.

Its a novel architecture designed to give you complete data ownership:

Your Infrastructure, Your Data: Your AgentOS runs in your cloud.
Zero Data Transmission: No conversations, logs, or metrics are sent to external services. They belong to you.
Private by Default: All processing, storage, and analytics happen in your environment.

Personally, I'm surprised we collectively agreed to hand over every user interaction to tracing companies. Just the retention issues are enough to make you think twice, let alone the data privacy concerns.

For companies building agents, Agno delivers the complete solution.

Unless you're an infra or devtools company, you're focused on solving user problems. Agno free's up your mental capacity so you can a) find the right problem to tackle, b) build your MVP quickly, and c) iterate and improve your product.

Thousands of builders choose Agno, thank you for letting us be a part of your journey ✨

Want to build with Agno?

Agno documentation: agno.link/docs
Signup for the AgentOS: os.agno.com
Star Agno on Github: agno.link/gh

The Programming Language for Agentic Software

hi@ashpreetbedi.com (Ashpreet Bedi) — Wed, 18 Feb 2026 00:00:00 GMT

Every era of computing develops its own programming language.

The mainframe era had COBOL and Fortran. The systems era had C. The web era had JavaScript and Python. Each emerged for the same reason, the previous generation could no longer express the new abstraction.

We are now in the agentic era.

Software is no longer just executing predefined instructions. It is reasoning over context, calling tools, retrieving knowledge, learning from past runs, and making decisions at runtime.

When the contract of software changes, the language must change too.

What makes a programming language?

A programming language is made of three things:

Primitives to think and build with.
An engine to execute those primitives.
A runtime that governs memory, I/O, permissions, and interaction with the outside world.

An SDK alone is not a programming language. A collection of utilities is not a programming language. Without an execution engine and a runtime that enforces behavior, you have a library, not a language.

Python gives you lists, functions, and classes. Its interpreter runs them. Its runtime manages memory, exceptions, and interfaces with the operating system.

React gives you components and state. Its reconciler computes updates. The browser handles rendering and events.

Applying this to agentic systems:

Agno gives you agents, teams, workflows, memory, knowledge, tools, guardrails, and approval flows.
The Engine runs them: model calls, tool execution, context construction, and iteration.
AgentOS, the production runtime, governs execution and interfaces with the outside world via an API: streaming, request-level isolation, authentication, RBAC, monitoring, background execution.

The runtime is stateless. Sessions, memory, state and traces persist in your database. Permissions are enforced at request boundaries.

Agno provides the SDK + Engine + Runtime for agentic software.

Agents are the new programs

Traditional applications are collections of deterministic programs. Every path is written in advance. The system does exactly what the developer specified.

Agents change that.

An agent reasons over context. It chooses tools dynamically. It retrieves knowledge. It remembers previous runs. It decides which path to take at runtime.

This is still software, but the path between input and output is no longer fixed.

This does not mean deterministic systems disappear. For many workloads, static pipelines are faster, cheaper, and more reliable.

But when the system must pause, reason, retrieve, and adapt dynamically, predefined control flow breaks down.

For decades, the contract was simple:

Same input, same output.

Agentic software breaks that contract.

The same input can produce different outputs depending on memory, context, retrieval, and prior state. If execution is dynamic, the language must express that natively.

Agentic software needs a new contract

Agentic software requires new capabilities built into its programming language:

1. A new interaction model

Static software receives a request and returns a response.

Agentic software streams reasoning, tool calls, intermediate results, and pivots in real time. The execution path can change mid run, or pause for days. The system may retrieve knowledge halfway through and completely redirect its reasoning.

Streaming and iteration are the default and the language for agentic software must treat them as first class behavior.

2. A new governance model

Traditional systems execute predefined decisions within rules written in advance. Code does not decide whether to send an email or issue a refund. It simply follows instructions.

Agents make decisions, and not all decisions are equal.

Some actions are low risk: summarizing text or searching documentation. Some require user approval: sending emails or booking travel. Some require admin approval: issuing refunds, deleting records, changing permissions.

Without runtime-enforced approval boundaries, an agent that can draft an email can also execute a payment. The difference must be enforced by the runtime, not prompt engineering.

Governance must be part of the agent definition itself and the runtime must enforce it.

3. A new trust model

Static systems are trusted because every path is written in advance.

Agents introduce probabilistic reasoning into the execution path.

If guardrails and evaluation run outside the runtime, they are advisory rather than enforceable. Unsafe output can be produced before policy checks intervene.

Trust must therefore be part of the runtime semantics: guardrails, evaluation, logging, pre and post-response checks integrated into execution.

Interaction. Governance. Trust.

These are language-level concerns in the agentic era.

What this looks like in practice

Here is a lightweight coding agent that writes, reviews, and iterates on code. It remembers project conventions, retrieves knowledge, learns from past runs, and operates within explicit governance boundaries.

This example is intentionally minimal but production-capable. It has persistence, memory, learning, and controlled tool execution.

from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.learn import LearnedKnowledgeConfig, LearningMachine, LearningMode
from agno.models.openai import OpenAIResponses
from agno.tools.coding import CodingTools
from agno.tools.reasoning import ReasoningTools

gcode = Agent(
    name="Gcode",
    model=OpenAIResponses(id="gpt-5.2"),
    db=SqliteDb(db_file="agno.db"),
    instructions=instructions,

    # Knowledge: searchable long-term memory
    knowledge=gcode_knowledge,
    search_knowledge=True,

    # Learning: extract and store learnings over time
    learning=LearningMachine(
        knowledge=gcode_learnings,
        learned_knowledge=LearnedKnowledgeConfig(mode=LearningMode.AGENTIC),
    ),

    # Tools: controlled extensions
    tools=[CodingTools(base_dir=workspace, all=True), ReasoningTools()],

    # Memory: learn user preferences
    enable_agentic_memory=True,

    # Context: include prior runs
    add_history_to_context=True,
    num_history_runs=10,
    markdown=True,
)

Notice what is being defined:

Knowledge as a first class primitive
Learning as a built in capability
Tools as controlled extensions
Memory and historical context as defaults
A runtime that governs how the system executes

These are not utilities or third party integrations. They are the vocabulary of the agent and enforced by the runtime and execution layer.

That is what a programming language does. It gives you the right primitives for the era you are building in. You define the behavior. The language enforces it.

Every era gets the language it needs

COBOL abstracted business logic away from assembly. C abstracted system engineering without hiding it. Python abstracted memory management and low level primitives to accelerate iteration.

Each language captured the dominant abstraction of its era.

The agentic era introduces a new abstraction: systems that reason, remember, and decide at runtime.

The contract has changed. The primitives have changed. The execution model has changed.

The language must change too. That language is Agno.

There are many that argue that because Agno is written in Python, it cannot be a programming language.

If you wish to make an apple pie from scratch, you must first invent the universe. — Carl Sagan

Learning Machines: Why AI Memory Hasn't Been Solved (Yet)

hi@ashpreetbedi.com (Ashpreet Bedi) — Wed, 07 Jan 2026 00:00:00 GMT

Every AI memory tool I've used is missing something.

After reading hundreds (maybe thousands) of opinions, posts, and papers on agentic memory, I've come to three conclusions.

1. No one has it figured out.

Claude has the most impressive memory system I've seen. It feels natural. It never shouts. It knows what to reveal and when.

But we haven't figured out how to give developers the same capability for their own agents. The tools we have are... not there.

2. Maybe we're looking at it wrong.

Maybe memory is the wrong framing. What agents are really doing is learning. Learning about the user, the task at hand, learning insights and patterns, learning from decisions - good and bad, the feedback received. Learning from every interaction.

Everyone's rushing to build memory extraction systems — pull out facts, store them in a vector (or graph 🙄) database, retrieve them using complex mechanisms. But that's only half the problem.

But the hard part is integration: When does the learning happen? Before the response? After? In parallel? Is it automatic or does the agent control it? And critically — how do you teach the agent to use that information properly? Integration is what makes the system work.

You can't just tell an agent "you know XYZ about the user". You need to teach it how to use that knowledge. How to learn from it. How to prioritize it. How to act like a partner, a colleague, a companion who genuinely knows you — not a machine reciting facts from a database.

3. User memory is only part of the story.

User profiles and conversation summaries are just two types of learnings. But what about patterns and insights that worked? The entities involved - companies, people, projects? The decisions made and why? The feedback received? How should the agent use all these learnings to improve itself?

These aren't separate systems. They're all forms of learning.

Memory is Learning

This realization led me to build something different: the Learning Machine, a unified learning system that helps agents continuously integrate information from their environment.

Here's the difference:

Traditional "Memory":
Message → Extract → Store → Retrieve → Dump into Prompt → Repeat

Learning Machine:
User Message ──────► Recall from Stores ◄────────┐
                            │                    │
                            ▼                    │
                      Build Context              │
                            │                    │
                            ▼                    │ LearningMachine
                Agent Responds (with tools)      │
                            │                    │
                            ▼                    │
                   Extract & Process             │
                            │                    │
                            ▼                    │
              Update Stores (agent learns) ──────┴──► Periodic Curation

The agent isn't just fed memories. It participates in learning, curating what it learns, and integrating that knowledge back into every response.

The goal: an agent on interaction 1000 is fundamentally better than it was on interaction 1 — across the board, not just with the same user.

What It Looks Like in Action

A new employee on their first day asks: "I'm starting work on the cloud migration project. What should I know?"

The agent responds with full context, even though it's never talked to this person before. It knows Acme is migrating from AWS to GCP. It knows Alex (CTO) is leading it. It knows Phase 2 is the most compute-heavy. It shares migration patterns from similar past projects. It knows that the pricing is changing next quarter.

How? Three types of learning from past interactions:

Session 1 (Alex, CTO):
"I'm Alex, CTO at Acme. We're migrating from AWS to GCP and
I need help planning the timeline."

→ User Profile captures: Alex, CTO, involved in planning discussions
→ Entity Memory captures: Acme (company), AWS→GCP migration (project)
→ Session Context: Goal is migration timeline planning

Session 2 (next day, same user, different session):
"Just heard GCP is changing their pricing next quarter.
How does that affect our migration?"

→ Agent recalls: Acme, AWS→GCP migration, Alex is CTO, 3-phase timeline
→ Agent responds: "That could impact your timeline. Last time we mapped
   out a 3-phase approach with Phase 2 being the most compute-heavy.
   Want me to model the cost implications for each phase?"

Session 3 (different user, same org namespace):
"I just joined to help with the Acme cloud project. What should I know?"

→ Entity Memory: "Acme is migrating AWS to GCP. Alex (CTO) is leading it."
→ Learned Knowledge: Shares migration patterns from past projects
→ Agent responds with full context — even though it never talked to this user

Three sessions. Three types of learning. Cross-user knowledge sharing.

This is possible. Today.

The Architecture: Learning Stores

The key innovation behind the Learning Machine is the learning protocol and learning stores. The protocol defines how stores capture, process, and integrate knowledge. Each store is configured independently. Mix and match as needed. The Learning Machine orchestrates it all.

These are the stores I'm working on:

Store	What It Captures	Scope
User Profile	Preferences, memories, personal context	Per user
Session Context	Goal, plan, progress, summary	Per session
Entity Memory	Facts, events, relationships about external things	Configurable
Learned Knowledge	Insights, patterns, best practices	Configurable
Decision Logs	Why decisions were made	Configurable
Behavioral Feedback	What worked, what didn't	Per agent
Self-Improvement	Evolved instructions	Per agent

Show Me Some Code

One agent. Four learning stores. Configured independently. Orchestrated by the Learning Machine.

from agno.agent import Agent
from agno.db.postgres import PostgresDb
from agno.models.openai import OpenAIResponses

agent = Agent(
    model=OpenAIResponses(id="gpt-5.2"),
    db=PostgresDb(db_url="postgresql://..."),
    learning=LearningMachine(
        knowledge=my_vector_store,  # or graph if that's your thing
        user_profile=UserProfileConfig(
            mode=LearningMode.BACKGROUND,
            enable_agent_tools=True,
        ),
        session_context=SessionContextConfig(
            enable_planning=True,
        ),
        learned_knowledge=LearnedKnowledgeConfig(
            mode=LearningMode.PROPOSE,
        ),
        entity_memory=EntityMemoryConfig(
            mode=LearningMode.BACKGROUND,
        ),
    ),
)

The best part? You can build custom learning stores by extending the LearningStore protocol. Need project context? Build a ProjectContextStore. Need to track accounts? Build an AccountStore.

Taking Inspiration from Claude

Claude's memory feels magical. It's natural, contextual, never announces "saving to memory". It just knows you.

But here's the thing: you can't build with it. Claude's memory is a consumer product feature. The API gives you nothing. If you want learning for your agents, you're on your own. Enter Learning Machine.

Here's what Claude does well, and what Learning Machine adds:

Claude feels natural. It never announces "saving to memory". So does Learning Machine. We inject context based on each store and control how the agent learns from it. No fact dumps.

Claude learns about its users over time. Preferences, history, personal context. So does Learning Machine. But we also add sessions, entities, patterns, and decisions. The full picture, not just the user.

Claude is scoped to a single user. Makes sense for a consumer product. Learning Machine adds namespace scoping: keep it private to a user, share across a team, or make it global. You control the boundaries.

Claude has fixed memory types. You can't change how it works. Learning Machine is extensible via protocol. Build your own stores for whatever your domain needs.

Claude is a closed system. Its memory lives inside Claude. Learning Machine is open source, fully customizable, and yours to extend.

I studied what makes Claude's memory feel good. Then built something you can actually use and extend.

What This Unlocks

Here's what's possible when agents learn across users, sessions, and time:

A support agent where ticket #1,000 gets resolved better and faster — because it learned from tickets #1-999.
A customer success agent that remembers every account's stack, contracts, and conversations — across your entire team.
A healthcare agent that knows your full history — not just what's in today's chart, but every conversation (with different doctors), symptom, and concern you've ever mentioned.
A financial advisor that remembers your risk tolerance, goals, and every "what if" scenario you've ever explored — across years of conversations.
An agent that rewrites itself — analyzing its failures and proposing: "I should stop doing X."

That last one is the endgame. Agents that learn from their own mistakes and rewrite their own instructions. Human approves. Agent evolves. Continuous improvement.

Current Status

Learning Machine is part of Agno and I'm in the final stages of testing Phase 1. Here's where things stand:

Phase	What's Included	Status
Phase 1	User Profile, Session Context, Entity Memory, Learned Knowledge	Built, testing now
Phase 2	Decision Logs, Behavioral Feedback	Planned
Phase 3	Self-Improvement	Planned

If you're eager to dig in, here's the PR: learning-machine-v0

Want to get involved? DM me if you're interested in learning more or helping out.

Memory was never the goal. Learning was.

If you enjoyed reading this, checkout Agno on GitHub.

Questions or feedback? Reach out on X.

Learning Machines: Technical Design

hi@ashpreetbedi.com (Ashpreet Bedi) — Thu, 08 Jan 2026 00:00:00 GMT

On Monday I introduced Learning Machines and yesterday I shared that it's finally working. Today I'll show you how it works under the hood.

First, Let's Recap

After reading hundreds of papers on agentic memory and trying out every possible tool, I came to the simple conclusion that maybe we're looking at memory wrong.

Memory is just... learning. Learning about the user, the task at hand, learning insights and patterns, learning from decisions - good and bad, the feedback received. Learning from every interaction. Everything else is integration (how the agent uses these learnings) and curation (decay, pruning, deduplication).

So I built Learning Machines: A system that helps agents continuously learn from every interaction.

I started working on it dec 31, and got a basic working version yesterday. Here's the PR for those interested: learning-machine-v0

Now let's dig into the technical details.

The Learning Protocol

The key behind it all is the Learning Protocol. It's a simple interface for building Learning Stores -- user profiles, session context, learned knowledge, entity memory, etc.

Let's take a look at the protocol:

@runtime_checkable
class LearningStore(Protocol):
    """Protocol that all learning stores must implement."""

    @property
    def learning_type(self) -> str:
        """Unique identifier (e.g., 'user_profile')."""
        ...

    def recall(self, **context) -> Optional[Any]:
        """Retrieve learnings from storage."""
        ...

    def process(self, messages: List[Any], **context) -> None:
        """Extract and save learnings from conversation."""
        ...

    def build_context(self, data: Any) -> str:
        """Build context string for agent's system prompt."""
        ...

    def get_tools(self, **context) -> List[Callable]:
        """Get tools to expose to agent."""
        ...

Five functions. Everything else is optional.

Why this matters: You can build your own learning store in ~50 lines. Most memory systems are thousands of lines of config. This is ~50. Legal docs. Medical records. Codebases. Sales pipelines. Whatever your domain needs.

You can even build personalized LearningStores for your writing styles, for your daily to-do's, for your emails, for your shopping lists. The real value of this approach is its extensibility.

The Learning Machine

The protocol lets you build stores. But stores need to plug into the agent somehow. That's what LearningMachine does.

User Message ──────► Recall from Stores ◄────────┐
                            │                    │
                            ▼                    │
                      Build Context              │
                            │                    │
                            ▼                    │ LearningMachine
                Agent Responds (with tools)      │
                            │                    │
                            ▼                    │
                   Extract & Process             │
                            │                    │
                            ▼                    │
              Update Stores (agent learns) ──────┴──► Periodic Curation

Recall → Build context → Run agent → Extract → Store. That's the loop.

Developer Experience

Three levels of complexity:

Dead Simple

agent = Agent(
    model=model,
    db=db,
    learning=True,  # Enables user_profile in BACKGROUND mode
)

Pick What You Want

agent = Agent(
    model=model,
    db=db,
    learning=LearningMachine(
        user_profile=True,
        session_context=True,
        learned_knowledge=True,
        entity_memory=True,
    ),
)

Full Control

agent = Agent(
    model=model,
    db=db,
    learning=LearningMachine(
        user_profile=UserProfileConfig(
            mode=LearningMode.AGENTIC,
        ),
        session_context=SessionContextConfig(
            enable_planning=True,
        ),
        learned_knowledge=LearnedKnowledgeConfig(
            mode=LearningMode.PROPOSE,
            namespace="engineering",
        ),
        entity_memory=EntityMemoryConfig(
            mode=LearningMode.BACKGROUND,
        ),
    ),
)

Build Your Own Learning Store

This is the win. Implement the protocol, plug it in:

@dataclass
class ProjectContextStore:
    """Custom store for project-specific context."""

    @property
    def learning_type(self) -> str:
        return "project_context"

    def recall(self, project_id: str, **kwargs) -> Optional[ProjectContext]:
        # Retrieve from your storage
        ...

    def process(self, messages: List[Any], project_id: str, **kwargs) -> None:
        # Extract and save
        ...

    def build_context(self, data: Any) -> str:
        if not data:
            return ""
        return f"\n{data.summary}\n"

    def get_tools(self, **kwargs) -> List[Callable]:
        return []  # Or return tools for agentic mode

# Plug it in
learning = LearningMachine(
    custom_stores={
        "project": ProjectContextStore(),
    },
)

~50 lines. 5 functions. Your domain, your rules. Build a Learning Store for legal docs, medical records, codebases, sales pipelines. This is the whole point behind the Learning Machine.

Built-in Stores

Phase 1 includes four stores:

Store	What It Captures	Scope	Storage
User Profile	Name, work context, preferences, communication style	Per user (`user_id`)	Database (direct lookup)
Session Context	Summary of conversation, goal, plan steps, progress	Per session (`session_id`)	Database (direct lookup)
Learned Knowledge	Insights, patterns, best practices. Things that apply across users	Configurable namespace	Knowledge base (vector search)
Entity Memory	Facts, events, and relationships about external things — companies, people, projects	Configurable namespace	Database (direct lookup + search)

Key Design Decisions

Learning Modes

Different use cases need different extraction modes.

class LearningMode(Enum):
    BACKGROUND = "background"   # Automatic extraction after each conversation
    AGENTIC = "agentic"         # Agent decides via tools
    PROPOSE = "propose"         # Agent proposes, user confirms
    HITL = "hitl"               # Human-in-the-loop approval (future)

BACKGROUND is invisible. The user never sees extraction happening. This is what makes Claude's memory feel natural.

AGENTIC gives control. The agent decides what's worth remembering. You can see the tool calls. Less noise, more transparency.

PROPOSE is for medium-stakes learning. Agent suggests, human approves. Good for shared knowledge bases where bad data spreads.

HITL is for the highest-stakes learning. Explicit human approval required.

Namespace Scoping

Some learnings should be private. Some should be shared. Namespaces enable this.

# Private to this user
LearnedKnowledgeConfig(namespace="user")

# Shared within engineering team
LearnedKnowledgeConfig(namespace="engineering")

# Shared with everyone
LearnedKnowledgeConfig(namespace="global")

This is what enables cross-user learning. This is what made yesterday's experiment work — Alice's insight helped Bob because they shared a namespace.

Entity Memory: Three-Tier Memory System

Entities (people, companies, projects) hold different types of information:

Facts: Semantic knowledge ("Uses PostgreSQL", "Based in London")
Events: Episodic memories ("Launched v2 on Jan 15", "Raised Series A")
Relationships: Graph connections ("Bob is CEO of Acme", "Acme acquired StartupX")

Flat list doesn't work. You need to query "what do we know about Acme" differently than "what happened with Acme."

What's Next

Phase	What's Included	Status
Phase 1	Learning Protocol, Learning Machine + 4 Learning Stores	Built, currently testing and fixing bugs
Phase 2	Decision Logs and Behavioral Feedback. Agents that know why they did what they did, and what worked	Planned
Phase 3	Self-Improvement	Planned

Phase 3 is the endgame. Agents that analyze their own failures and propose: "I should stop doing X." Human approves. Agent evolves. No retraining. No fine-tuning. Just learning.

Want to dig in? Here's the PR: learning-machine-v0

Memory was step one. Learning is what comes next.

If you enjoyed reading this, checkout Agno on GitHub.

Questions or feedback? Reach out on X.

Memory: How Agents Learn

hi@ashpreetbedi.com (Ashpreet Bedi) — Mon, 22 Dec 2025 00:00:00 GMT

It's almost 2026. Agents can follow complex instructions, use dozens of tools, and work autonomously for hours. But ask them the same question twice and they start from scratch. They don't remember what worked, what failed, or what they figured out along the way.

What makes ChatGPT and Claude great personal assistants? Memory.

Here's the dirty secret: when building agents with the API, we've made them capable, but we haven't yet figured out how to make them learn.

What is memory
How memory enables learning
Three patterns (with code)
Video demo
What makes a good learning
Get started

Wanna jump straight to the code? Here you go. Cookbooks 2, 4 and 7 are what you're looking for.

1. What is memory?

"Memory" gets thrown around loosely. Chat history? Context window? Vector database? Let's be precise.

There are three types of memory that matter for agents:

Session Memory

The conversation context. What was said five messages ago. This is a solved problem: store messages in a database, retrieve them before every response, add them to the context.

Session memory is useful but limited. It disappears when the conversation ends. It's not really memory, it's just context.

User Memory

Facts about a specific user that persist across sessions. Preferences, goals, constraints.

When a user says "I'm interested in AI stocks and have moderate risk tolerance", that's worth remembering, not just for this conversation, but for every future conversation with that user.

This is powerful, but it's still not learning. User memory is about recall, not improvement.

Learned Memory

This is where knowledge gets built. As agents interact with the world, they discover insights that apply generally, not just to one user, but to anyone asking similar questions.

When your finance agent discovers that "when comparing ETFs, check both expense ratio AND tracking error", this insight is worth saving, not just because one user asked, but because it makes the agent better at ETF comparisons for everyone.

Here's the beauty: knowledge compounds. The more the agent learns, the better it gets. And unlike weight updates, this knowledge is tangible: you can inspect it, edit it, delete it. No retraining required.

If you're building agents without learned memory, you're leaving performance on the table.

2. How memory enables learning

Here's the core insight: learning is remembering what worked.

Without memory, agents are stateless. Every session is day one:

Without Memory	With Memory
Re-discovers the same patterns	Searches prior learnings before acting
Repeats the same mistakes	Applies insights from past sessions
Re-asks the same questions	Builds domain knowledge over time
Can't build on prior success	Gets better the more you use it

The best part: the model doesn't need to get better for the system to improve. Learning happens in retrieval, not in weights. And as models improve, your system improves too — for free.

I call this GPU Poor Continuous Learning: continuous improvement without fine-tuning, retraining, or any of the infrastructure traditionally required for model updates. Just a knowledge base that grows smarter over time.

The model doesn't get smarter. The system gets smarter.

3. Three patterns for agent memory

Let me show you how to implement the three patterns, with a bonus at the end.

Pattern 1: Session Memory

Store messages in a database, retrieve them before every response, add them to the context. Agno gives you this out of the box — just give your agent a database.

from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.models.google import Gemini
from agno.tools.yfinance import YFinanceTools

agent_db = SqliteDb(db_file="tmp/agents.db")

agent = Agent(
    model=Gemini(id="gemini-3-flash-preview"),
    tools=[YFinanceTools()],
    db=agent_db,
    add_history_to_context=True,
    num_history_runs=5,
)

if __name__ == "__main__":
    session_id = "finance-session"

    # Turn 1: Analyze a stock
    agent.print_response("Quick investment brief on NVIDIA", session_id=session_id)

    # Turn 2: Agent remembers NVDA from turn 1
    agent.print_response("Compare that to Tesla", session_id=session_id)

    # Turn 3: Recommendation based on full conversation
    agent.print_response("Which looks like the better investment?", session_id=session_id)

Use a consistent session_id to persist conversation across runs.

Pattern 2: User Memory

Remember facts about the user across sessions. The MemoryManager extracts preferences automatically and stores them in the database.

from agno.agent import Agent
from agno.memory import MemoryManager
from agno.models.google import Gemini
from agno.db.sqlite import SqliteDb

agent_db = SqliteDb(db_file="tmp/agents.db")

memory_manager = MemoryManager(
    model=Gemini(id="gemini-3-flash-preview"),
    db=agent_db,
)

agent = Agent(
    model=Gemini(id="gemini-3-flash-preview"),
    memory_manager=memory_manager,
    enable_user_memory=True,
)

# First conversation — preferences extracted and stored
agent.print_response(
    "I'm interested in AI stocks. My risk tolerance is moderate.",
    user_id="investor@example.com",
)

# Later conversation — agent remembers
agent.print_response(
    "What stocks would you recommend for me?",
    user_id="investor@example.com",
)

enable_user_memory=True runs the MemoryManager in parallel with every run. Use enable_agentic_memory=True to let the agent decide when to store memories via tool calls. More efficient, doesn't run on every response.

Pattern 3: Learned Memory

Now let's add learned memory: insights that apply beyond just one user. The key is a custom tool that saves learnings to a knowledge base:

import json
from datetime import datetime, timezone

from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.knowledge import Knowledge
from agno.models.google import Gemini
from agno.vectordb.chroma import ChromaDb

agent_db = SqliteDb(db_file="tmp/agents.db")

learnings_kb = Knowledge(
    name="Agent Learnings",
    vector_db=ChromaDb(
        name="learnings",
        persistent_client=True,
        search_type=SearchType.hybrid,
    ),
)

def save_learning(title: str, learning: str) -> str:
    """
    Save a reusable insight to the knowledge base.

    Args:
        title: Short descriptive title
        learning: The insight — specific and actionable
    """
    payload = {
        "title": title.strip(),
        "learning": learning.strip(),
        "saved_at": datetime.now(timezone.utc).isoformat(),
    }

    learnings_kb.add_content(
        name=payload["title"],
        text_content=json.dumps(payload),
    )

    return f"Saved: '{title}'"

agent = Agent(
    model=Gemini(id="gemini-3-flash-preview"),
    tools=[save_learning],
    knowledge=learnings_kb,
    search_knowledge=True,
    db=agent_db,
)

The agent now has two capabilities:

Search first — Before answering, it searches for relevant prior learnings
Save learnings — When it discovers something reusable, it saves it

But how do you prevent the agent from saving garbage?

Bonus: Human-in-the-Loop Gating

The quality of your knowledge base determines the quality of learning. Garbage in, garbage out.

The solution: the agent proposes learnings, but only saves with explicit user approval.

from agno.tools import tool

@tool(requires_confirmation=True)
def save_learning(title: str, learning: str) -> str:
    """Save a reusable insight. Requires user confirmation."""
    # ... same implementation

Handle the confirmation flow:

run_response = agent.run("Analyze NVDA and save any insights")

for requirement in run_response.active_requirements:
    if requirement.needs_confirmation:
        print(f"Tool: {requirement.tool_execution.tool_name}")
        print(f"Args: {requirement.tool_execution.tool_args}")

        if user_approves:
            requirement.confirm()
        else:
            requirement.reject()

run_response = agent.continue_run(
    run_id=run_response.run_id,
    requirements=run_response.requirements,
)

The agent proposes, the human gates. High-signal knowledge only.

5. Video demo

Here's a video demo that starts by showcasing user memory, then learned memory with user confirmation.

Your browser does not support the video tag.

5. What makes a good learning

A learning is worth saving if it's:

Specific: "Tech P/E ratios typically range 20-35x" not "P/E varies"
Actionable: Can be applied to future queries
Generalizable: Useful beyond this one conversation

Don't save: raw data, one-off facts, summaries, speculation.

Most queries should NOT produce a learning, and that's OK.

Where to store

Memory Type	Key	Agno Component
Session	`session_id`	`SqliteDb`, `PostgresDb`, `MongoDB`
User	`user_id`	`MemoryManager` + Database
Learned	`learning_id`	`Knowledge` + `ChromaDb`, `PgVector`, `Qdrant`, `Pinecone`

Avoiding bloat

The biggest mistake is storing too much. A bloated knowledge base hurts retrieval and makes the agent worse.

The upside: because learnings are stored explicitly (not in weights), they're auditable and reversible. Bad learning? Delete it. System immediately improves.

6. Get started

This blog comes with complete working code. Here are 12 cookbooks that take you from "what is an agent" to building agents with memory, knowledge, state, guardrails, and more. Link again for reference.

#	Cookbook	What You'll Learn
01	Tools	Give agents the ability to fetch real-time data
02	Storage	Persist conversations across runs
03	Knowledge	Load documents and search with hybrid retrieval
04	Custom Tools	Write your own tools, add self-learning
05	Structured Output	Return typed Pydantic objects
06	Typed I/O	Full type safety on input and output
07	Memory	Remember user preferences across sessions
08	State Management	Track and persist structured state
09	Multi-Agent Teams	Coordinate specialized agents
10	Workflows	Sequential pipelines with predictable data flow
11	Guardrails	Input validation, PII detection, prompt injection defense
12	Human in the Loop	Require confirmation before sensitive actions

Each builds on fundamentals, but you can jump to any one.

Setup

git clone https://github.com/agno-agi/agno.git
cd agno

uv venv .getting-started --python 3.12
source .getting-started/bin/activate

uv pip install -r cookbook/00_getting_started/requirements.txt

export GOOGLE_API_KEY=your-google-api-key

Run an example

Each cookbook is self-contained:

python cookbook/00_getting_started/agent_with_tools.py

Want a visual interface? Agent OS gives you a web UI for chatting with agents, exploring sessions, and monitoring traces:

python cookbook/00_getting_started/run.py

Then visit os.agno.com and add http://localhost:7777 as an endpoint.

Swapping models

These examples use Gemini 3 Flash by default — fast, reliable tool calling, cheap enough to experiment freely. But Agno is model-agnostic:

# Gemini (default)
from agno.models.google import Gemini
model = Gemini(id="gemini-3-flash-preview")

# OpenAI
from agno.models.openai import OpenAIChat
model = OpenAIChat(id="gpt-5.2")

# Anthropic
from agno.models.anthropic import Claude
model = Claude(id="claude-sonnet-4-5")

One line change. Everything else stays the same.

If you enjoyed reading this, star Agno on GitHub. It helps more than you'd think. Questions or feedback? Reach out on X.

Build Your Own Multi-Agent System

hi@ashpreetbedi.com (Ashpreet Bedi) — Thu, 29 Jan 2026 00:00:00 GMT

Instead of a hello world tutorial, let me show you how to build a live multi-agent system. We'll run it locally on Docker and deploy to production on Railway.

This is a production-grade system that includes:

Feature	Description
Learning	Agents remember and improve over time
Persistence	State, sessions, and memory backed by PostgreSQL
Agentic RAG	Knowledge retrieval that knows when and how to search
MCP Tools	Connect to external services via Model Context Protocol
Monitoring	Full visibility via the AgentOS control plane

You'll also learn how to extend it with your own agents.

5 minute read. Running locally in 5. Deployed to production in 20.

The Agents

We'll build three agents, each demonstrating a different pattern:

Pal - AI-powered second brain. Captures notes, bookmarks, people, meetings. Researches the web. Learns over time.
Knowledge Agent - Answers questions from a knowledge base.
MCP Agent - Connects to external services via MCP.

Each agent can be extended to fit your needs.

Run Locally (5 minutes)

Prerequisites

Install Docker Desktop
Get an OpenAI API key

Setup

Clone the repo and export your OpenAI API key:

git clone \
    https://github.com/agno-agi/agentos-railway-template.git \
    agentos-railway

cd agentos-railway

export OPENAI_API_KEY="sk-***"

Start the application (API + Database):

docker compose up -d --build

That's it. Your system is running. Here's how it looks:

Your browser does not support the video tag.

Connect to the UI

Open os.agno.com
Click Add OS → Local
Enter http://localhost:8000 as the URL

Now chat with Pal:

> Note: decided to use Postgres for the new project - better JSON support

> Research event sourcing patterns and save the key findings

> What do I know about event sourcing?

Deploy to Production (10 minutes)

I've made it easy to deploy to Railway - just login and run a script.

Prerequisites

Install the Railway CLI

Deploy

railway login

./scripts/railway_up.sh

The script provisions PostgreSQL, configures environment variables, and deploys your system. Give it a few minutes for the services to spin up.

Connect to the UI

Open os.agno.com
Click Add OS → Live
Enter your Railway domain

You now have a production multi-agent system. Watch it go live in ~2 mins:

Your browser does not support the video tag.

What's Included

Pal (Personal Agent that Learns)

Your AI-powered second brain. Captures notes, bookmarks, people, meetings. Researches the web and saves findings. Learns from errors so it doesn't repeat them.

I wrote more about Pal here: Building Pal: Personal Agent that Learns

Knowledge Agent (Agentic RAG)

Store any type of docs in a vector store, chat with it using Agentic RAG.

knowledge_agent = Agent(
    model=OpenAIResponses(id="gpt-5.2"),
    knowledge=knowledge,
    search_knowledge=True,
)

MCP Agent (MCP Tools)

Connects to external tools via the Model Context Protocol. Point it at any MCP server and it gets access to those tools.

mcp_agent = Agent(
    model=OpenAIResponses(id="gpt-5.2"),
    tools=[MCPTools(url="https://docs.agno.com/mcp")],
)

Create Your Own Agent

Now let's add a custom agent to the system. We'll build a research agent that uses the Exa MCP server.

Create agents/research_agent.py:

from agno.agent import Agent
from agno.models.openai import OpenAIResponses
from agno.tools.mcp import MCPTools

from db import get_postgres_db

# Exa MCP for research
EXA_MCP_URL = (
    f"https://mcp.exa.ai/mcp?tools="
    "web_search_exa,company_research_exa,people_search_exa"
)

research_agent = Agent(
    id="research-agent",
    name="Research Agent",
    model=OpenAIResponses(id="gpt-5.2"),
    db=get_postgres_db(),
    tools=[MCPTools(url=EXA_MCP_URL)],
    instructions="""\
You are a research agent. You help users find information about:
- Companies and startups
- People and their backgrounds
- Topics and trends

Be thorough but concise. Cite your sources.
""",
)

from agents.research_agent import research_agent

agent_os = AgentOS(
    agents=[pal, knowledge_agent, mcp_agent, research_agent],
)

Your agent is now part of the system. Chat with it:

Your browser does not support the video tag.

If the agent doesn't show up, press refresh on the UI (top right corner) or restart containers with docker compose restart.

Wrapping Up

You now have a live multi-agent system with:

Feature	Description
Learning	Agents that remember and improve over time
Persistence	PostgreSQL for storing agent sessions, state, and memory
Research	Web search, company lookup, people search via Exa
Monitoring	Full visibility via the AgentOS control plane
Extensibility	Add agents, tools, and integrations as needed

What's Next

Build more agents - Add specialized agents for your use case
Add tools - Extend your agents with 100+ toolkits
Go multi-agent - Create multi-agent teams and workflows
Go multi-channel - Expose your agents via Slack, Discord, WhatsApp
Build an AI product - From 2-person startups to Fortune 500 companies, AgentOS is the foundation for agentic products

The system is yours. You have a head start - make it count.

Learn More

Built with Agno. Give it a ⭐️

Scaling Agentic Software

hi@ashpreetbedi.com (Ashpreet Bedi) — Thu, 16 Apr 2026 00:00:00 GMT

What is the simplest architecture for running a multi-agent system at scale?

I want to deploy agents as a real service. Multi-user, RBAC, JWT-based auth. Sessions, memory, and knowledge backed by a database. Horizontally Scalable. Able to serve thousands of concurrent requests. The kind of product you'd actually ship to users.

Could the answer be: a FastAPI app and a Postgres database?

So I spent some time building one to find out. 14 agents, 11 multi-agent teams, 5 workflows. Hundreds of tools, approvals, evals, schedules. All running in a single FastAPI process against a single PostgreSQL database. It's open source: Demo AgentOS.

I'll walk through the architecture in this post. In the next one we'll dive into what breaks when you push it.

The Bar

"Scale" gets thrown around quite a bit. In this case, scale means breadth. The surface area of a real product. Every concern a CTO would actually need to address before shipping a product to users:

Multi-user and multi-tenancy. Every user gets their own sessions, memory, and context. The system isolates every resource an agent touches, across every user, on every request.

Note: Context bleeding is a data breach, not a bug.

Auth and RBAC. JWT verification, role-based access control, scoped permissions. This applies to the API layer, the agents, the tools they call, and the data they can access. Dev and production should have different security postures.

Real persistence. Sessions, memory, and knowledge stored in a database, with regular backups and data access policies. Everything needs to comply with user-data protection laws like GDPR and CCPA.

Serving requests at scale. The system should be able to handle thousands of concurrent requests. Streaming responses should be held open. Background work (memory extraction, summarization, learning) running alongside the primary model call. All of it competing for the same HTTP transports, connection pools, and database connections. The hard part is not serving one request. It is serving the thousandth one without stalling the ninth one.

Observability. Tracing every agent run, every tool call, every delegation in a multi-agent team. When something goes wrong at step 7 of a 12-step workflow, you need to see exactly what happened and why.

Governance. Layered authority over what agents can do. Some tools run freely. Some need user approval. Some need admin sign-off. Approval flows, audit trails, and the ability to pause execution mid-run.

Reliability and evals. Agents are testable software. You need smoke tests, tool call validation, LLM-judged accuracy, performance baselines. Without evals, every change is a guess.

If this is the bar, the question is: what's the simplest architecture that clears it?

The Architecture

One FastAPI process. One Postgres database. That's it.

The FastAPI app serves 14 agents, 11 multi-agent teams, 5 workflows using REST endpoints. Every request is a POST, every response is a server-sent event stream.

The database does more than you'd think. The Postgres database stores agent sessions, user memory, knowledge contents, learnings, schedules, and eval results. Pgvector handles embeddings for knowledge bases.

The Components

The 30+ components in the AgentOS showcase different agentic patterns.

Some showcase HITL patterns. The Helpdesk agent wraps three tools: one that requires operator confirmation before restarting a service, one that pauses for user input on ticket priority, one that executes outside the agent runtime. The Approvals agent uses Agno's @approval decorator for blocking approval gates and audit-trailed operations. Both agents pause execution mid-run and resume on approval.

Some showcase guardrails. The Helpdesk agent has three pre-hooks: OpenAI moderation, PII detection, prompt injection detection. It also has a post-hook that scans responses for secret patterns (API keys, connection strings, SSNs) and rewrites them before they leave the process. An audit log hook records every run for compliance.

Some showcase multi-agent teams. Pal is a personal knowledge agent with five specialists. Dash is a data analyst with an Analyst/Engineer split. Coda is a coding agent with five specialists including a Planner and a Triager. The Research and Investment teams each ship in four modes (coordinate, route, broadcast, tasks) so you can see how the same set of members produces different behavior under different coordination patterns.

Some showcase step-based workflows. Morning Brief gathers calendar, email, and news in parallel and synthesizes a briefing. AI Research runs four parallel researchers and synthesizes their findings. Content Pipeline does parallel research plus a loop that iterates until an editor approves. Support Triage classifies a ticket, routes it to a specialist, and escalates if severity is high.

Some showcase state management. Taskboard demonstrates session state with agentic state updates. Injector demonstrates dependency injection through RunContext. Compressor demonstrates tool result compression with a cheaper model.

Some showcase scheduling. Morning Brief runs every weekday at 8am ET. AI Research runs every day at 7am UTC. The Scheduler agent lets users create, list, disable, and delete schedules at runtime through natural language.

The point is not that you need all of these. The point is that a single FastAPI process can host them without the architecture getting complicated.

Governance as First-Class Infrastructure

Three layers of governance sit on top of every agent.

Pre-hooks run before the model sees the input. Moderation, PII detection, injection detection. If any hook raises, the request is rejected before a single token is generated.

Approval gates pause the run mid-execution. A tool decorated with requires_confirmation=True or @approval streams a RunPaused event to the client with the tool name and arguments. The client shows the user an approve/reject UI. On approval, the run resumes from where it paused. This works because the session state is durable (stored in db).

Post-hooks run on the output. The Helpdesk agent has an output guardrail that scans responses for secret patterns and rewrites them before they leave. Every run is audit-logged through a separate hook.

What's Not Here

No message queue. No worker pool. No separate vector database. No Redis. No microservices. No orchestrator service standing in front of the agents. No separate auth service.

Could you add them? Sure. Are they necessary to clear the bar I defined? Not yet. The point of this exercise is to find out where the simple architecture breaks, so the next decision (what to add) is grounded in actual load, not in speculation.

What's Next

Part 2 is what breaks when you scale this.

I'm going to load test it. Thousands of concurrent requests. Streaming responses held open. Background memory extraction competing with primary runs. Connection pools under pressure. I expect to find a few obvious bottlenecks and a couple of surprising ones.

Links:

Self Learning Research Agent That Tracks Consensus Over Time

hi@ashpreetbedi.com (Ashpreet Bedi) — Tue, 16 Dec 2025 00:00:00 GMT

In this post, we’ll build a self-learning research agent that does something more useful than one-off web searches. It captures the current consensus, compares it to past runs, explains what changed and why, and stores a clean snapshot so future runs get better.

No fine-tuning. No retraining. Just good system design.

Why research agents break down in practice
Research is about consensus, not answers
What is "self-learning"
Snapshot-based learning architecture
What we store in the knowledge base (and what we don’t)
End-to-end agent flow
Production Codebase (deployable anywhere)
Steps to run your own Self Learning Research Agent
Why this pattern works

1. Why research agents break down in practice

Most research agents are stateless.

You ask a question today and get a well-written answer. You ask the same question tomorrow and get another well-written answer, but totally disconnected from the first one.

What's missing:

No memory of prior conclusions
No notion of what changed
No way to tell if the answer is stabilizing or shifting

Research without memory is just search with formatting.

Humans don't work this way. We remember what we believed before and pay attention when new information contradicts it.

That's the missing layer.

2. Research is about consensus, not answers

A single answer is rarely the goal of research.

What we actually care about is:

what most credible sources agree on
where there is disagreement
how confident we should be

That's why our agent doesn't store prose. It stores structured consensus. Consensus is represented as a set of claims that are:

short and explicit
backed by sources
labeled with confidence
stable enough to diff over time

This structure is what makes comparison possible.

It also lays the foundation for reasoning about sources over time, including which sources tend to be reliable or volatile.

3. What is "self-learning"

Self-learning means the agent improves based on its own experience.

In this case, improvement comes from capturing snapshots of consensus over time and using those snapshots as context in future runs.

The agent does not:

retrain models
update weights
fine-tune embeddings

Instead, it learns by capturing experience as data and reusing it deliberately. This is what I refer to as poor-man’s continuous learning.

The model stays fixed. The system improves by accumulating validated snapshots of understanding.

4. Snapshot-based learning architecture

The system is built around a simple idea: append-only snapshots.

Each snapshot represents:

the question that was asked
the internet's consensus at that moment
the claims that define that consensus
the sources used to support it
a short report summary for semantic retrieval

Snapshots are never mutated. We only add new ones and compare.

Each stored snapshot includes:

question
created_at
report_summary (short, human-readable)
consensus_summary (1–2 sentences)
claims (structured and diffable)
sources
optional notes

This keeps the knowledge base compact, searchable, and stable over time.

5. What we store in the knowledge base (and what we don’t)

The biggest mistake we can make is storing too much.

We deliberately do not store:

full markdown reports
raw scraped content
long explanations

We do store:

concise summaries
structured claims
deduplicated source lists

Each claim looks like:

claim_id (stable slug)
claim (short statement)
confidence (Low | Medium | High)
source_urls

If you can't diff it, you shouldn't store it.

This keeps retrieval high-signal and comparisons reliable.

6. End-to-end agent flow

Here's what happens on every run:

Parallel research The agent uses parallel search tools to gather information across multiple source types.
Consensus extraction Findings are synthesized into 4–10 structured claims with confidence and citations.
Snapshot retrieval The agent searches the knowledge base for the most recent snapshot of a similar question.
Diff Current claims are compared to the previous snapshot:
- new or strengthened claims
- weakened or disputed claims
- removed claims
Each change includes a brief explanation and supporting sources.
Human-in-the-loop save The agent asks whether to save the new snapshot. Only explicit approval persists it.

This keeps learning controlled, auditable, and intentional.

7. Production Codebase (deployable anywhere)

I'm providing a production codebase for running our self-learning research agent, built using:

A FastAPI application for running our agents.
A Postgres database for storing sessions, memory and knowledge.

Here's the link to the repository containing the production codebase.

Here's the structure of the repository:

.
├── agents
│   ├── self_learning_research_agent.py
│   └── ... more agents
├── app
│   └── main.py
├── compose.yaml
├── db
├── Dockerfile
├── pyproject.toml
├── railway.json
├── README.md
├── teams
│   └── finance_team.py
└── workflows
    └── research_workflow.py

8. Steps to run your own Self Learning Research Agent

Clone the repo

git clone https://github.com/agno-agi/agentos-railway.git
cd agentos-railway

Configure API keys

We'll use OpenAI for the agent and Parallel Search for search tools. Please export the following environment variables:

export OPENAI_API_KEY="YOUR_API_KEY_HERE"
export PARALLEL_API_KEY="YOUR_API_KEY_HERE"

You can copy the example.env file and rename it to .env to get started.

Install Docker

We'll use docker to run the application locally and deploy it to Railway. Please install Docker Desktop if needed.

Run the application locally

Run the application using docker compose:

docker compose up --build -d

This command builds the Docker image and starts the application:

The FastAPI application, running on localhost:8000.
The PostgreSQL database for storing agent sessions, knowledge, and memories, accessible on localhost:5432.

Once started, you can:

View the FastAPI application at localhost:8000/docs.

Connect the AgentOS UI to the FastAPI application

Open the AgentOS UI
Login and add http://localhost:8000 as a new AgentOS. You can call it Local AgentOS (or any name you prefer).

Demo

Here's a demo of the Self Learning Research Agent in action.

Your browser does not support the video tag.

Stop the application

When you're done, stop the application using:

docker compose down

Deploy the application to Railway

To deploy the application to Railway, run the following commands:

Install Railway CLI:

brew install railway

railway login

Deploy the application:

./scripts/railway_up.sh

This command will:

Create a new Railway project.
Deploy a PgVector database service to your Railway project.
Build and deploy the docker image to your Railway project.
Set environment variables in your AgentOS service.
Create a new domain for your AgentOS service.

9. Why this pattern works

This approach generalizes far beyond traditional research, you can use it for:

market analysis
policy tracking
competitive intelligence
technical standards
internal decision logs

Anywhere beliefs evolve, snapshots beat stateless answers. By separating:

online reasoning
from offline learning
and storing only what matters

we get agents that feel more trustworthy, more explainable, and more useful over time.

Thank you for reading! I hope you found this useful. Feel free to reach out to me on X if you have any questions or feedback

Self Improving Text2Sql Agent with Dynamic Context and Continuous Learning

hi@ashpreetbedi.com (Ashpreet Bedi) — Mon, 15 Dec 2025 00:00:00 GMT

This post shows how to build a self-improving Text-to-SQL agent using dynamic context and "poor-man's continuous learning". We'll break the problem into two parts:

Text-to-SQL Agent (Online Path): answers questions by retrieving schema + query patterns from a knowledge base (dynamic context).
Continuous Learning (Offline Path): learns from successful runs and adds new entries to the knowledge base.

When the Agent finds a successful result, it stores it in its knowledge base for future use. This gives the text-to-sql agent a self-improving feedback loop, but keeps the online path stable.

Why Text-to-SQL fails in practice
What is "dynamic context"
What is "poor man's continuous learning" (and why it works)
Unified Agent Architecture
Knowledge Base Design (keep it structured)
Production Harness (deployable anywhere)
Steps to run your own Text-to-SQL Agent

1. Why Text-to-SQL fails in practice

Most Test-to-SQL agents fail in practice because they start from scratch every time, describing tables, columns, finding join keys. Repeating every mistake, every time.

Now compare this with how senior analysts or data engineers operate: do they start from scratch every time? No, they use tribal knowledge and experience and dig through past queries to find the right one. Once they find a useful query, they capture it in their knowledge base for future reference. Our text-to-sql agent works the same way.

I've found that most Text-to-SQL failures are not "model is dumb", they're "model is missing context and tribal knowledge" issues. Let's break down the common mistakes:

The model starts from scratch every time, describing tables, columns, finding join keys. Repeating every mistake, every time.
The model guesses column names, usage patterns, or doesn't know the right join keys.
The model misses domain definitions (active user, churn, ARR, etc.) or doesn't know the right business rules (eg: "status lives in orders.state, not orders.status").
The model is missing common gotchas (date in the wrong format, nulls in the wrong place, etc.).
The model re-invents queries that already exist in your organization's knowledge base.

The biggest improvement you can make to your text-to-sql agent is to provide it with the same tribal knowledge that human engineers have. This enables them to re-use queries that we know work and let the model search established usage patterns at runtime. Call it RAG, Agentic RAG, or Dynamic Context, it's the same thing: the model, at runtime, has access to the right context to generate the right SQL.

Our goal is straightforward:

Give our agent the tools to retrieve the right context at runtime (schemas, joins, past queries, metric definitions, gotchas).
Generate SQL grounded in well established usage patterns (no guessing and no re-inventing the wheel).
Validate the SQL (query is parseable, schema checks, etc.).
Run the SQL and "analyze" the results. Don't just give me the data, give me the insights.
Capture learnings so the next run is better (new join path, corrected column mapping, query template, metric definition).
Repeat.

2. What is "dynamic context"

Dynamic context is simply: the agent retrieves the relevant knowledge at query time, which enables it to generate SQL grounded in well established usage patterns. The context is dynamic because it changes based on the query, the data, and the user's intent.

Examples of what the agent can retrieve:

Table schemas and relationships
Common join keys and relationships
Known queries for common use cases
Metric definitions and business rules
Known gotchas ("status lives in orders.state, not orders.status")

If your KB contains a query for "weekly active users", your agent should retrieve it, not re-invent it.

3. What is "poor man's continuous learning" (and why it works)

By "poor man's continuous learning", I mean:

We do not update model weights.
We do update retrieval knowledge when we find a successful result.
The system improves by capturing experience as reusable artifacts.

Every good query becomes future context. Every mistake becomes a rule. Every clarification becomes shared knowledge.

Poor man's continuous learning works because it provides a pragmatic learning loop: stable online behavior, controlled improvements. The best part is that you can always explore the knowledge base manually and fix issues or mistakes, imaging updating model weights by hand.

4. Unified Agent Architecture

The systems is broken into 2 parts:

Text-to-SQL Agent: answers questions by retrieving schema + query patterns from a knowledge base (dynamic context).
Continuous Learning: learns from successful runs and adds new entries to the knowledge base.

Query Flow

User asks a question
Agent retrieves context from KB (hybrid search) using:
- question text
- detected entities (tables, columns, metrics)
- optional database introspection results
This knowledge augments the input with dynamic context:
- retrieved knowledge snippets
- rules and constraints (read-only, limit, etc.)
This knowledge guides the generation of SQL.
Agent executes the query in a safe environment.
Agent analyzes the results and returns the answer.
If the result is successful, the agent asks the user if they want to save the query to the knowledge base.
If the user agrees, the agent stores the query in the knowledge base.
If the user disagrees, the agent revists the query, update it and try again.

There are 2 improvments you can make to the learning path:

Run the continuous learning separately after every run of the text-to-sql agent. This way, the continuous learning is always up to date with the latest queries and results.
Add a regression harness to the continuous learning. This way, you can test the knowledge base before and after updates to ensure it's still working.

5. Knowledge Base Design (keep it structured)

We want our knowledge base to store 3 kinds of information:

Table information: this includes the table schema, column metadata, query rules , common gotchas (eg: date column contains a rule: "Use the TO_DATE function when filtering by date").
Sample queries: this include common query patterns and best practices. Along with how to retrieve common metrics and KPIs. There's no need to re-invent the wheel.
Business semantics and relationships: the layer that maps how your organization talks about data to how the database is structured.

The sample codebase I'm providing contains the following files (table information and common queries):

agents/sql/knowledge/
├── constructors_championship.json
├── drivers_championship.json
├── fastest_laps.json
├── race_results.json
├── race_wins.json
└── common_queries.sql

6. Production Harness

I'm providing a production-ready harness for our system, built using:

A FastAPI application for running our agents.
A Postgres database for storing sessions, memory and knowledge.

Here's the link to the repository containing the production codebase.

Here's the structure of the repository:

.
├── agents
│   ├── __init__.py
│   ├── sql
│   │   ├── __init__.py
│   │   ├── knowledge
│   │   ├── load_f1_data.py
│   │   ├── load_sql_knowledge.py
│   │   ├── sql_agent.py
│   │   └── test_questions.txt
│   └── ... more agents
├── app
│   ├── __init__.py
│   └── main.py
├── compose.yaml
├── db
│   └── ... database configuration
├── Dockerfile
├── pyproject.toml
├── railway.json
├── README.md
├── requirements.txt
├── scripts
│   ├── dev_setup.sh
│   ├── entrypoint.sh
│   ├── railway_up.sh
│   ├── format.sh
│   └── validate.sh
├── teams
│   └── finance_team.py
└── workflows
    └── research_workflow.py

7. Steps to run your own Text-to-SQL Agent

Clone the repo

git clone https://github.com/agno-agi/agentos-railway.git
cd agentos-railway

Configure API keys

We'll use OpenAI for the text-to-sql agent, (we also use Anthropic and Parallel Search for other agents in the service). Please export the following environment variables:

# Required
export OPENAI_API_KEY="YOUR_API_KEY_HERE"

# Optional
export ANTHROPIC_API_KEY="YOUR_API_KEY_HERE"
export PARALLEL_API_KEY="YOUR_API_KEY_HERE"

You can copy the example.env file and rename it to .env to get started.

Install Docker

We'll use docker to run the application locally and deploy it to Railway. Please install Docker Desktop if needed.

Run the application locally

Run the application using docker compose:

docker compose up --build -d

This command builds the Docker image and starts the application:

The FastAPI application, running on localhost:8000.
The PostgreSQL database for storing agent sessions, knowledge, and memories, accessible on localhost:5432.

Once started, you can:

View the FastAPI application at localhost:8000/docs.

Load data for the SQL Agent

To load the data for the SQL Agent, run:

docker exec -it agentos-railway-agent-os-1 python -m agents.sql.load_f1_data

To populate the knowledge base, run:

docker exec -it agentos-railway-agent-os-1 python -m agents.sql.load_sql_knowledge

Connect the AgentOS UI to the FastAPI application

Open the AgentOS UI
Login and add http://localhost:8000 as a new AgentOS. You can call it Local AgentOS (or any name you prefer).

Demo

Here's a demo of the Text-to-SQL Agent in action. Notice how I add a query to the knowledge base and the agent uses it to generate the SQL when i ask the same question again.

Your browser does not support the video tag.

Stop the application

When you're done, stop the application using:

docker compose down

Deploy the application to Railway

To deploy the application to Railway, run the following commands:

Install Railway CLI:

brew install railway

railway login

Deploy the application:

./scripts/railway_up.sh

This command will:

Create a new Railway project.
Deploy a PgVector database service to your Railway project.
Build and deploy the docker image to your Railway project.
Set environment variables in your AgentOS service.
Create a new domain for your AgentOS service.

Thank you for reading! I hope you found this useful. Feel free to reach out to me on X if you have any questions or feedback

Systems Engineering

hi@ashpreetbedi.com (Ashpreet Bedi) — Tue, 14 Apr 2026 00:00:00 GMT

The Key To Building Agentic Software That Works

In the early 1940s, Bell Labs was building the national telephone network, the most complex technical system in the world at the time. Millions of switches, cables, relays, and operators had to work together. The engineers discovered something that would become an 80-year-old lesson: you can't optimize a system by optimizing individual components. The behavior of the whole (call routing, reliability, capacity, cost) emerged from how the parts interacted. They needed a discipline focused on the interactions between components.

They called it systems engineering.

Agentic Software Is a Systems Engineering Problem

Coding agents have lowered the barrier to writing code, but they haven't lowered the requirements of production software.

Software engineering is, and has always been, systems engineering and agentic software is no different. If you're building agentic software, your system needs to bridge five layers:

1. Agent Engineering. Your agent or multi-agent logic and execution flow. Model, system instructions, tool configurations, handoffs, context management, observability. This is where you define what your agent does, how it runs, and how it responds. Your agent's behavior should be deterministic where possible and observable where it isn't.

2. Data Engineering. Your agent is only as good as the context it has access to, and context is just data under the hood.

Call it memory, storage, knowledge. Your Agent's data should be managed with data engineering principles. Well designed schemas, structured querying, databases for fast read/writes, object storage for long-term storage, and workflows that keep your knowledge and memory up to date. The patterns are decades old. Use them.

3. Security Engineering. Auth, RBAC, governance, data isolation, audit trails. Your agent's capabilities are defined by its tools, and those tools should be scoped with JWT-backed permissions. Read-only access IS NOT a prompt instruction, it's a tool configuration.

Actions should have approval tiers: reads run freely, writes need user approval, sensitive operations need admin sign-off. Most actions should be logged and queryable for the life of the product.

And please, isolate requests. One user's context bleeding into another's is a data breach, not a "bug". It has serious consequences and there are laws protecting user data.

4. Interface Engineering. How users and other agents reach your agent.

REST API, Slack, MCP server, terminal, Chat UI. In the old world, you had one API and one client. Now you have multiple surfaces, each with its own identity system. A Slack user ID is not your product's user ID. An MCP client authenticating as another agent is not a human user. Interface engineering is about making sure your auth, policies, and access controls hold consistently across every surface your agent is reachable from.

5. Infrastructure Engineering. How you run and scale your software. Containers, cloud deployment, horizontal scaling. Generally called DevOps.

The good news: 95% of this is identical to running any other service. Re-use existing patterns, they'll serve you well. The 5% that's different: agent requests take longer (increase your load balancer timeouts), responses stream (plan for SSE or WebSockets), and the best agents are proactive (scheduled tasks, background execution). None of this is new.

The key unlock for AI engineers is realizing that agentic software is just regular software, with the business logic replaced by agents, and interfaces going from request/response to streaming across multiple surfaces.

Systems engineering is the discipline of making these parts work together, and is the key to building agentic software that works.

When you look at your software from a systems perspective, the right decisions become obvious. You give your agent well-scoped tools, not unfettered bash access. You store sessions, memory, and knowledge in a database, not in files, so you can utilize decades of multi-tenant patterns.

When you design one layer in isolation, you inherit constraints that cascade through the rest of the system. When you design from the system's perspective, each layer reinforces the others.

Systems Engineering in Practice

I can't make a claim like this and not give you working code.

Dash is an open-source, self-learning data agent. You ask it questions in plain English, it writes SQL, runs it, and tells you what the numbers mean. Simple enough to clone and adapt. Real enough to demonstrate all five layers. Here's how it looks (2x speedup)

Your browser does not support the video tag.

Dash is live in many companies and works incredibly well. The difference is the system behind it. Here's how each layer works.

Agent Engineering

Dash is a team of three agents. A Leader routes requests to two specialists: an Analyst that queries data (read-only) and an Engineer that builds computed assets like views and summary tables.

Each specialist gets similar tools, but wired up for different purposes. The Analyst's SQL tools connect to a read-only database engine. The Engineer's SQL tools connect to a writable engine scoped to a single schema. Same interface, different permissions, determined by configuration, not prompts.

Instructions are assembled at runtime from table metadata and business rules stored as structured files.

Interface Engineering

One system, multiple surfaces.

Dash serves a REST API, a Slack bot, a web UI, and a CLI. Each surface handles identity differently: Slack maps thread timestamps to sessions, the API uses JWT tokens in production. But all four hit the same agents, same tools, same knowledge. Adding a new interface does not require rebuilding the agent logic.

Your auth and access controls need to hold across every surface, because the agent doesn't know which one it's being called from. Here's dash being used in slack.

Your browser does not support the video tag.

Data Engineering

Six layers of context, and tools for learning.

Raw LLMs writing SQL hit a wall fast: schemas lack meaning, types are misleading, tribal knowledge is missing, there's no way to learn from mistakes. Dash solves this with six layers of grounded context:

Table metadata (schema, columns, relationships)
Human annotations (metrics, definitions, business rules)
Query patterns (SQL that is known to work)
Institutional knowledge (docs, wikis)
Learnings (error patterns and discovered fixes)
Runtime context (live schema inspection)

These layers feed two systems.

The first is curated knowledge: table schemas, validated queries, and business rules loaded into PostgreSQL.
The second is discovered learnings: error patterns and fixes that the agent saves automatically when it hits problems and recalls on future queries.

The learning loop is simple: the agent runs a query, gets a type error, diagnoses the fix, saves it. Next time it sees a similar column, it gets it right the first time. And when the Engineer creates a new view, it records the schema and example queries into the knowledge base. The Analyst discovers it on the next search and starts using it.

Query 100 is better than query 1, not because the model improved, but because the data layer got better.

Security Engineering

Enforced by the system, not the prompt.

Production auth uses RBAC with JWT verification. Every query is scoped to user_id. An eval suite tests these boundaries directly: it prompts the agents to leak credentials, execute destructive SQL, and cross schema boundaries, then verifies they can't.

Security is a system property tested across layers.

The Analyst's read-only access is a PostgreSQL connection parameter. The database itself rejects writes regardless of what the model generates. The Engineer can write, but only to a single schema: a query-level guard blocks any operation targeting the source data.

Infrastructure Engineering

Boring on purpose.

Standard Python container. Docker Compose for local development. One-command cloud deployment. Streaming via SSE through a standard ASGI server. The 95% that's identical to any other service is identical. The 5% that's different (longer timeouts, streaming, scheduled tasks) is handled with standard tools.

You can clone it, run docker compose up, and have the entire system running in minutes. One command, five layers, a working product.

# Clone the repo
git clone https://github.com/agno-agi/dash.git

cd dash

# Set your keys
cp example.env .env
# Edit .env and add your model provider key

# Start the system
docker compose up -d --build

TLDR

Agentic software is just software. The agent replaces business logic. Everything else is systems engineering. Five layers: agent, data, security, interface, infrastructure. Each layer affects the others. Design them together and the system compounds. Design them in isolation and you spend your time patching around constraints that shouldn't exist. We walk through all five with Dash, a real open-source data agent you can run yourself.

Links:

WTF are Agents?

hi@ashpreetbedi.com (Ashpreet Bedi) — Fri, 24 Oct 2025 00:00:00 GMT

Most people overcomplicate Agents.

Are they workflows? are they graphs? are they LLMs in a loop or just expensive while-loops? Are they deterministic, autonomous, or confused? Some say if you whisper "agent" three times, a VC appears with a term sheet.

How about we cut through the noise and understand what an Agent is by mapping out how they work. Let's demystify it — without the hype.

What is an Agent?

Regular programs execute a fixed set of instructions, written as code, in a predetermined order. If you write a program to add two numbers, that's exactly what it will do, every time. It won't add three, or four, or decide to do something else. The outcome is always the same because the logic is hardcoded.

Agents, on the other hand, are AI programs where a language model decides the flow of execution. You give it instructions, a set of tools, and the model decides what to do. If you give an Agent tools to add numbers, it can add two, three, or ten. If you also give it tools to subtract, multiply, and divide, it can perform any combination of operations — without you writing that logic explicitly.

If that explanation sounded abstract, that's because it is. Let's make sense of it by walking through what happens when you run an Agent:

The Agent first builds the context for the model: system messages, user messages, adds chat history, memory, knowledge, state.
It sends that context to the model (the execution loop begins).
The model replies with a message, a tool call, or both.
If a tool is called, the Agent executes it and returns the results to the model. This is what I think makes a program "agentic".
The loop continues until the model produces a final message.
The Agent returns that response to the caller.

That's it, this is an Agent. What'll be different is the context, the tools, and the model's reasoning, but the core remains the same.

We're moving from deterministic execution to reasoning-based execution — from code that follows instructions to software that decides what to do. Will it do it well? We'll find out.

Minimal Example

Let's build a simple agent to demo how it works, we'll add a few capabilities to make it more interesting:

A database to store and maintain conversation history
Tools via MCP that it can call to answer questions
Respond in markdown so it looks pretty

We'll also turn it into a FastAPI app so we can deploy it as a service. You can read the full instructions here.

from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.models.anthropic import Claude
from agno.os import AgentOS
from agno.tools.mcp import MCPTools

# ************* Create Agent *************
agno_agent = Agent(
    name="Agno Agent",
    model=Claude(id="claude-sonnet-4-5"),
    db=SqliteDb(db_file="agno.db"),
    tools=[MCPTools(url="https://docs.agno.com/mcp", transport="streamable-http")],
    add_history_to_context=True,
    markdown=True,
)

# ************* Create AgentOS *************
agent_os = AgentOS(agents=[agno_agent])
app = agent_os.get_app()

You can run this Agent using fastapi dev agno_agent.py and chat with it on the AgentOS UI. Here's how it looks:

Your browser does not support the video tag.

Deploy your FastAPI app to your cloud of choice, and you're live!

Are we done?

Not even close. The hard part isn't building the Agent, its building the system that runs these Agents in production, and building a product around it with a great UX (or rather, AX — Agent Experience).

Ensuring reliability, durability, and a smooth experience across thousands of concurrent sessions is where the real engineering happens. These are long-running processes that demand isolated state management, persistent storage, and strong fault tolerance.

Here's what you'll need to consider when building Agents:

Runtime architecture: how agents are orchestrated, manage state, and handle execution loops.
Memory systems: how agents retain and manage context, session history, memory, knowledge and culture.
Tooling integration: how agents connect to APIs, databases, or internal functions (MCPs are popular here).
Safety & Security: how to ensure data, application and user-level security.
Evaluation & performance: measuring usefulness, latency, cost, and reliability of the agentic system.

Each of these is a discipline of its own, with entire startups (sometimes dozens) dedicated to solving. But stitching it all together into a single, cohesive system is still a massive pain.

That's where Agno comes in.

What is Agno?

Agno is a multi-agent framework, runtime, and control plane. It solves the 5 problems mentioned above via 3 tightly coupled components:

Framework for building Agents, Multi-Agent Teams and Workflows. It comes with an incredibly rich set of features like persistent storage, memory management, knowledge retrieval, 100+ toolkits, guardrails, dependency injection, dynamic context management, human in the loop, and much, much more.
Pre-built FastAPI Runtime for deploying multi-agent systems. This runtime, called AgentOS, exposes pre-built endpoints you can build your product on top of. It handles concurrency, state management, and error recovery out of the box — plus extras like initializing MCP connections via lifecycle hooks and securing every request with a security-key.
Control Plane for testing, monitoring, debugging and evaluating multi-agent systems. This is a web interface that allows you to manage your multi-agent systems in real-time. It's a powerful tool that helps you understand what your agents are doing, and why.

If you're building Agents, give Agno a try:

GitHub: agno.link/gh
Documentation: agno.link/docs
Website: agno.com

Agents aren't magic. They're just a new kind of software. Once you understand that, everything else falls into place.

Agent Engineering is just Software Engineering

Ashpreet Bedi

Agent Engineering 101

What is Agent Engineering?

How Agno helps with Agent Engineering?

Minimal Example

Summary: The Layers of Agent Engineering

Designed for Agent Engineering

Want to build with Agno?

Agent Security 101

Transactional data ≠ Telemetry

1. Give your agents a database.

2. Store all transactions in that database.

3. Keep data within your system (and avoid duplication).

4. Want a UI? No problem.

5. Finally, stop paying for egress and retention.

Why Agno?

Agentic Culture

Why Culture?

Introducing Agentic Culture

How It Works

What You Can Do With It

Examples

Future Work

Explore & Build

Agentic Software Engineering

Build. Serve. Connect.

The 6 Pillars of Agentic Software

From Theory to Practice

Governance & Elicitation

Agents are distributed systems

Becoming AI-first

What "AI-first" really means

From exploration to execution

Want to build with Agno?

Dash: The Data Agent Every Company Needs

What is Dash?

How It Works

1. Context is everything

2. Self-learning loop

3. Three agents, two schemas

The part nobody else has

The full loop

Build Your Own

Quick Start

Connect to the Web UI

Connect to Slack

Adding Your Own Data

1. Load your tables into the public schema

2. Add table knowledge

3. Add validated queries

4. Add business rules

5. Load knowledge

Scheduled Tasks

Run Evals

Deploy to Production

What's Next

TLDR

Dash: Self-learning data agent

The 6 Layers of Context

The Self-Learning Loop

Build your own

Connect to the UI

Run evals

Closing thoughts

Dash is my attempt to make that accessible to everyone.

Learn More

Dynamic Software

Software is dead, long live Software

Assumptions Dynamic Software breaks

A new category needs a new runtime

The next decade

Evals Don't Give You a Working Product

The Pitch vs. The Reality

What Evals Don't Test

The Trap: Evals Too Early

What Evals Are Actually Good For

What Actually Gets You to Production

The Questions That Actually Matter

The Path Forward

Want to build with Agno?

1. Load your tables into the `public` schema