<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Ashpreet Bedi</title>
        <link>https://ashpreetbedi.com</link>
        <description>Ashpreet's blog</description>
        <lastBuildDate>Thu, 11 Jun 2026 20:13:04 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Ashpreet Bedi</title>
            <url>https://ashpreetbedi.com/favicon.ico</url>
            <link>https://ashpreetbedi.com</link>
        </image>
        <copyright>All rights reserved 2026</copyright>
        <item>
            <title><![CDATA[Image search using text classification]]></title>
            <link>https://ashpreetbedi.com/image-search-using-text-classification</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/image-search-using-text-classification</guid>
            <pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>Agentic data labeling is one of those workhorse use-cases that keeps on delivering unexpected results.</p>
<p>In this post, I'll show you how to build a working image search engine using agents that label images using search terms and then running search on those descriptions instead of the images.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/image-search-5-21.mp4">Your browser does not support the video tag.</video>
<h2>Why?</h2>
<p>Image search has generally been a HARD problem. CLIP, multimodal vectors, fine-tuned encoders. So many use cases are blocked because of the complexity whereas the truth is that not every everyuse-case needs google photos like infrastructure.</p>
<p>Some just need a quick and dirty image search engine and now that's possible in about 100 lines of code.</p>
<h2>The old default</h2>
<p>For years, the assumed path for "search images by natural language" looked like:</p>
<ol>
<li>Run every image through a multimodal encoder (CLIP and friends) to get a vector.</li>
<li>Run the query string through the same encoder to get a vector.</li>
<li>Nearest-neighbor in shared embedding space.</li>
</ol>
<p>It's the right approach, but comes bearing costs. Swap encoders and you re-embed your whole library. The vectors are opaque, can't read them, can't grep them, can't explain a miss. And one-word queries like "car" or "drink" tend to land in fuzzy regions of the latent space where the wrong things are also nearby.</p>
<h2>The new approach - classify using vision models</h2>
<p>Gemini 3.5 flash turns our images into a structured description that reads like a search query a human would type. The model already knows that a yellow NYC taxi is also a car, a vehicle, a piece of transportation, and a thing that lives in Manhattan.</p>
<p>That's the signal we were trying to get out of the embedding space in the first place.</p>
<p>Once we have the description, we don't need image embeddings. We need text embeddings, which we've known how to do for a decade. And you get full-text search for free: stemming, prefix match.</p>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>The cheap version of image search might be the right version for most use cases.</p></div></div></div></blockquote>
<h2>The labeling schema</h2>
<p>Update the fields to be signals you are looking to search over. For this demo we use:</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">class</span> <span class="token class-name">ImageDescription</span><span class="token punctuation">(</span>BaseModel<span class="token punctuation">)</span><span class="token punctuation">:</span>
    caption<span class="token punctuation">:</span> <span class="token builtin">str</span>         <span class="token comment"># one or two sentences, written like a search query</span>
    subjects<span class="token punctuation">:</span> <span class="token builtin">list</span><span class="token punctuation">[</span><span class="token builtin">str</span><span class="token punctuation">]</span>  <span class="token comment"># 1-5 noun phrases, specific + generic</span>
    scene<span class="token punctuation">:</span> <span class="token builtin">str</span>           <span class="token comment"># short noun phrase: "urban street at night"</span>
    visual_style<span class="token punctuation">:</span> <span class="token builtin">str</span>    <span class="token comment"># "soft morning light", "vibrant macro"</span>
    tags<span class="token punctuation">:</span> <span class="token builtin">list</span><span class="token punctuation">[</span><span class="token builtin">str</span><span class="token punctuation">]</span>      <span class="token comment"># 12-20 lowercase keywords</span>
</code></pre>
<p>The agent returns this as structured output. We flatten it into one string and embed that string. The structured fields stay on the side as metadata for the UI to render.</p>
<h2>Search becomes free</h2>
<p>Once your index is text, everything you've ever wanted from search is in the box:</p>
<ul>
<li><strong>Cosine similarity</strong> over the embedded descriptions for semantic recall. "cozy interior" finds a cafe shot tagged "warm", "intimate", "wooden".</li>
<li><strong>PostgreSQL full-text search</strong> (<code>to_tsvector</code> + <code>websearch_to_tsquery</code>) with stemming, so "car" matches "cars" without dragging in "carnivore", and prefix match so "mount" hits "mountain".</li>
<li><strong>Hybrid fusion,</strong> pgvector blends the two scores into one ranked list. Out of the box.</li>
</ul>
<p>The labeling step is the only part of the pipeline that wasn't great a year ago. Everything downstream is boring infrastructure that's been working for years.</p>
<h2>Run it yourself</h2>
<p>As always I come bearing code.</p>
<p>The full cookbook lives <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/tree/main/cookbook/data_labeling/image_search">here</a>. Clone, <code>pip install -r requirements.txt</code>, set <code>GOOGLE_API_KEY</code>, point at a pgvector, and you have a working image search engine in a couple of minutes.</p>
<p>Swap the default URLs in <code>settings.py</code> for your own S3 bucket and you get your own image search engine.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/image-search-5-21.mp4">Your browser does not support the video tag.</video>
<p>Thanks for reading!</p>
<p>Ashpreet - <em>built with 🧡 using <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno">Agno</a></em></p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Agent Platform That Builds Itself]]></title>
            <link>https://ashpreetbedi.com/agent-platform-build-itself</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/agent-platform-build-itself</guid>
            <pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>Every team building agents ends up building an agent platform from scratch.</p>
<p>Coding agents can do all of that work now. Today I'll share how to build an agent platform built, managed and improved entirely by coding agents.</p>
<p>The entire agent development lifecycle is managed using five prompts:</p>
<ul>
<li><strong>Create.</strong> Scaffolds a new agent.</li>
<li><strong>Improve.</strong> Hardens an existing agent against its own spec.</li>
<li><strong>Extend.</strong> Adds new capabilities to an existing agent.</li>
<li><strong>Hill Climb.</strong> Runs the eval suite, diagnoses failures, fixes what's in scope.</li>
<li><strong>Review.</strong> Sweeps the repo for drift between docs, code, and config.</li>
</ul>
<h2>What is an agent platform</h2>
<p>Let's say you built an agent in your favorite framework. How do you take it live? How do you host it in the cloud, send requests to it, secure it?</p>
<p>Your agent platform is the service responsible for running your agents. It takes requests, runs the agent and streams the responses. It collects data and metrics, manages security by preventing unauthorized access, and stops one agent from accessing or polluting the data of another.</p>
<p>If you think of agents as mini-applications, it becomes clear that they need a system to run on, like an OS. Your agent platform is that OS.</p>
<h2>What we're building</h2>
<p>Today we'll build an agent platform that you can run locally using docker, your own cloud, or on Railway with 1 command. The platform has five parts:</p>
<ol>
<li><strong>Runtime:</strong> the service that runs your agents. It handles requests, runs the agent loop, streams responses, writes to storage, handles auth.</li>
<li><strong>Storage:</strong> the database where our data lives: agent sessions, memory, knowledge, traces and eval history.</li>
<li><strong>Connectors:</strong> tools for agents to connect with external systems via MCP, API, or CLI. Having them in one place is a big win for security.</li>
<li><strong>Interfaces:</strong> Slack, Discord, Telegram, custom UIs. One place to resolve identity across surfaces, so the same person is the same <code>user_id</code> whether they ping you in Slack or hit the web app.</li>
<li><strong>Infrastructure:</strong> where everything runs. We'll use Docker for local and Railway for production. You're free to run production anywhere.</li>
</ol>
<p>Once it's running, you should be able to ship a new agent in ~10 minutes without writing any code. I know this is a crazy claim so let's give it a shot.</p>
<h2>Let's get started</h2>
<p>I'm going to share a foundational codebase that you can build upon.</p>
<p>Clone, configure, and run: <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agent-platform-railway">agent-platform-railway</a></p>
<pre class="language-bash"><code class="language-bash"><span class="token comment"># Clone the agent-platform template</span>
<span class="token function">git</span> clone https://github.com/agno-agi/agent-platform-railway.git agent-platform
<span class="token builtin class-name">cd</span> agent-platform

<span class="token comment"># Configure your environment</span>
<span class="token comment"># Recommended: copy the example env file and add the key there</span>
<span class="token function">cp</span> example.env .env
<span class="token comment"># Edit .env and add your OPENAI_API_KEY</span>

<span class="token comment"># Run your platform: 1 FastAPI server and 1 Postgres database</span>
<span class="token function">docker</span> compose up -d --build
</code></pre>
<p>This brings up two containers: a FastAPI server and a Postgres database.</p>
<p>Now let's give our platform a UI.</p>
<ol>
<li>Head to <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a> and sign in.</li>
<li>Connect to your local OS at <code>http://localhost:8000</code>.</li>
</ol>
<p>You should see something like:</p>
<img alt="Local agent platform connected to AgentOS UI" loading="lazy" width="700" height="700" decoding="async" data-nimg="1" class="rounded-2xl" style="color:transparent" srcset="/_next/image?url=%2Fimages%2Flocal-agent-platform.png&amp;w=750&amp;q=75 1x, /_next/image?url=%2Fimages%2Flocal-agent-platform.png&amp;w=1920&amp;q=75 2x" src="/_next/image?url=%2Fimages%2Flocal-agent-platform.png&amp;w=1920&amp;q=75">
<h2>Agent Development Lifecycle</h2>
<p>Because everything is in one place, Claude Code can manage my entire agent development lifecycle.</p>
<h3>1. Create an agent</h3>
<p>To create a new agent, I open Claude Code and type:</p>
<blockquote>
<p>Run <code>create-new-agent.md</code> in a new branch.</p>
</blockquote>
<p>Claude starts by asking a few questions about what the agent should do, which tools it needs. It then searches the Agno docs via MCP for the right toolkit, generates the agent file, registers it in <code>app/main.py</code>, restarts the container, and smoke-tests via cURL. 5-10 minutes from prompt to agent.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/5-11-create-new-agent.mp4">Your browser does not support the video tag.</video>
<h3>2. Improve an agent</h3>
<p>To improve an existing agent, I type:</p>
<blockquote>
<p>Run <code>improve-agent.md</code> on code-search agent.</p>
</blockquote>
<p>Claude reads the agent's <code>INSTRUCTIONS</code> and derives 8-12 probes from them. Some golden-path. Some edge cases. Some tool-selection. A couple of adversarial ones thrown in: prompt injections, malformed input, attempts to pull the agent off-purpose.</p>
<p>It runs each probe against the live container via cURL. Reads the response. Reads the tool calls from the container logs. Judges PASS or FAIL against what the <code>INSTRUCTIONS</code> actually promise.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/5-11-improve-agent.mp4">Your browser does not support the video tag.</video>
<p>For every failure it picks a lever. Tighten a rule. Add a rule. Swap a tool. Bump <code>num_history_runs</code>. Whatever fits the failure mode. It edits <code>agents/&lt;slug&gt;.py</code>, hot-reloads, and re-runs only the probes that failed.</p>
<p>Zero input from me beyond kicking off the task. This used to take a day of manually clicking things around and now it's fully automated.</p>
<h3>3. Extend an agent</h3>
<p>To add capabilities to an existing agent, I type:</p>
<blockquote>
<p>Run <code>extend-agent.md</code> on code-search agent.</p>
</blockquote>
<p>Extend runs with my guidance. I describe a change: add a tool, refine a prompt, fix a bug. Claude executes. The Agno docs MCP is loaded so toolkit research is grounded in the real API.</p>
<h3>4. Hill Climb</h3>
<p>Over time we collect a lot of evals, and it would be a shame to fix failures manually. I simply type:</p>
<blockquote>
<p>Run <code>eval-and-improve.md</code>.</p>
</blockquote>
<p>Hill Climb runs the eval suite, diagnoses every failure, and fixes what's in scope.</p>
<h3>5. Review</h3>
<p>Because the repo is managed primarily by coding agents, it moves <em>fast</em>. To bring everything up to speed, I type:</p>
<blockquote>
<p>Run <code>review-and-improve.md</code>.</p>
</blockquote>
<p>Claude sweeps the whole repo for drift between docs, code, and config. Fixes what it can. Best run before a release or after a refactor.</p>
<p>Drift between docs and code has always been a tax on production software. Now it costs nothing.</p>
<h2>Run in production</h2>
<p>When you're ready to ship, the codebase comes with deploy-to-Railway scripts:</p>
<pre class="language-bash"><code class="language-bash">./scripts/railway/up.sh
</code></pre>
<p>This provisions Postgres and the app service on the same private network.</p>
<p>Read the full Railway guide in the <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agent-platform-railway">README</a>.</p>
<h2>Wrapping up</h2>
<p>Congratulations. If you made it this far, you have an auto-improving agent platform running securely in your cloud.</p>
<p>Technical users on your team can create and deploy agents using Claude Code. Non-technical users can use the no-code UI.</p>
<p>Sessions, traces, and knowledge live in your database. Your infrastructure is gated behind JWT-based RBAC and API keys are managed in one place.</p>
<p>The agents you ship today are the smallest part of what you've built. The platform underneath them, and the iteration loop it enables, is the thing that matters.</p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Auto-Improving Software]]></title>
            <link>https://ashpreetbedi.com/auto-improving-software</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/auto-improving-software</guid>
            <pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>Coding agents have changed how we build software. Now they're changing how we improve it. Today I'll share an agent platform that coding agents build, run, and improve on their own.</p>
<p>The entire platform is managed using five prompts:</p>
<ul>
<li><strong>Create.</strong> Scaffolds a new agent.</li>
<li><strong>Improve.</strong> Hardens an existing agent against its own spec.</li>
<li><strong>Extend.</strong> Adds new capabilities to an existing agent.</li>
<li><strong>Hill Climb.</strong> Runs the eval suite, diagnoses failures, fixes what's in scope.</li>
<li><strong>Review.</strong> Sweeps the repo for drift between docs, code, and config.</li>
</ul>
<p>The "Improve → Hill Climb" loop recursively improves my agents with minimal oversight. It's hard to imagine doing this manually.</p>
<p>FYI, this auto-improvement loop is only possible because the environment is designed for it. Agent code, traces, logs, eval suite, and the live software all live in one place so a coding agent can rip end-to-end.</p>
<h2>It works because we control the stack</h2>
<p>Most software can't auto-improve because its inputs and outputs are scattered across tools. To run the auto-improvement loop, a coding agent has to piece together data from three different tools, each behind its own auth, with its own way of doing things.</p>
<p>Theoretically possible. In practice, too much friction.</p>
<p>My codebase is specifically designed for auto-improvement. For example, claude code can test an agent, then PASS or FAIL by reading the sessions, traces, and logs. If the agent fails, it edits the agent and runs it again.</p>
<p>Three things make this possible:</p>
<ol>
<li><strong>Every action is exposed as an API.</strong> Running an agent, reading a session, running an eval. Every key action can be run using cURL or bash.</li>
<li><strong>Data is colocated.</strong> Sessions and traces live in our Postgres database. A coding agent can trigger a run and read what came out without leaving its environment.</li>
<li><strong>Logs over everything.</strong> The entire platform runs locally on Docker. The coding agent reads live logs and makes updates as needed. The test -&gt; review loop is ~5s. Logs are the real-time feedback loop that unlocks everything.</li>
</ol>
<p>Agent platforms are the first category of software where the actions, data, and the iteration tool all sit close enough that a coding agent can test end-to-end, make code changes, and test again until the agent improves. Meaning the platform that hosts the loop is the first thing the loop improves.</p>
<h2>Agent Development Lifecycle</h2>
<p>Next I'll show you how Claude Code runs my agent platform.</p>
<h3>1. Create an agent</h3>
<p>To create a new agent, I open Claude Code and type:</p>
<blockquote>
<p>Run <code>create-new-agent.md</code> in a new branch.</p>
</blockquote>
<p>Claude starts by asking a few questions about what the agent should do, which tools it needs. It then searches the Agno docs via MCP for the right toolkit, generates the agent file, registers it in <code>app/main.py</code>, restarts the container, and smoke-tests via cURL. 5-10 minutes from prompt to agent.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/5-11-create-new-agent.mp4">Your browser does not support the video tag.</video>
<p>Because the platform takes care of everything, I'm building agents I never would have bothered with before. An agent that summarizes overnight Slack messages, an agent that drafts my weekly update, an agent that highlights important issues in the repo. None of these would have survived a multi-day project. All of them fit into a coffee break.</p>
<h3>2. Improve an agent</h3>
<p>To improve an existing agent, I type:</p>
<blockquote>
<p>Run <code>improve-agent.md</code> on code-search agent.</p>
</blockquote>
<p>Claude reads the agent's <code>INSTRUCTIONS</code> and derives 8-12 probes from them. Some golden-path. Some edge cases. Some tool-selection. A couple of adversarial ones thrown in: prompt injections, malformed input, attempts to pull the agent off-purpose.</p>
<p>It runs each probe against the live container via cURL. Reads the response. Reads the tool calls from the container logs. Judges PASS or FAIL against what the <code>INSTRUCTIONS</code> actually promise.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/5-11-improve-agent.mp4">Your browser does not support the video tag.</video>
<p>For every failure it picks a lever. Tighten a rule. Add a rule. Swap a tool. Bump <code>num_history_runs</code>. Whatever fits the failure mode. It edits <code>agents/&lt;slug&gt;.py</code>, hot-reloads, and re-runs only the probes that failed.</p>
<p>Then it iterates. Capped at five rounds. Stops earlier if everything passes.</p>
<p>Zero input from me beyond kicking off the task. This used to take a day of manually clicking things around and now its fully automated.</p>
<h3>3. Extend an agent</h3>
<p>To add capabilities to an existing agent, I type:</p>
<blockquote>
<p>Run <code>extend-agent.md</code> on code-search agent.</p>
</blockquote>
<p>Extend runs with me in the driver's seat. I describe a change: add a tool, refine a prompt, fix a bug. Claude executes. The Agno docs MCP is loaded so toolkit research is grounded in the real API.</p>
<p>Claude makes the changes. Runs smoke-tests. Each iteration is one small, verified step. Changes stay surgical and get tested in isolation.</p>
<h3>4. Hill Climb</h3>
<p>Over time we collect a lot of evals, and it would be a shame to fix failures manually. I simply type:</p>
<blockquote>
<p>Run <code>eval-and-improve.md</code>.</p>
</blockquote>
<p>Hill Climb runs the eval suite, diagnoses every failure, and fixes what's in scope. Failure types map to fix locations: missing rule in <code>INSTRUCTIONS</code>, hallucination, wrong tool fired, overspecified rubric. For each failure Claude picks the right lever, edits, and re-runs only the failing case. Once everything is green it re-runs the full suite to catch regressions.</p>
<p>The eval suite is two files. <code>evals/cases.py</code> declares the cases. Each case is one input plus a rubric (what a correct response looks like) and optionally an expected tool call. Built on Agno's <code>AgentAsJudgeEval</code> and <code>ReliabilityEval</code>.</p>
<p>Improve catches out-of-distribution failures. Hill Climb makes sure in-distribution cases continue to pass. The two work very well together.</p>
<h3>5. Review</h3>
<p>Because the repo is managed primarily by coding agents, it moves <em>fast</em>. To bring everything up to speed, I type:</p>
<blockquote>
<p>Run <code>review-and-improve.md</code>.</p>
</blockquote>
<p>Claude sweeps the whole repo for drift between docs, code, and config. Every agent file on disk should be registered in <code>app/main.py</code>. Every env var the code reads should be in <code>example.env</code> and the AGENTS.md. Every path in a markdown doc should still exist. Every script should do what's claimed.</p>
<p>Mechanical drift gets auto-fixed in place: a renamed file, a missing entry in <code>example.env</code>, a new agent missing from the architecture diagram. Anything bigger gets flagged with a recommended next step.</p>
<p>Best run before a release or after a refactor. The kind of work that's tedious for a human and trivial for a coding agent that can read every file in the repo.</p>
<p>Drift between docs and code has always been a tax on production software. Now it costs nothing.</p>
<h2>Why Agent Platforms?</h2>
<p>Agent platforms are the perfect testing ground for this pattern.</p>
<ol>
<li><strong>Greenfield.</strong> Agent platforms are relatively new and can be designed for coding agents from the get go.</li>
<li><strong>Workflow is clear.</strong> We know how to improve an agent: run it, read the logs, grade the response, edit, run again.</li>
<li><strong>The loop is actually useful.</strong> For regular software, optimizing an API endpoint doesn't really make sense. For agents, each round of improvement is real, measurable, and adds value.</li>
</ol>
<p>Set the platform up right and you can build any agent on top of it: use the create workflow to go from idea to agent, the improve workflow to harden the agent, the extend workflow to add new capabilities, lock them up using evals and then hill-climb against them.</p>
<p>Keep the entire repo in sync using the review-and-improve workflow.</p>
<p>It's almost impossible to do this by hand.</p>
<h3>My Auto Improving Agent Platform</h3>
<p>Here's a link to my auto-improving agent platform: <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agent-platform-railway">agent-platform-railway</a>.</p>
<p>It's a starter codebase for an agent platform you can run locally using docker or on Railway. The prompts are in the <code>docs/</code> folder. Clone, configure, and you're running agents in 10 minutes.</p>
<p>Follow the <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agent-platform-railway">README</a> for the full setup guide, and <a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com">Agno Docs</a> for reference.</p>
<h2>Auto improving software</h2>
<p>I've been running this loop for a few weeks and it continues to surprise me.</p>
<p>Agent instructions tightened by half a sentence. A docstring brought in sync with code. The platform is a little cleaner every time I run it.</p>
<p>I can see a world where all software works like this. A coding agent managing your platform end-to-end, fixing things small enough you'd never have prioritized them. Thanks for reading!</p>
<p>Ashpreet</p>
<hr>
<p><em>Built with 🧡 using <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno">Agno</a>.</em></p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Agentic Software Engineering]]></title>
            <link>https://ashpreetbedi.com/agentic-software-engineering</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/agentic-software-engineering</guid>
            <pubDate>Wed, 20 May 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<blockquote>
<p>Note: this post is about building your own agents (agentic software engineering), not about using coding agents.</p>
</blockquote>
<p>By now you've probably used a few agents, or at least heard of Claude Code, Codex, or OpenClaw. Ever wondered what it takes to build your own?</p>
<p>Most people think of agents as prompts + tools in a loop. That's a reasonable assumption, but it's not production architecture.</p>
<p>The moment your agent needs to know who it's talking to, maintain state, handle concurrent requests, take sensitive actions like refunds, and survive failing tool calls, it stops being an "LLM + tools in a loop" and becomes a distributed system.</p>
<p>Building agents is the easy part. There are 75 frameworks that help you do that. The hard part is the runtime: the harness around the agent that makes it work in the real world. That's what agentic software engineering is all about.</p>
<h2>Build. Serve. Connect.</h2>
<p>Here's how I think about shipping agentic software.</p>
<p><strong>Build</strong> the agent. Define the model, tools, knowledge base, memory, storage, and guardrails. This is the layer that most frameworks give you.</p>
<p><strong>Serve</strong> it as an API. User-scoped, session-scoped, horizontally scalable. Add persistent storage, streaming, background execution, retry semantics. This is where most agentic products stall. Not because the agent doesn't work, but because it doesn't have the infrastructure to work reliably at scale.</p>
<p><strong>Connect</strong> it to where users live. Your product, Slack, Discord, MCP, wherever. An agent in a notebook is an experiment. An agent where your users are is a product.</p>
<h2>The 6 Pillars of Agentic Software</h2>
<p>Building an agent is AI engineering. Running it in production is software engineering. Together, they form agentic software engineering: the practice of building, running, and scaling agents as production services.</p>
<p>Here are the six pillars that hold it up:</p>
<p><strong>Durability</strong>. Agents reason across multiple steps, call tools that time out, and fail halfway through. If your agent crashes on step 12 of 15, restarting might duplicate a side effect or lose critical context. Agentic software needs to pause, resume, checkpoint, and recover gracefully. Durability turns failure into resumption, not a full restart.</p>
<p><strong>Isolation</strong>. Agentic software serves thousands of users simultaneously. Each user needs their own session, their own memory, their own context. Passing a user_id with each request is easy. Isolating every resource the agent touches is where the engineering comes in. Your database, your vector store, your model provider, all need to respect user boundaries. One missing filter becomes a data breach.</p>
<p><strong>Governance</strong>. Agents that can act can also cause damage. Looking up a record is harmless. Deleting a record or issuing a refund needs approval. Agentic software needs layered authority: what runs automatically, what needs human approval, and what needs admin sign-off. Today, most agents auto-execute with minimal oversight. As they get more capable, governance becomes the product.</p>
<p><strong>Persistence</strong>. An agent without persistent storage can't learn, can't build context, can't improve. We need to store sessions, memory, knowledge in a database. Persistent state is what turns a chatbot into a product. Every conversation makes the next one better.</p>
<p><strong>Scale</strong>. A thousand users hit your agent at the same time. Requests queue, you hit model rate limits, and tool calls compete for resources. Traditional services call your own backends. Agentic software calls external model APIs and third-party tools, which means you inherit their rate limits, latency, and downtime. Scaling agentic software means scaling around dependencies you don't control.</p>
<p><strong>Composability</strong>. When an agent is a service, other agents can call it. Your frontend can call it. Your Slack bot can call it. MCP clients can discover it. It becomes a building block in your architecture, and every new integration becomes a standard API call. That's how single-agent tools become multi-agent systems.</p>
<p>None of this is new. We've been building reliable distributed systems for decades. The AI industry just hasn't brought those lessons along yet, and we're feeling it in every failed deployment.</p>
<h2>From Theory to Practice</h2>
<p>As always, I come bearing code. Here's how you can start building your own agentic service today.</p>
<pre class="language-bash"><code class="language-bash"><span class="token comment"># 1. Clone the repo</span>
<span class="token function">git</span> clone <span class="token punctuation">\</span>
    https://github.com/agno-agi/agentos-docker-template.git <span class="token punctuation">\</span>
    agentos

<span class="token builtin class-name">cd</span> agentos

<span class="token comment"># 2. Set your model provider key</span>
<span class="token function">cp</span> example.env .env
<span class="token comment"># Edit .env and add OPENAI_API_KEY</span>

<span class="token comment"># 3. Start the application</span>
<span class="token function">docker</span> compose up -d --build

<span class="token comment"># 4. Optional: Load documents for the knowledge agent</span>
<span class="token function">docker</span> <span class="token builtin class-name">exec</span> -it agentos-api python -m agents.knowledge_agent
</code></pre>
<p>This gives you a containerized service with persistent storage (Postgres), two starter agents (a knowledge agent using Agentic RAG and an MCP agent for external tool use), and a REST API you can connect to from anywhere.</p>
<p>I'm using Docker for this template because Docker runs everywhere: your laptop, AWS, GCP, Azure, Railway. The same container you develop locally is the one you deploy to production. The <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agentos-docker-template">README</a> covers everything you need to get started.</p>
<p>After running the service:</p>
<ol>
<li>Open <a target="_blank" rel="noopener noreferrer" class="" href="http://localhost:8000/docs">localhost:8000/docs</a> to see your API.</li>
<li>Connect to the web UI at <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a> where you can chat with your agents, trace runs, manage knowledge, create schedules and approve sensitive tool calls. One UI for your agentic software.</li>
</ol>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/agentos-docker-template.mp4">Your browser does not support the video tag.</video>
<p>Adding your own agent is a few lines of Python and a restart. Swap models with a one-line change. Add tools from 100+ integrations. The template is a starting point. Read the <a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/introduction">Agno docs</a> to learn more.</p>
<h2>Governance &amp; Elicitation</h2>
<p>Most agents run tool calls with minimal oversight or auditability. In practice, we need layered authority:</p>
<ol>
<li>Tools that run freely</li>
<li>Tools that need user approval</li>
<li>Tools that need admin approval</li>
</ol>
<p>Agents also need to ask questions (often called elicitation). The Claude Code team shared a <a target="_blank" rel="noopener noreferrer" class="" href="https://x.com/trq212/status/2027463795355095314">great article</a> on the AskUserQuestion tool used by Claude.</p>
<p>This is available in Agno as <code>UserFeedbackTools</code>. Here's a support agent that can look up orders freely, ask the customer structured questions when it needs more information, and wait for admin approval before issuing a refund:</p>
<pre class="language-python"><code class="language-python">support <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    <span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"support"</span><span class="token punctuation">,</span>
    name<span class="token operator">=</span><span class="token string">"Support"</span><span class="token punctuation">,</span>
    model<span class="token operator">=</span>OpenAIResponses<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"gpt-5.2"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    db<span class="token operator">=</span>agent_db<span class="token punctuation">,</span>
    tools<span class="token operator">=</span><span class="token punctuation">[</span>
        lookup_order<span class="token punctuation">,</span>             <span class="token comment"># auto-execute</span>
        search_help_docs<span class="token punctuation">,</span>         <span class="token comment"># auto-execute</span>
        issue_refund<span class="token punctuation">,</span>             <span class="token comment"># requires user confirmation</span>
        UserFeedbackTools<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span>      <span class="token comment"># structured questions</span>
    <span class="token punctuation">]</span><span class="token punctuation">,</span>
    instructions<span class="token operator">=</span>instructions<span class="token punctuation">,</span>
    enable_agentic_memory<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<p>Watch what happens when a customer asks for a refund.</p>
<ul>
<li>The agent looks up the order on its own, no permission needed.</li>
<li>Then it hits a decision point: why does the customer want the refund?</li>
<li>Instead of guessing, it presents a structured question with clear options: defective, wrong item, changed mind.</li>
<li>The customer picks one. Now the agent calls the refund tool, but because refunds carry real consequences, it pauses for user approval.</li>
<li>Once approved, the agent runs the refund tool.</li>
</ul>
<p>Three levels of agency in one conversation. You can view the <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/demo/tree/main/agents/support">full code here</a>.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/approvals-flow.mp4">Your browser does not support the video tag.</video>
<p>The agent knows when to act, when to ask, and when to wait. That's what governance looks like in practice. The runtime has to support all three modes, and the transitions between them have to feel natural.</p>
<blockquote>
<p>Note: the approvals flow on the UI is actively being developed. The refund should wait for admin approval, not user approval. This is implemented on the SDK but not the UI yet. This is being fixed this week.</p>
</blockquote>
<h2>Agents are distributed systems</h2>
<p>The <a target="_blank" rel="noopener noreferrer" class="" href="https://x.com/ashpreetbedi/status/2024885969250394191">5 Levels</a> describe how agentic software grows in capability (and complexity). The <a target="_blank" rel="noopener noreferrer" class="" href="https://x.com/ashpreetbedi/status/2026708881972535724">7 Sins</a> describe how they fail in production. The 6 Pillars describe what it takes to build them right.</p>
<p>The consistent message across all three: agentic software engineering is a discipline. The teams that internalize this early will ship great products. The teams that keep treating agents as scripts will continue to miss the mark.</p>
<p>Clone the <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agentos-docker-template">repo</a>. Build your first agent. Ship it where your users are.</p>
<hr>
<p>Links:</p>
<ul>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/">Agno Docs</a>
</li>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno">Agno Github</a>
</li>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/deploy/introduction">AgentOS Templates</a>
</li>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agentos-docker-template">AgentOS Docker Template</a>
</li>
</ul>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Context Providers]]></title>
            <link>https://ashpreetbedi.com/context-providers</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/context-providers</guid>
            <pubDate>Wed, 20 May 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>In 1973, Doug McIlroy added pipes to Unix. The idea was small. Each program reads stdin, writes stdout, and the shell composes them. The shell didn't know what <code>grep</code> or <code>awk</code> did. It just wired them together. Fifty years later we still type <code>grep | awk | sort</code> without thinking. That's how good the abstraction was.</p>
<p>In 2026 we are building agents the other way around. If you believe we're at the cusp of a new operating system, then the agent is the shell and every tool is a program. The question is: why do we keep stuffing the shell's prompt with the man pages of every program it might call?</p>
<p>The most powerful technology of this decade is bottlenecked by the number of tools it can hold in its "RAM" and every agent with multiple toolkits hits the same three problems:</p>
<ol>
<li>Context pollution from too many tools</li>
<li>Degrading performance from blurry scopes (eg: <code>search</code> operations in multiple toolkits)</li>
<li>The main agent getting confused because its context is all tool instructions</li>
</ol>
<p>I've been testing a protocol that fixes all three.</p>
<h2>The three walls</h2>
<p><strong>Context pollution.</strong> Every tool takes up precious context. Schemas, descriptions, example usage, all of it lands in the system prompt. A Slack toolkit is 8 to 12 tools. Gmail is 6 to 10. Calendar another 6. Drive, GitHub, your CRM, the web. You're at 50 tools before adding anything custom. From what I've seen, somewhere past 20 tools models start hallucinating tools that don't exist, calling tools with the wrong shape, or skipping the right tool because its description got buried.</p>
<p><strong>Blurry scopes don't compose.</strong> Two tools both take a <code>workspace</code> argument: one is Slack's, one is Google's. <code>search</code> in one MCP collides with <code>search</code> in another. <code>send_message</code> could be Slack, email, or your CRM. The agent picks wrong half the time, and no naming convention fixes it because the same word legitimately means different things in different sources. The minute you compose tools from sources you don't control (MCP servers, third-party SDKs), you get overlap, and the model has no reliable way to disambiguate.</p>
<p><strong>Tool-use logic lives with the main agent.</strong> This is the deepest wall.</p>
<p>For an agent to use Slack well, the system prompt has to explain Slack: look up the user ID before you DM them, resolve a channel name to an ID before you post, prefer <code>conversations.history</code> for channels and <code>conversations.replies</code> for threads, paginate by cursor instead of offset. That's hundreds of tokens of Slack-specific guidance. Now do that for Gmail. For Calendar. For Drive. For your database. For GitHub.</p>
<p>The system prompt becomes the union of every API's quirks. Every turn carries every rule, even when the user just asked about Slack. The main agent is stuck reasoning about both the user's question and the mechanics of every API. Adding a source means editing the prompt and praying nothing breaks.</p>
<h2>The missing layer</h2>
<p>Today the canonical agent shape is some variant of:</p>
<pre><code>Agent → Tools                             # raw
Agent → MCP server → Tools                # MCP
Agent + Skill instructions → Tools        # Skills
</code></pre>
<p>In all three cases the agent sees the raw tool surface of every source. Every Slack tool, every Drive tool, every CRM tool. The agent's prompt has to contain how to use every one of them.</p>
<p>The shape I've been testing puts a thin layer in between:</p>
<pre><code>Agent ↔ ContextProvider ↔ Tools
</code></pre>
<p>A <code>ContextProvider</code> wraps one source: Slack, GitHub, Drive, Filesystem, your DB. To the calling agent, it exposes exactly two tools:</p>
<ul>
<li><code>query_&lt;source&gt;(question)</code> for natural-language reads</li>
<li><code>update_&lt;source&gt;(instruction)</code> for natural-language writes (or a clean read-only error)</li>
</ul>
<p>The main agent doesn't see Slack's twelve tools. It sees <code>query_slack</code> and <code>update_slack</code>. It doesn't see Drive's quirks. It sees <code>query_drive</code>. Add ten more sources and the agent's tool surface stays linear at 2N.</p>
<p>Behind each tool is a sub-agent scoped to that one source. The sub-agent owns the source's tools, the source's quirks, the lookup-before-write patterns, the pagination weirdness. It runs in its own context, returns an answer, and the main agent gets a clean result.</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">from</span> agno<span class="token punctuation">.</span>agent <span class="token keyword">import</span> Agent
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>context<span class="token punctuation">.</span>slack <span class="token keyword">import</span> SlackContextProvider
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>context<span class="token punctuation">.</span>gdrive <span class="token keyword">import</span> GDriveContextProvider
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>context<span class="token punctuation">.</span>database <span class="token keyword">import</span> DatabaseContextProvider
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>models<span class="token punctuation">.</span>openai <span class="token keyword">import</span> OpenAIResponses

<span class="token comment"># Sub-agents do source-specific tool work — a cheaper model is plenty.</span>
provider_model <span class="token operator">=</span> OpenAIResponses<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"gpt-5.4-mini"</span><span class="token punctuation">)</span>

slack <span class="token operator">=</span> SlackContextProvider<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"slack"</span><span class="token punctuation">,</span> token<span class="token operator">=</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">,</span> model<span class="token operator">=</span>provider_model<span class="token punctuation">)</span>
drive <span class="token operator">=</span> GDriveContextProvider<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"drive"</span><span class="token punctuation">,</span> service_account_file<span class="token operator">=</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">,</span> model<span class="token operator">=</span>provider_model<span class="token punctuation">)</span>
crm   <span class="token operator">=</span> DatabaseContextProvider<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"crm"</span><span class="token punctuation">,</span> sql_engine<span class="token operator">=</span>engine<span class="token punctuation">,</span> model<span class="token operator">=</span>provider_model<span class="token punctuation">)</span>

agent <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    model<span class="token operator">=</span>OpenAIResponses<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"gpt-5.4"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    tools<span class="token operator">=</span><span class="token punctuation">[</span><span class="token operator">*</span>slack<span class="token punctuation">.</span>get_tools<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token operator">*</span>drive<span class="token punctuation">.</span>get_tools<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token operator">*</span>crm<span class="token punctuation">.</span>get_tools<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
    instructions<span class="token operator">=</span><span class="token string">"\n"</span><span class="token punctuation">.</span>join<span class="token punctuation">(</span><span class="token punctuation">[</span>slack<span class="token punctuation">.</span>instructions<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> drive<span class="token punctuation">.</span>instructions<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> crm<span class="token punctuation">.</span>instructions<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<p>The agent sees four tools: <code>query_slack</code>, <code>query_drive</code>, <code>query_crm</code>, <code>update_crm</code>. Three sources, two of them read-only. Five years of API quirks for those three sources, summarized into four tool descriptions.</p>
<p>Btw the quirks didn't vanish, they just moved into the sub-agent's scope, where they belong, and only load on turns that actually touch that source.</p>
<p>The biggest advantage is that the main agent doesn't see the gunk of intermediate tool calls, or the fact that slack returned 1000 channel names to find the correct channel id.</p>
<h2>A walkthrough</h2>
<p>Here's an example from <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/scout">Scout</a>, the agent I've been testing this with. The user asks:</p>
<blockquote>
<p>What's the verdict on the v3 migration, and is the spec doc up to date?</p>
</blockquote>
<p>Scout has the following tools: <code>query_slack</code>, <code>query_drive</code>, <code>query_filesystem</code>, <code>query_web</code>.</p>
<p>It picks two.</p>
<pre><code>query_slack("recent engineering channel discussion about v3 migration")
</code></pre>
<p>The Slack sub-agent runs <code>search_messages</code>, walks the thread with <code>conversations_replies</code>, and returns:</p>
<blockquote>
<p>On Tuesday, engineering decided to phase the v3 migration over two weeks, starting with the read path. James raised concerns about the rollback procedure that haven't been resolved in-thread.</p>
</blockquote>
<pre><code>query_drive("v3 migration spec document, last updated when, current contents")
</code></pre>
<p>The Drive sub-agent runs <code>files.list</code>, fetches the doc, returns:</p>
<blockquote>
<p>The v3 migration spec was last edited Friday by Mike. The current version describes a single-phase migration. No mention of phased rollout or rollback procedure.</p>
</blockquote>
<p>The main agent writes the answer:</p>
<blockquote>
<p>Engineering agreed Tuesday on a phased migration starting with the read path, but the spec was last edited Friday and still describes a single-phase approach. James's rollback concerns aren't reflected in the doc either. The spec needs an update.</p>
</blockquote>
<p>The main agent never touched a Slack API tool. It never paginated a Drive listing. It composed two natural-language calls and wrote one paragraph. The Slack and Drive specialists handled their own messes in their own contexts.</p>
<h2>What about Skills?</h2>
<p>Skills are another serious attempt at problem #3. A Skill packages task-specific instructions ("here's how to use Slack") into a module the model loads when relevant, instead of carrying it in the system prompt full-time. They move task knowledge out of the always-on prompt and into something more conditional. If problem #3 were the only issues, we could solve this skill issue.</p>
<p>But Skills don't fix walls 1 and 2. The strongest version — Skills that bring scoped tools when loaded — still leaves you with N tool surfaces in the agent's context once N skills are active, and <code>search</code> still collides with <code>search</code>. There's also a higher chance of conflicting Skills quietly degrading the agent without you noticing.</p>
<p>ContextProvider and Skills compose. A Slack ContextProvider's sub-agent can itself load a Slack Skill, and that's where the Skill does its best work: in the context of the thing actually executing against Slack, not in the main agent that just wanted an answer.</p>
<p>Roughly: Skills compress <em>how to do a task</em>. ContextProvider hides <em>that there's a task</em> until the main agent decides to delegate one. Different layers, both useful.</p>
<h2>Examples</h2>
<p>As always, I come bearing code. The full set of examples lives in <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/tree/main/cookbook/12_context">cookbook/12_context</a>. A few worth pointing at:</p>
<p><strong>Sources covered out of the box.</strong> Filesystem (<a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/blob/main/cookbook/12_context/00_filesystem.py">00</a>), database (<a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/blob/main/cookbook/12_context/04_database_read_write.py">04</a>), Slack (<a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/blob/main/cookbook/12_context/05_slack.py">05</a>), Google Drive (<a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/blob/main/cookbook/12_context/07_google_drive.py">07</a>), GitHub (<a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/blob/main/cookbook/12_context/12_github.py">12</a>), and web via Exa or Parallel, direct SDK or MCP endpoint (<a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/blob/main/cookbook/12_context/01_web_exa.py">01</a>, <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/blob/main/cookbook/12_context/02_web_exa_mcp.py">02</a>, <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/blob/main/cookbook/12_context/03_web_parallel.py">03</a>, <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/blob/main/cookbook/12_context/11_web_parallel_mcp.py">11</a>). Every provider follows the same <code>query_&lt;id&gt;</code> / <code>update_&lt;id&gt;</code> shape.</p>
<p><strong>Read/write split with real security.</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/blob/main/cookbook/12_context/04_database_read_write.py">04_database_read_write.py</a> spins up a SQLite DB and has the agent insert a contact, read it back, and verify with direct SQL. Read and write go through <a target="_blank" rel="noopener noreferrer" class="" href="/systems-engineering">separate sub-agents with separate engines</a>, same shape I used in <a target="_blank" rel="noopener noreferrer" class="" href="/dash-v2">Dash</a>, same reason: the database itself rejects writes the model isn't allowed to make, regardless of what the prompt says. <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/blob/main/cookbook/12_context/12_github.py">12_github.py</a> does the same shape over a real repo: reads through a read-only sub-agent on a clone, writes through a sub-agent that operates on a per-session worktree on a <code>&lt;prefix&gt;/&lt;task&gt;</code> branch and ends in a PR. The agent cannot push to the default branch.</p>
<p><strong>Compositional multi-source.</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/blob/main/cookbook/12_context/09_web_plus_slack.py">09_web_plus_slack.py</a> is the shape flat tool layouts can't do without orchestration code. The agent pulls topics from a Slack channel, runs a per-topic web search, and returns a briefing tying each internal thread to an external reference. Two providers, one prompt, and the main agent stitches the synthesis itself. <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/blob/main/cookbook/12_context/12_engineering_briefing.py">12_engineering_briefing.py</a> takes it one step further: Slack topics → codebase workspace → web fallback, all in one prompt.</p>
<p><strong>MCP wrapper.</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/blob/main/cookbook/12_context/06_mcp_server.py">06_mcp_server.py</a> wraps any MCP server (stdio or HTTP) as a single <code>query_&lt;id&gt;</code> tool. The sub-agent's instructions are built from the server's <code>list_tools()</code> response at connect time, so the calling agent never sees stale tool docs. Staleness is bounded by sub-agent lifetime, not eliminated. But for any reasonable session, that's the same difference. This is the move that collapses a 50-tool MCP server to 1 tool from the main agent's view.</p>
<h2>Surprises and open questions</h2>
<p><strong>Sub-agents are cheaper than expected.</strong> I assumed the extra hops would dominate. They don't. The main agent's context is so much smaller that its calls are faster, and the sub-agent only fires on turns that touch its source. Anecdotal on Scout's workload: total tokens are roughly flat at low source counts and improve as the source count grows; wall-clock latency drops at every source count I've measured. I haven't generalized this off Scout yet.</p>
<p><strong>The main agent's prompt got smaller.</strong> I expected to add orchestration logic. I removed it. With a uniform surface, the routing rules collapse to "pick the right <code>query_&lt;source&gt;</code>". gpt-5.4 just works out of the box, with zero guidance on how to use a source.</p>
<p>A few things I'm still working through:</p>
<ul>
<li><strong>How thin can the main agent's prompt get?</strong> I've been hill-climbing this with evals.</li>
<li><strong>Caching across calls in a session.</strong> The same <code>query_&lt;source&gt;("who's on the X channel")</code> shouldn't re-do the work two turns later.</li>
<li><strong>Per-user authentication that survives the hop.</strong> Partially solved. Scout passes <code>user_id</code>, <code>session_id</code>, <code>metadata</code>, and <code>dependencies</code> through to the sub-agent. More to do for OAuth flows.</li>
<li><strong>When to expose underlying tools instead.</strong> Some sources benefit from the agent driving the tool calls directly, usually when the source is small enough that the schema cost is low. The protocol has a mode for this. I'm still figuring out where the line sits.</li>
</ul>
<h2>TL;DR</h2>
<p>The agent's tool surface should be its job description, not the union of every API it might touch. Context Providers move the API mess to where it belongs and leave the main agent free to reason about what the user actually asked.</p>
<ul>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/tree/main/cookbook/12_context">Cookbook examples</a>
</li>
<li><a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/scout">Scout:</a> open-source company intelligence agent built on this pattern</li>
</ul>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[How to Build an Agent Platform]]></title>
            <link>https://ashpreetbedi.com/agent-platform</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/agent-platform</guid>
            <pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>Cloud platforms, data platforms, and now agent platforms.</p>
<p>Every company wants to build one and run a fleet of agents. Having built cloud, compute and data platforms before, I'm hoping we can learn from the mistakes of our past and build it right the first time 'round.</p>
<p>The data era is a particularly cautionary tale. I lived through the great unbundling where we had to stitch multiple tools for ingestion, orchestration, transformation, and services for metadata, BI, and data quality. Each tool worked great in isolation but using them together was a pain. 80% of engineering time went into gluing things together. Then came the bundling. Snowflake, Databricks, and the cloud providers (esp AWS) consolidated the stack and provided unified platforms.</p>
<p>The agent era is starting the same way, with features being sold as products and vendors for solving problems that don't really exist yet.</p>
<p>If you find yourself running agents with data on 3 providers, traces in multiple places and no auto-improvement loop, this article is for you.</p>
<h2>You need an agent platform</h2>
<p>If you think of agents as apps, it becomes clear that they need a system to run on, like an OS.</p>
<p>Your agent platform is responsible for running the agents, collecting data and metrics, managing security by preventing unauthorized access, and stopping one agent from accessing or polluting the data of another.</p>
<h2>What we're building</h2>
<p>Today we'll build an agent platform that you can run locally, or on Railway for $20/mo.</p>
<p>Once it's running, you should be able to ship a new agent (or workflow) without writing any code. Because the platform takes care of everything, Claude Code can build new agents with a high degree of quality, and then recursively improve them by querying them live. <strong>This is the single biggest advantage of having a unified agent platform.</strong></p>
<p>We'll also set up a scheduler for recurring work, lock in behavior with evals, and connect agents to interfaces like Slack.</p>
<p>The most fun is watching Claude Code recursively improve your agents.</p>
<h2>What makes an agent platform</h2>
<p>An agent platform is made of five parts:</p>
<ol>
<li><strong>Runtime:</strong> the service that runs the agents. This part does most of the heavy lifting.</li>
<li><strong>Storage:</strong> the database where our data lives: agent sessions, memory, knowledge, traces and eval history.</li>
<li><strong>Connectors:</strong> tools for agents to connect with external systems via MCP, API, or CLI. Having them in one place is a big + for security.</li>
<li><strong>Interfaces:</strong> Slack, Discord, Telegram, custom UIs. One place to resolve identity across surfaces, so the same person is the same <code>user_id</code> whether they ping you in Slack or hit the web app.</li>
<li><strong>Infrastructure:</strong> where everything runs. We'll use Docker for local and Railway for production.</li>
</ol>
<h2>Let's get started</h2>
<p>I'm going to share a foundational codebase that you can build upon.</p>
<p>This, in my opinion, is the perfect starting point for an agent platform.</p>
<pre class="language-md"><code class="language-md">agent-platform
├── agents                       # agent code goes here
│   ├── code_search.py
│   └── web_search.py
├── app                          # server code goes here
│   ├── config.yaml
│   ├── main.py
│   └── settings.py
├── compose.yaml
├── db
│   ├── session.py
│   └── url.py
├── Dockerfile
├── docs                         # docs go here
│   ├── create-new-agent.md
│   ├── eval-and-improve.md
│   ├── improve-agent.md
│   └── review-and-improve.md
├── evals
│   └── cases.py                 # test cases
└── README.md
</code></pre>
<h2>Step 1: Run locally</h2>
<p>First let's clone, configure, and run our agent platform.</p>
<p>Make sure Docker is installed and running. <a target="_blank" rel="noopener noreferrer" class="" href="https://www.docker.com/get-started/">Follow these steps</a> if not.</p>
<p>Then open your terminal and run one by one:</p>
<ol>
<li>Clone the agent-platform template</li>
</ol>
<pre class="language-bash"><code class="language-bash"><span class="token function">git</span> clone https://github.com/agno-agi/agentos-railway-template.git agent-platform
<span class="token builtin class-name">cd</span> agent-platform
</code></pre>
<ol start="2">
<li>Configure your environment. Copy the example .env file, open in your favorite code editor and add the OpenAI key there.</li>
</ol>
<pre class="language-bash"><code class="language-bash"><span class="token function">cp</span> example.env .env
</code></pre>
<ol start="3">
<li>Run your platform: 1 FastAPI server and 1 Postgres database</li>
</ol>
<pre class="language-bash"><code class="language-bash"><span class="token function">docker</span> compose up -d --build
</code></pre>
<p>This brings up two containers: a FastAPI server and a Postgres database. Confirm the API is running at <code>http://localhost:8000/docs</code>.</p>
<p>Now let's give our platform a UI.</p>
<ol>
<li>Head to <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a> and sign in.</li>
<li>Click <strong>Add OS</strong> → <strong>Local</strong>, enter <code>http://localhost:8000</code>, and click <strong>Connect</strong>.</li>
</ol>
<img alt="Connect dialog for a local AgentOS" loading="lazy" width="700" height="700" decoding="async" data-nimg="1" class="rounded-2xl" style="color:transparent" srcset="/_next/image?url=%2Fimages%2Fconnect-os.png&amp;w=750&amp;q=75 1x, /_next/image?url=%2Fimages%2Fconnect-os.png&amp;w=1920&amp;q=75 2x" src="/_next/image?url=%2Fimages%2Fconnect-os.png&amp;w=1920&amp;q=75">
<p>You should see something like:</p>
<img alt="AgentOS UI with the two reference agents connected" loading="lazy" width="700" height="700" decoding="async" data-nimg="1" class="rounded-2xl" style="color:transparent" srcset="/_next/image?url=%2Fimages%2Fos-connected.png&amp;w=750&amp;q=75 1x, /_next/image?url=%2Fimages%2Fos-connected.png&amp;w=1920&amp;q=75 2x" src="/_next/image?url=%2Fimages%2Fos-connected.png&amp;w=1920&amp;q=75">
<h2>Step 2: Create your first agent</h2>
<p>The codebase comes with two reference agents and a Claude Code prompt that can build new ones for you.</p>
<p>To create a new agent, open Claude Code and run:</p>
<pre class="language-bash"><code class="language-bash">Run docs/create-new-agent.md
</code></pre>
<p>Claude will ask you what the agent should do, what tools it needs, then generate the agent file, register it in <code>app/main.py</code>, restart the container, and run a smoke test.</p>
<p>This usually takes 5-10 minutes for a simple agent. More if you're building something bespoke with custom tools.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/create-new-agent.mp4">Your browser does not support the video tag.</video>
<h2>Step 3: Test your new agent</h2>
<p>Open <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a> to chat with your agent. Run it through realistic prompts. Check the traces and sessions. Try to break it. Try out-of-distribution questions, prompt injections, edge cases. This usually takes 5-20 minutes.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/test-your-agent.mp4">Your browser does not support the video tag.</video>
<h2>Step 4: Recursively improve your agent</h2>
<p>This is where your platform starts paying dividends.</p>
<p>Open Claude Code in the repo and paste:</p>
<pre><code>Run docs/improve-agent.md
</code></pre>
<p>Claude Code can directly hit your live agents using curl. Then iterate and improve your agents.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/auto-improve.mp4">Your browser does not support the video tag.</video>
<p>This is why owning the stack pays off. The trace data, the agent code, the running platform, and the iteration tool all live in one box. Claude Code can see all of it and improve as needed.</p>
<h2>Step 5: Lock in behavior with evals</h2>
<p>Evals are the regression test for your agents. Same prompts, same agents, run on a schedule, fail when behavior drifts.</p>
<p>Evals are defined as a set of cases:</p>
<pre class="language-python"><code class="language-python"><span class="token comment"># evals/cases.py</span>
Case<span class="token punctuation">(</span>
    name<span class="token operator">=</span><span class="token string">"web_search_recent_anthropic_research"</span><span class="token punctuation">,</span>
    agent<span class="token operator">=</span>web_search<span class="token punctuation">,</span>
    <span class="token builtin">input</span><span class="token operator">=</span><span class="token string">"What did Anthropic publish about agent research recently?"</span><span class="token punctuation">,</span>
    criteria<span class="token operator">=</span><span class="token punctuation">(</span>
        <span class="token string">"Answers the question by citing at least one real Anthropic URL "</span>
        <span class="token string">"(anthropic.com domain). The response is grounded in fetched content "</span>
        <span class="token string">"rather than refusing to answer."</span>
    <span class="token punctuation">)</span><span class="token punctuation">,</span>
    expected_tool_calls<span class="token operator">=</span><span class="token punctuation">(</span>_WEB_SEARCH_TOOL<span class="token punctuation">,</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<p>Run them later with: python -m evals</p>
<p>Results write to your Postgres via <code>eval_db</code>, so eval history shows up at <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a> alongside sessions and traces. Connect Claude Code to the diagnosis loop by pasting <code>Run docs/eval-and-improve.md</code>. It triages every failure and fixes the issues in scope.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/eval-and-improve.mp4">Your browser does not support the video tag.</video>
<h2>Step 6: Run on Railway</h2>
<p>Let's take our platform live by hosting it somewhere. Your company probably has a set way of running software. Follow whatever that is.</p>
<p>If you're looking for a place to test this out without going through the full devops process, Railway is the cheapest and fastest PaaS I've found. $20/month gets you pretty far and the codebase already comes with deploy-to-railway scripts.</p>
<blockquote>
<p>Requires the <a target="_blank" rel="noopener noreferrer" class="" href="https://docs.railway.com/cli#installing-the-cli">Railway CLI</a> and <code>railway login</code>.</p>
</blockquote>
<h3>6.1 Configure production environment</h3>
<p>The deploy scripts read <code>.env.production</code> first and fall back to <code>.env</code>. This lets you keep separate values for local and production: different OpenAI keys with different budgets, production-only credentials, a different Slack workspace, and so on.</p>
<pre class="language-bash"><code class="language-bash"><span class="token function">cp</span> .env .env.production
<span class="token comment"># Edit .env.production with production values</span>
</code></pre>
<h3>6.2 Deploy</h3>
<p>The codebase comes with a script that provisions a Postgres database and deploys the app on the same private network. Run it:</p>
<pre class="language-bash"><code class="language-bash">./scripts/railway/up.sh
</code></pre>
<h3>6.3 Your first deploy will fail. That's expected.</h3>
<p>Token-Based Authorization is ON by default.</p>
<p>Without a <code>JWT_VERIFICATION_KEY</code>, the app refuses to serve traffic. Your platform's job is to keep your data off the public web. The fix is to generate a key and put it in your production env.</p>
<blockquote>
<p>The alternative was to ship with auth off and expect people to add it on later, which we all know isn't going to happen. People will leave their servers open to the public internet, get hacked, then blame me.</p>
</blockquote>
<p>Token-Based Auth gives you three things:</p>
<ul>
<li><strong>No public access.</strong> The server rejects requests without a valid token.</li>
<li><strong>Per-request identity.</strong> The middleware parses the token and injects <code>user_id</code>, <code>session_id</code>, and custom claims into your endpoints. Each request is tied to a user and session, so data leakage is prevented.</li>
<li><strong>Granular permissions.</strong> A user-level token can run an agent and view its own sessions. An admin token can read everyone's sessions and test any agent. You don't need to know or care about RBAC right now, but you have the foundation in place for when you start thinking about security.</li>
</ul>
<h3>6.4 Get your verification key</h3>
<p>The default path is to let <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a> generate the keypair for you:</p>
<ol>
<li>Open <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a>, click <strong>Add OS</strong> → <strong>Live</strong>, enter your Railway domain, and connect.</li>
<li>Enable <strong>Token Based Authorization</strong>.</li>
</ol>
<img alt="Connect dialog for a live AgentOS with Token Based Authorization enabled" loading="lazy" width="700" height="700" decoding="async" data-nimg="1" class="rounded-2xl" style="color:transparent" srcset="/_next/image?url=%2Fimages%2Fconnect-live.png&amp;w=750&amp;q=75 1x, /_next/image?url=%2Fimages%2Fconnect-live.png&amp;w=1920&amp;q=75 2x" src="/_next/image?url=%2Fimages%2Fconnect-live.png&amp;w=1920&amp;q=75">
<ol start="3">
<li>Paste the public key into <code>.env.production</code> (full PEM block, no surrounding quotes):</li>
</ol>
<img alt="AgentOS welcome dialog with the JWT public key to copy" loading="lazy" width="700" height="700" decoding="async" data-nimg="1" class="rounded-2xl" style="color:transparent" srcset="/_next/image?url=%2Fimages%2Flive-jwt-key.png&amp;w=750&amp;q=75 1x, /_next/image?url=%2Fimages%2Flive-jwt-key.png&amp;w=1920&amp;q=75 2x" src="/_next/image?url=%2Fimages%2Flive-jwt-key.png&amp;w=1920&amp;q=75">
<pre class="language-bash"><code class="language-bash"><span class="token assign-left variable">JWT_VERIFICATION_KEY</span><span class="token operator">=</span>-----BEGIN PUBLIC KEY-----
MIIBIjANBgkq<span class="token punctuation">..</span>.
-----END PUBLIC KEY-----
</code></pre>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-sky-500 dark:bg-sky-400"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>Live connections to AgentOS require a pro subscription. Use the <code>PLATFORM30</code> coupon code for a 1 month free trial. Remember to cancel before the trial ends if you don't want to be charged.</p></div></div></div></blockquote>
<p>You don't have to use os.agno.com for this. You can generate your own RSA or EC keypair, sign tokens with the private key in your own service, and put the matching public key in <code>JWT_VERIFICATION_KEY</code>. The platform doesn't care where the key came from, as long as incoming tokens verify.</p>
<h3>6.5 Sync env and verify</h3>
<p>While <code>.env.production</code> is open, point the in-cluster scheduler at your public Railway domain so cron triggers can reach AgentOS:</p>
<pre class="language-bash"><code class="language-bash"><span class="token comment"># .env.production</span>
<span class="token assign-left variable">AGENTOS_URL</span><span class="token operator">=</span>https://<span class="token operator">&lt;</span>your-app<span class="token operator">&gt;</span>.up.railway.app
</code></pre>
<p>Then push variables to Railway:</p>
<pre class="language-bash"><code class="language-bash">./scripts/railway/env-sync.sh
</code></pre>
<p>Railway auto-deploys when env values change. Watch the logs and confirm the platform is serving:</p>
<pre class="language-bash"><code class="language-bash">railway logs --service agent-os
</code></pre>
<p>Once you see successful requests, AgentOS will connect through your Railway domain and you're live.</p>
<h3>6.6 Auto-deploys from GitHub</h3>
<p>So far every code update needs <code>./scripts/railway/redeploy.sh</code>. To auto-deploy on every push to <code>main</code>:</p>
<ol>
<li>Open the Railway dashboard → your project → the agent-os service → <strong>Settings</strong>.</li>
<li>Under <strong>Source</strong>, click <strong>Connect Repo</strong> and pick your repo.</li>
<li>Set the deploy branch to <code>main</code>.</li>
</ol>
<p>Every push to <code>main</code> now triggers a build and rolling deploy. <code>./scripts/railway/env-sync.sh</code> is still how you sync env changes.</p>
<h3>Opting out of JWT (not recommended)</h3>
<p>If you must run production without auth (e.g., inside a private VPC behind another auth layer), set <code>authorization=False</code> in <code>app/main.py</code> and redeploy. Keep authorization on for any deploy holding real data. Without it, anyone who guesses your Railway domain can read your sessions and your agents.</p>
<h3>Scaling</h3>
<p>The default deploy is two replicas at 4Gi memory and 2 vCPU each. Gives you zero-downtime rolling deploys and basic fault tolerance. Bump <code>numReplicas</code> and <code>limits</code> up or down in <code>railway.json</code> as your usage grows.</p>
<h2>Going beyond agents</h2>
<p>Rule of thumb: <strong>agents for open questions, teams for routing, workflows for processes.</strong> Most of your platform will be agents. A few will be teams or workflows. You'll know when you need each.</p>
<p><strong>Multi-agent teams.</strong> When one agent isn't enough, route work across a team of specialists. Agno teams come in three modes:</p>
<ul>
<li><em>Coordinate.</em> A leader plans the work, calls the right specialists, and synthesizes.</li>
<li><em>Route.</em> A router picks one specialist to handle the request.</li>
<li><em>Broadcast.</em> Every specialist runs in parallel; you aggregate.</li>
</ul>
<p>Use teams when the right specialist isn't known up front. The <a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/teams/overview">teams overview</a> walks through each mode.</p>
<p><strong>Agentic workflows.</strong> When a process needs to run the same way every time, write a workflow. Workflows give you determinism. Use them for the few high-leverage flows in your platform that need to be repeatable. The <a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/workflows/overview">workflows overview</a> covers the patterns.</p>
<p>For more on agents themselves (instructions, tools, memory, model configuration), the <a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/agents/overview">agents overview</a> is the reference.</p>
<h2>Scheduled tasks</h2>
<p>The platform ships with a lightweight scheduler enabled by default in <code>app/main.py</code>:</p>
<pre class="language-python"><code class="language-python">agent_os <span class="token operator">=</span> AgentOS<span class="token punctuation">(</span>
    name<span class="token operator">=</span><span class="token string">"AgentOS"</span><span class="token punctuation">,</span>
    scheduler<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
    <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>
<span class="token punctuation">)</span>
</code></pre>
<p>Schedule any agent or workflow on a cron. Common patterns:</p>
<ul>
<li><strong>Maintenance.</strong> Purge sessions older than 90 days. Vacuum your Postgres tables. Rotate trace data into cold storage.</li>
<li><strong>Proactive runs.</strong> Every weekday morning, run an agent that summarizes overnight news for your portfolio companies. Post to Slack.</li>
<li><strong>Catch regressions.</strong> Schedule <code>python -m evals</code> weekly against your production agents. Drift shows up in eval history before users feel it.</li>
</ul>
<p>See the <a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/agent-os/scheduler">agno scheduler docs</a> for the cron API.</p>
<h2>Connect to interfaces</h2>
<p>Your agents should be available where your users are. Slack threads. Discord channels. Telegram for the field team.</p>
<p>Or most importantly: a custom UI inside your product.</p>
<p>For Slack, Discord, Telegram: the pattern is similar. Expose the agents via an interface. See <code>app/main.py</code> for a reference:</p>
<pre class="language-python"><code class="language-python">interfaces<span class="token punctuation">:</span> <span class="token builtin">list</span> <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token punctuation">]</span>
<span class="token keyword">if</span> SLACK_BOT_TOKEN <span class="token keyword">and</span> SLACK_SIGNING_SECRET<span class="token punctuation">:</span>
    <span class="token keyword">from</span> agno<span class="token punctuation">.</span>os<span class="token punctuation">.</span>interfaces<span class="token punctuation">.</span>slack <span class="token keyword">import</span> Slack

    interfaces<span class="token punctuation">.</span>append<span class="token punctuation">(</span>
        Slack<span class="token punctuation">(</span>
            agent<span class="token operator">=</span>code_search<span class="token punctuation">,</span>
            streaming<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
            token<span class="token operator">=</span>SLACK_BOT_TOKEN<span class="token punctuation">,</span>
            signing_secret<span class="token operator">=</span>SLACK_SIGNING_SECRET<span class="token punctuation">,</span>
            resolve_user_identity<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
        <span class="token punctuation">)</span>
    <span class="token punctuation">)</span>

<span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>

agent_os <span class="token operator">=</span> AgentOS<span class="token punctuation">(</span>
    <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>
    interfaces<span class="token operator">=</span>interfaces<span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<p>Read the <a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/agent-os/interfaces/overview">interfaces guide</a> for more information.</p>
<h2>Wrapping up</h2>
<p>Congratulations. If you made it this far, you have a unified agent platform running securely in your cloud. Technical users can create and deploy agents using Claude Code. Non-technical users can use the no-code UI.</p>
<p>Sessions, traces, and knowledge live in your database. Your infrastructure is gated behind JWT-based RBAC and API keys are managed in one place.</p>
<h2>Why it's important to control your data</h2>
<p>Before we close, a note on data sovereignty.</p>
<p>Every interaction with your platform produces data. Sessions, memory, and traces all flow into your Postgres database. Two reasons why this matters:</p>
<ol>
<li>
<p><strong>Compliance.</strong> Keeping the data in your own database reduces the risk of a breach. The moment customer data, proprietary code, or internal documents touch a third-party trace tool or memory service, your level of security is whatever their level of security is.</p>
</li>
<li>
<p><strong>Auto Improvement.</strong> Your traces are how Claude Code (or you) close the loop on agent quality. Coding agents are going to be the main way to build and improve agents. They can only work because the trace data lives where the iteration tool can read it. Vendor-stitched stacks split this surface across three SaaS products and the loop never closes.</p>
</li>
</ol>
<p>The agents you ship today are the smallest part of what you've built. The platform underneath them, and the iteration loop it enables, is the thing that matters.</p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Dynamic Software]]></title>
            <link>https://ashpreetbedi.com/dynamic-software</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/dynamic-software</guid>
            <pubDate>Thu, 30 Apr 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>For fifty years, software has been static.</p>
<p>Every program you've ever used is a collection of functions run through a hard-coded control flow: If, else, while, for. The functions do the work. Reading from databases. Calling APIs. Transforming data.</p>
<p>Same Input = Same Output. This was the contract for fifty years.</p>
<p>Then 2024 happened. The control flow came alive and created a new category of software. Software that is alive, dynamic, on-demand.</p>
<h2>Software is dead, long live Software</h2>
<p>Static software is a recording. You press play and you get back exactly what was captured. Same notes, same order, every time. The performance happened once, in a devbox, and now it plays the same tune every time.</p>
<p>Dynamic Software is a live orchestra.</p>
<p>The score exists. The instruments exist. The musicians exist. But what happens in the room tonight depends on the maestro, the players, the moment. The model is the maestro. The tools are the instruments. The control flow is the performance, not the recording.</p>
<p>This is what people feel when they use a great agent and can't quite explain why it feels different. They've spent their whole lives interacting with buttons. Now they're in a room with a live performance for the first time. The software is responding to them, here, now, with judgment and presence. It's listening. It's adjusting. It's alive in a way software has never been alive before.</p>
<p>Recordings are perfect. Live performances aren't. A live orchestra makes choices. Sometimes it stumbles. Sometimes it surprises you. The reason we still pay to hear live music is that something different happens in the room.</p>
<p>The performance is the point.</p>
<p>Dynamic Software is alive. It's not deterministic. It's not perfect. And once you've felt the difference, recordings feel like what they always were. Frozen.</p>
<p>We're not building better recordings. We're building the first generation of software that performs.</p>
<h2>Assumptions Dynamic Software breaks</h2>
<p>When software comes alive, every assumption built on static software breaks.</p>
<p><strong>Determinism breaks.</strong> Same input no longer means same output. The model considers context, memory, learnings. The software does something different on Tuesday afternoon than it did on Monday morning. While this can be (somewhat) controlled in text, we should note that the visual era is next. Charts, dashboards, entire screens generated on-demand. Instead of forcing determinism on non-deterministic software, give in, enjoy the ride.</p>
<p><strong>State and time work differently.</strong> Static programs don't need to remember much. The control flow is the same every time, so state lives in a database and is CRUD only. In Dynamic Software, state is context. Memory of past sessions. History of what worked. Knowledge of the domain. The database stops being storage and becomes the context the software runs on.</p>
<p>Sessions follow from this. A static API endpoint is stateless by design. Each request is independent. Dynamic Software is the opposite. A session is a continuous context that spans minutes, days, sometimes weeks. The user comes back, the agent picks up where it left off. Sessions become first-class.</p>
<p>Time changes too. Static software returns in milliseconds, seconds if you don't believe in data co-location. Dynamic Software reasons. It calls tools. It waits for tools to return. It reasons again. A single request takes minutes sometimes. Streaming is the default. Background execution is a core primitive. The HTTP request/response model strains and breaks and so does the default 29s loadbalancer timeout.</p>
<p><strong>The software needs to watch itself.</strong> With static software, you can read the code and know what it does. With Dynamic Software, you can't. The control flow is a model and the model is opaque. The only way to know what your software did is to record everything it did. Every reasoning step. Every tool call. Every retrieval. Tracing goes from a debugging tool to the only way to understand your software.</p>
<p>Watching isn't enough. Static programs don't make decisions, so there's nothing to approve. Dynamic Software makes decisions, and decisions have consequences. Some can be made freely. Some need the user. Some need an admin. Your software has to express which is which, and your runtime has to enforce it.</p>
<p>Every one of these is a real engineering problem. Every team building Dynamic Software hits them all. Most spend months solving these from scratch.</p>
<h2>A new category needs a new runtime</h2>
<p>Static software has a mature runtime. You write Django or Express, deploy to a managed platform, and don't think about HTTP, sessions, scaling, or recovery. The infrastructure is solved. The platform handles it.</p>
<p>Dynamic Software has no equivalent. You write an agent. Then you build six months of infrastructure around it, fixing every edge case manually. Edge cases you only learn after running agents at scale. SSE + websockets. Streaming + background execution. Sessions that survive restarts. Storage you can actually query, not five vendors stitched together. Approval gates that wait for admin sign-off, not just user confirmation. Per-resource, per-tool RBAC. Agents available on Slack, Telegram, WhatsApp, because no one wants to use a custom UI.</p>
<p>This is why 80% of agents don't work, there's a painful amount of grind in the last mile.</p>
<p>The last shift this big was going from desktop apps to web apps. Web software needed its own runtime, its own protocols, its own infrastructure, its own developer tools. We spent two decades building all of it.</p>
<p>Dynamic Software is here. Starting from scratch. Its own runtime. Its own protocols. Its own infrastructure. Its own developer tools.</p>
<h2>The next decade</h2>
<p>Static software took fifty years to mature. Operating systems, databases, web servers, deploy platforms, observability stacks, identity providers. We forget how recent most of it is. Heroku was 2007. Kubernetes was 2014. Vercel was 2015. The infrastructure we now take for granted is younger than most of the people building on it.</p>
<p>Dynamic Software is at year one.</p>
<p>Whoever builds the runtime, the protocols, the developer tools, the platforms, defines the next era of software. The work ahead is enormous. It is also the most interesting work I've done in the past fifteen years.</p>
<p>Come build with us at <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/gh">Agno</a>.</p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Systems Engineering]]></title>
            <link>https://ashpreetbedi.com/systems-engineering</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/systems-engineering</guid>
            <pubDate>Sat, 25 Apr 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<span class="text-xl font-semibold"><p><strong>The Key To Building Agentic Software That Works</strong></p></span>
<p>In the early 1940s, Bell Labs was building the national telephone network, the most complex technical system in the world at the time. Millions of switches, cables, relays, and operators had to work together. The engineers discovered something that would become an 80-year-old lesson: you can't optimize a system by optimizing individual components. The behavior of the whole (call routing, reliability, capacity, cost) emerged from how the parts interacted. They needed a discipline focused on the interactions between components.</p>
<p><strong>They called it systems engineering.</strong></p>
<h2>Agentic Software Is a Systems Engineering Problem</h2>
<p>Coding agents have lowered the barrier to writing code, <strong>but they haven't lowered the requirements of production software</strong>.</p>
<p>Software engineering is, and has always been, systems engineering and agentic software is no different. If you're building agentic software, your system needs to bridge five layers:</p>
<p><strong>1. Agent Engineering.</strong> Your agent or multi-agent logic and execution flow. Model, system instructions, tool configurations, handoffs, context management, observability. This is where you define what your agent does, how it runs, and how it responds. Your agent's behavior should be deterministic where possible and observable where it isn't.</p>
<p><strong>2. Data Engineering.</strong> Your agent is only as good as the context it has access to, and context is just data under the hood.</p>
<p>Call it memory, storage, knowledge. Your Agent's data should be managed with data engineering principles. Well designed schemas, structured querying, databases for fast read/writes, object storage for long-term storage, and workflows that keep your knowledge and memory up to date. The patterns are decades old. Use them.</p>
<p><strong>3. Security Engineering.</strong> Auth, RBAC, governance, data isolation, audit trails. Your agent's capabilities are defined by its tools, and those tools should be scoped with JWT-backed permissions. Read-only access IS NOT a prompt instruction, it's a tool configuration.</p>
<p>Actions should have approval tiers: reads run freely, writes need user approval, sensitive operations need admin sign-off. Most actions should be logged and queryable for the life of the product.</p>
<p>And please, isolate requests. One user's context bleeding into another's is a data breach, not a "bug". It has serious consequences and there are laws protecting user data.</p>
<p><strong>4. Interface Engineering.</strong> How users and other agents reach your agent.</p>
<p>REST API, Slack, MCP server, terminal, Chat UI. In the old world, you had one API and one client. Now you have multiple surfaces, each with its own identity system. A Slack user ID is not your product's user ID. An MCP client authenticating as another agent is not a human user. Interface engineering is about making sure your auth, policies, and access controls hold consistently across every surface your agent is reachable from.</p>
<p><strong>5. Infrastructure Engineering.</strong> How you run and scale your software. Containers, cloud deployment, horizontal scaling. Generally called DevOps.</p>
<p>The good news: 95% of this is identical to running any other service. Re-use existing patterns, they'll serve you well. The 5% that's different: agent requests take longer (increase your load balancer timeouts), responses stream (plan for SSE or WebSockets), and the best agents are proactive (scheduled tasks, background execution). None of this is new.</p>
<hr>
<p>The key unlock for AI engineers is realizing that agentic software is just regular software, with the business logic replaced by agents, and interfaces going from request/response to streaming across multiple surfaces.</p>
<p><strong>Systems engineering is the discipline of making these parts work together, and is the key to building agentic software that works.</strong></p>
<p>When you look at your software from a systems perspective, the right decisions become obvious. You give your agent well-scoped tools, not unfettered bash access. You store sessions, memory, and knowledge in a database, not in files, so you can utilize decades of multi-tenant patterns.</p>
<p>When you design one layer in isolation, you inherit constraints that cascade through the rest of the system. When you design from the system's perspective, each layer reinforces the others.</p>
<h2>Systems Engineering in Practice</h2>
<p>I can't make a claim like this and not give you working code.</p>
<p><a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/dash">Dash</a> is an open-source, self-learning data agent. You ask it questions in plain English, it writes SQL, runs it, and tells you what the numbers mean. Simple enough to clone and adapt. Real enough to demonstrate all five layers. Here's how it looks (2x speedup)</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/dash-agentos-ui.mp4">Your browser does not support the video tag.</video>
<p>Dash is live in many companies and works incredibly well. The difference is the system behind it. Here's how each layer works.</p>
<h3>Agent Engineering</h3>
<p>Dash is a team of three agents. A Leader routes requests to two specialists: an Analyst that queries data (read-only) and an Engineer that builds computed assets like views and summary tables.</p>
<p>Each specialist gets similar tools, but wired up for different purposes. The Analyst's SQL tools connect to a read-only database engine. The Engineer's SQL tools connect to a writable engine scoped to a single schema. Same interface, different permissions, determined by configuration, not prompts.</p>
<p>Instructions are assembled at runtime from table metadata and business rules stored as structured files.</p>
<h3>Interface Engineering</h3>
<p>One system, multiple surfaces.</p>
<p>Dash serves a REST API, a Slack bot, a web UI, and a CLI. Each surface handles identity differently: Slack maps thread timestamps to sessions, the API uses JWT tokens in production. But all four hit the same agents, same tools, same knowledge. Adding a new interface does not require rebuilding the agent logic.</p>
<p>Your auth and access controls need to hold across every surface, because the agent doesn't know which one it's being called from. Here's dash being used in slack.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/dash-in-slack.mp4">Your browser does not support the video tag.</video>
<h3>Data Engineering</h3>
<p>Six layers of context, and tools for learning.</p>
<p>Raw LLMs writing SQL hit a wall fast: schemas lack meaning, types are misleading, tribal knowledge is missing, there's no way to learn from mistakes. Dash solves this with six layers of grounded context:</p>
<ol>
<li>Table metadata (schema, columns, relationships)</li>
<li>Human annotations (metrics, definitions, business rules)</li>
<li>Query patterns (SQL that is known to work)</li>
<li>Institutional knowledge (docs, wikis)</li>
<li>Learnings (error patterns and discovered fixes)</li>
<li>Runtime context (live schema inspection)</li>
</ol>
<p>These layers feed two systems.</p>
<ul>
<li>The first is curated knowledge: table schemas, validated queries, and business rules loaded into PostgreSQL.</li>
<li>The second is discovered learnings: error patterns and fixes that the agent saves automatically when it hits problems and recalls on future queries.</li>
</ul>
<p>The learning loop is simple: the agent runs a query, gets a type error, diagnoses the fix, saves it. Next time it sees a similar column, it gets it right the first time. And when the Engineer creates a new view, it records the schema and example queries into the knowledge base. The Analyst discovers it on the next search and starts using it.</p>
<p>Query 100 is better than query 1, not because the model improved, but because the data layer got better.</p>
<h3>Security Engineering</h3>
<p>Enforced by the system, not the prompt.</p>
<p>Production auth uses RBAC with JWT verification. Every query is scoped to <code>user_id</code>. An eval suite tests these boundaries directly: it prompts the agents to leak credentials, execute destructive SQL, and cross schema boundaries, then verifies they can't.</p>
<p>Security is a system property tested across layers.</p>
<p>The Analyst's read-only access is a PostgreSQL connection parameter. The database itself rejects writes regardless of what the model generates. The Engineer can write, but only to a single schema: a query-level guard blocks any operation targeting the source data.</p>
<h3>Infrastructure Engineering</h3>
<p>Boring on purpose.</p>
<p>Standard Python container. Docker Compose for local development. One-command cloud deployment. Streaming via SSE through a standard ASGI server. The 95% that's identical to any other service is identical. The 5% that's different (longer timeouts, streaming, scheduled tasks) is handled with standard tools.</p>
<p>You can clone it, run <code>docker compose up</code>, and have the entire system running in minutes. One command, five layers, a working product.</p>
<pre class="language-bash"><code class="language-bash"><span class="token comment"># Clone the repo</span>
<span class="token function">git</span> clone https://github.com/agno-agi/dash.git

<span class="token builtin class-name">cd</span> dash

<span class="token comment"># Set your keys</span>
<span class="token function">cp</span> example.env .env
<span class="token comment"># Edit .env and add your model provider key</span>

<span class="token comment"># Start the system</span>
<span class="token function">docker</span> compose up -d --build
</code></pre>
<h2>TLDR</h2>
<p>Agentic software is just software. The agent replaces business logic. Everything else is systems engineering. Five layers: agent, data, security, interface, infrastructure. Each layer affects the others. Design them together and the system compounds. Design them in isolation and you spend your time patching around constraints that shouldn't exist. We walk through all five with <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/dash">Dash</a>, a real open-source data agent you can run yourself.</p>
<p>Links:</p>
<ul>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/dash">Dash on Github</a>
</li>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/">Agno Docs</a>
</li>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno">Agno Github</a>
</li>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/deploy/introduction">AgentOS Templates</a>
</li>
</ul>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Scaling Agentic Software]]></title>
            <link>https://ashpreetbedi.com/scaling-agentic-software-part-1</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/scaling-agentic-software-part-1</guid>
            <pubDate>Thu, 16 Apr 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<span class="text-xl font-semibold"><p><strong>What is the simplest architecture for running a multi-agent system at scale?</strong></p></span>
<p>I want to deploy agents as a real service. Multi-user, RBAC, JWT-based auth. Sessions, memory, and knowledge backed by a database. Horizontally Scalable. Able to serve thousands of concurrent requests. The kind of product you'd actually ship to users.</p>
<p>Could the answer be: <strong>a FastAPI app and a Postgres database?</strong></p>
<p>So I spent some time building one to find out. 14 agents, 11 multi-agent teams, 5 workflows. Hundreds of tools, approvals, evals, schedules. All running in a single FastAPI process against a single PostgreSQL database. It's open source: <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/demo-os">Demo AgentOS</a>.</p>
<p>I'll walk through the architecture in this post. In the next one we'll dive into what breaks when you push it.</p>
<h2>The Bar</h2>
<p>"Scale" gets thrown around quite a bit. In this case, scale means breadth. The surface area of a real product. Every concern a CTO would actually need to address before shipping a product to users:</p>
<p><strong>Multi-user and multi-tenancy.</strong> Every user gets their own sessions, memory, and context. The system isolates every resource an agent touches, across every user, on every request.</p>
<p>Note: Context bleeding is a data breach, not a bug.</p>
<p><strong>Auth and RBAC.</strong> JWT verification, role-based access control, scoped permissions. This applies to the API layer, the agents, the tools they call, and the data they can access. Dev and production should have different security postures.</p>
<p><strong>Real persistence.</strong> Sessions, memory, and knowledge stored in a database, with regular backups and data access policies. Everything needs to comply with user-data protection laws like GDPR and CCPA.</p>
<p><strong>Serving requests at scale.</strong> The system should be able to handle thousands of concurrent requests. Streaming responses should be held open. Background work (memory extraction, summarization, learning) running alongside the primary model call. All of it competing for the same HTTP transports, connection pools, and database connections. The hard part is not serving one request. It is serving the thousandth one without stalling the ninth one.</p>
<p><strong>Observability.</strong> Tracing every agent run, every tool call, every delegation in a multi-agent team. When something goes wrong at step 7 of a 12-step workflow, you need to see exactly what happened and why.</p>
<p><strong>Governance.</strong> Layered authority over what agents can do. Some tools run freely. Some need user approval. Some need admin sign-off. Approval flows, audit trails, and the ability to pause execution mid-run.</p>
<p><strong>Reliability and evals.</strong> Agents are testable software. You need smoke tests, tool call validation, LLM-judged accuracy, performance baselines. Without evals, every change is a guess.</p>
<p>If this is the bar, the question is: what's the simplest architecture that clears it?</p>
<h2>The Architecture</h2>
<p>One FastAPI process. One Postgres database. That's it.</p>
<p>The FastAPI app serves 14 agents, 11 multi-agent teams, 5 workflows using REST endpoints. Every request is a POST, every response is a server-sent event stream.</p>
<p>The database does more than you'd think. The Postgres database stores agent sessions, user memory, knowledge contents, learnings, schedules, and eval results. Pgvector handles embeddings for knowledge bases.</p>
<h2>The Components</h2>
<p>The 30+ components in the AgentOS showcase different agentic patterns.</p>
<img alt="Demo AgentOS" loading="lazy" width="700" height="700" decoding="async" data-nimg="1" class="rounded-2xl" style="color:transparent" srcset="/_next/image?url=%2Fimages%2Fdemo-agentos-ui.png&amp;w=750&amp;q=75 1x, /_next/image?url=%2Fimages%2Fdemo-agentos-ui.png&amp;w=1920&amp;q=75 2x" src="/_next/image?url=%2Fimages%2Fdemo-agentos-ui.png&amp;w=1920&amp;q=75">
<p>Some showcase <strong>HITL patterns</strong>. The Helpdesk agent wraps three tools: one that requires operator confirmation before restarting a service, one that pauses for user input on ticket priority, one that executes outside the agent runtime. The Approvals agent uses Agno's <code>@approval</code> decorator for blocking approval gates and audit-trailed operations. Both agents pause execution mid-run and resume on approval.</p>
<p>Some showcase <strong>guardrails</strong>. The Helpdesk agent has three pre-hooks: OpenAI moderation, PII detection, prompt injection detection. It also has a post-hook that scans responses for secret patterns (API keys, connection strings, SSNs) and rewrites them before they leave the process. An audit log hook records every run for compliance.</p>
<p>Some showcase <strong>multi-agent teams</strong>. Pal is a personal knowledge agent with five specialists. Dash is a data analyst with an Analyst/Engineer split. Coda is a coding agent with five specialists including a Planner and a Triager. The Research and Investment teams each ship in four modes (coordinate, route, broadcast, tasks) so you can see how the same set of members produces different behavior under different coordination patterns.</p>
<p>Some showcase <strong>step-based workflows</strong>. Morning Brief gathers calendar, email, and news in parallel and synthesizes a briefing. AI Research runs four parallel researchers and synthesizes their findings. Content Pipeline does parallel research plus a loop that iterates until an editor approves. Support Triage classifies a ticket, routes it to a specialist, and escalates if severity is high.</p>
<p>Some showcase <strong>state management</strong>. Taskboard demonstrates session state with agentic state updates. Injector demonstrates dependency injection through <code>RunContext</code>. Compressor demonstrates tool result compression with a cheaper model.</p>
<p>Some showcase <strong>scheduling</strong>. Morning Brief runs every weekday at 8am ET. AI Research runs every day at 7am UTC. The Scheduler agent lets users create, list, disable, and delete schedules at runtime through natural language.</p>
<p>The point is not that you need all of these. The point is that a single FastAPI process can host them without the architecture getting complicated.</p>
<h2>Governance as First-Class Infrastructure</h2>
<p>Three layers of governance sit on top of every agent.</p>
<p><strong>Pre-hooks</strong> run before the model sees the input. Moderation, PII detection, injection detection. If any hook raises, the request is rejected before a single token is generated.</p>
<p><strong>Approval gates</strong> pause the run mid-execution. A tool decorated with <code>requires_confirmation=True</code> or <code>@approval</code> streams a <code>RunPaused</code> event to the client with the tool name and arguments. The client shows the user an approve/reject UI. On approval, the run resumes from where it paused. This works because the session state is durable (stored in db).</p>
<p><strong>Post-hooks</strong> run on the output. The Helpdesk agent has an output guardrail that scans responses for secret patterns and rewrites them before they leave. Every run is audit-logged through a separate hook.</p>
<h2>What's Not Here</h2>
<p>No message queue. No worker pool. No separate vector database. No Redis. No microservices. No orchestrator service standing in front of the agents. No separate auth service.</p>
<p>Could you add them? Sure. Are they necessary to clear the bar I defined? Not yet. The point of this exercise is to find out where the simple architecture breaks, so the next decision (what to add) is grounded in actual load, not in speculation.</p>
<h2>What's Next</h2>
<p>Part 2 is what breaks when you scale this.</p>
<p>I'm going to load test it. Thousands of concurrent requests. Streaming responses held open. Background memory extraction competing with primary runs. Connection pools under pressure. I expect to find a few obvious bottlenecks and a couple of surprising ones.</p>
<p>Links:</p>
<ul>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/demo-os">Demo AgentOS on GitHub</a>
</li>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno">Agno GitHub</a>
</li>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/">Agno Docs</a>
</li>
</ul>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Dash: The Data Agent Every Company Needs]]></title>
            <link>https://ashpreetbedi.com/dash-v2</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/dash-v2</guid>
            <pubDate>Wed, 15 Apr 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>Every company with 30+ people should have an internal data agent and today I'm making ours open-source: take <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/dash">Dash</a>, run it in your cloud, and give your team access via Slack.</p>
<p>Most AI-forward companies have in-house data agents:</p>
<ul>
<li>OpenAI: <a target="_blank" rel="noopener noreferrer" class="" href="https://openai.com/index/inside-our-in-house-data-agent/">Inside OpenAI's in-house data agent</a></li>
<li>Vercel: <a target="_blank" rel="noopener noreferrer" class="" href="https://vercel.com/blog/we-removed-80-percent-of-our-agents-tools">d0</a>, <a target="_blank" rel="noopener noreferrer" class="" href="https://vercel.com/blog/anyone-can-build-agents-but-it-takes-a-platform-to-run-them">another post</a></li>
<li>Uber: <a target="_blank" rel="noopener noreferrer" class="" href="https://www.uber.com/gb/en/blog/query-gpt/">QueryGPT</a> (creative name)</li>
<li>LinkedIn: <a target="_blank" rel="noopener noreferrer" class="" href="https://www.linkedin.com/blog/engineering/ai/practical-text-to-sql-for-data-analytics">SQLBot</a> (absolutely LinkedIn-coded name for the agent)</li>
<li>Salesforce: <a target="_blank" rel="noopener noreferrer" class="" href="https://www.salesforce.com/blog/text-to-sql-agent/">Horizon Agent</a></li>
<li>DoorDash: <a target="_blank" rel="noopener noreferrer" class="" href="https://careersatdoordash.com/blog/beyond-single-agents-doordash-building-collaborative-ai-ecosystem/">How to use every buzzword in a blog post</a></li>
</ul>
<p>This post will show you how to build a best-in-class data system and make it available to your team over Slack. If you do this well, Dash should handle roughly 80% of routine data questions, send daily reports, and catch metric anomalies before anyone asks.</p>
<h2>What is Dash?</h2>
<p>Dash is a self-learning data system made of 3 agents: Dash (the team leader), a Data Analyst and a Data Engineer.</p>
<img alt="Dash AgentOS" loading="lazy" width="700" height="700" decoding="async" data-nimg="1" class="rounded-2xl" style="color:transparent" srcset="/_next/image?url=%2Fimages%2Fdash-agentos.png&amp;w=750&amp;q=75 1x, /_next/image?url=%2Fimages%2Fdash-agentos.png&amp;w=1920&amp;q=75 2x" src="/_next/image?url=%2Fimages%2Fdash-agentos.png&amp;w=1920&amp;q=75">
<p>It uses a dual-tier knowledge and learning system to deliver an incredible work-with-your-data experience.</p>
<p>You can chat with it via Slack or the AgentOS UI.</p>
<p>It writes SQL, runs it, and tells you what the numbers mean. More importantly, when it makes a mistake or gets corrected, it learns from it. When your team keeps asking the same question, it builds infrastructure so the answer is faster next time.</p>
<p><strong>A self-learning data system, not a data agent.</strong></p>
<p>Dash uses its own PostgreSQL database. You don't point it at your production database. You progressively load the tables you want it to work with, along with the context it needs to be useful. This is the part most people skip. This is the part that makes it special.</p>
<p>Here's how it looks in Slack (8x speedup when waiting):</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/dash-in-slack.mp4">Your browser does not support the video tag.</video>
<p>And on the AgentOS UI:</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/dash-agentos-ui.mp4">Your browser does not support the video tag.</video>
<p>Using the AgentOS UI, you can chat with your agents, view sessions, traces, metrics, and schedules.</p>
<p>AgentOS is the agent platform you didn't know you needed.</p>
<h2>How It Works</h2>
<h3>1. Context is everything</h3>
<p>Most data agents get a schema dump and the impossible task of writing SQL from business logic that only lives in the data engineer's head. That's why they're bad. Column names and types tell you nothing about the data. They don't tell you that <code>ended_at IS NULL</code> means a subscription is active. That annual billing gets a 10% discount. That usage metrics are sampled 3-5 days per month, so summing them gives you garbage.</p>
<p>I wrote about this problem in detail in my <a target="_blank" rel="noopener noreferrer" class="" href="https://www.ashpreetbedi.com/sql-agent">Self-Improving Text-to-SQL Agent</a> post. The core insight holds: <strong>the biggest improvement you can make to your data agent is giving it the same tribal knowledge that human engineers have.</strong></p>
<p>Dash uses a carefully curated knowledge system backed by PgVector. It contains:</p>
<p><strong>Table metadata.</strong> Table schema, column types, what they mean, what to use each table for, the gotchas. Every table ships with use cases and data quality notes. Example: status is 'active', 'churned', or 'trial'; always check against subscriptions for ground truth.</p>
<p><strong>Validated queries (must have).</strong> Battle-tested SQL with the right JOINs, the right NULL handling, the right edge cases. When the Analyst gets your question, it searches knowledge first. Before it writes a line of SQL, it already knows the shape of the data and which traps to avoid.</p>
<p><strong>Business rules.</strong> How MRR is calculated, what NRR means, that a customer can have multiple subscription records because upgrades close the old row and open a new one. This is the context that separates a correct answer from a plausible-looking wrong one.</p>
<blockquote>
<p>This knowledge is curated by the user. What makes Dash special is its ability to learn on its own.</p>
</blockquote>
<h3>2. Self-learning loop</h3>
<p>Separate from knowledge, Dash captures what it learns automatically (via tool calls). When the Analyst hits a type error and fixes it, the fix gets saved. When a user corrects a result, that correction is recorded. When the system discovers a data quirk, it notes it.</p>
<p>Next time anyone asks a similar question, the Analyst checks learnings before writing SQL. Dash gets better the more it's used.</p>
<p>I've been developing this pattern since December 2025, first as <a target="_blank" rel="noopener noreferrer" class="" href="https://www.ashpreetbedi.com/gpu-poor-continuous-learning">GPU Poor Continuous Learning</a> and then refined through <a target="_blank" rel="noopener noreferrer" class="" href="https://www.ashpreetbedi.com/dash">Dash v1</a>. The approach is simple: the model stays frozen. The system gets smarter. Learning happens in retrieval, not in weights. It's auditable, reversible, and requires zero training compute.</p>
<h3>3. Three agents, two schemas</h3>
<p>Dash is three agents. <strong>Leader</strong> routes requests and synthesizes answers. <strong>Analyst</strong> writes and runs SQL. <strong>Engineer</strong> builds views, summary tables, and computed data. They work together, sharing knowledge and learnings.</p>
<p><strong>The Leader</strong> has no SQL tools. It cannot touch the database.</p>
<p><strong>The Analyst</strong> is read-only. Not "read-only because the prompt says so." Read-only because the PostgreSQL connection is configured with <code>default_transaction_read_only=on</code>. The database itself rejects writes. No prompt injection or clever jailbreak changes this. The database says no.</p>
<p><strong>The Engineer</strong> can write, but only to the <code>dash</code> schema. A SQLAlchemy event listener intercepts every SQL statement before execution and blocks anything targeting the <code>public</code> schema. Your company data is untouchable.</p>
<p>This gives you two schemas with a hard boundary:</p>
<ul>
<li><strong>public schema:</strong> your company data. You load it. Agents read it.</li>
<li><strong>dash schema:</strong> views, summary tables, computed data. The Engineer owns and maintains it.</li>
</ul>
<p>There's also an <code>ai</code> schema where Dash stores its sessions, learnings, knowledge vectors, and other operational data. It powers the AgentOS UI and the self-improvement loop.</p>
<p>I covered the security model in depth in my <a target="_blank" rel="noopener noreferrer" class="" href="https://www.ashpreetbedi.com/systems-engineering">Systems Engineering</a> post. The key principle: security is a system property enforced by configuration, tested across layers.</p>
<h3>The part nobody else has</h3>
<p>When the Leader notices your team keeps asking the same expensive question (MRR by plan, churn by segment, revenue waterfall) it asks the Engineer to build a view.</p>
<p>The Engineer creates <code>dash.monthly_mrr_by_plan</code>. A SQL view joining the right tables, handling all edge cases, producing a clean result. Then it does the critical thing: it calls <code>update_knowledge</code> to record the view in the knowledge base. What it contains, what columns it has, example queries.</p>
<p>Next time someone asks about MRR by plan, the Analyst searches knowledge, finds the view, and queries it directly. No complex join. No risk of getting NULL handling wrong. Faster. Pre-validated. Consistent.</p>
<p>The agents build on each other's work. The Engineer creates infrastructure. The Analyst discovers and uses it. The Leader notices patterns and triggers the cycle. Over time, the <code>dash</code> schema fills with views and summary tables that nobody manually created. An analytics layer the system built for itself, shaped by what your team actually asks about.</p>
<h3>The full loop</h3>
<ol>
<li>You ask a question. Leader delegates.</li>
<li>The Analyst searches knowledge, writes correct SQL, returns an insight.</li>
<li>Good queries get saved to knowledge. Errors become learnings.</li>
<li>Repeated patterns become views. Views get recorded to knowledge.</li>
<li>Next time, the Analyst uses the view. Faster, pre-validated, consistent.</li>
</ol>
<p>Dash accumulates institutional knowledge about your data and compounds with use.</p>
<h2>Build Your Own</h2>
<p>Dash is free and open-source. Check out the <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/dash">GitHub repo</a> and follow the README for in-depth instructions.</p>
<h3>Quick Start</h3>
<pre class="language-bash"><code class="language-bash"><span class="token function">git</span> clone https://github.com/agno-agi/dash <span class="token operator">&amp;&amp;</span> <span class="token builtin class-name">cd</span> dash
<span class="token function">cp</span> example.env .env  <span class="token comment"># Add OPENAI_API_KEY</span>

<span class="token function">docker</span> compose up -d --build

<span class="token function">docker</span> <span class="token builtin class-name">exec</span> -it dash-api python scripts/generate_data.py
<span class="token function">docker</span> <span class="token builtin class-name">exec</span> -it dash-api python scripts/load_knowledge.py
</code></pre>
<p>This starts Dash with a synthetic dataset (~900 customers, 6 tables) and loads the knowledge base (table metadata, validated queries, business rules). You can demo the entire system without connecting any real data.</p>
<h3>Connect to the Web UI</h3>
<ol>
<li>Open <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a></li>
<li>Add OS → Local → <code>http://localhost:8000</code></li>
<li>Connect</li>
</ol>
<h2>Connect to Slack</h2>
<p>Dash lives in Slack. You can DM it or mention it in a channel with @Dash. Each thread maps to one session, so every conversation gets its own context.</p>
<ol>
<li>Run Dash and give it a public URL (use ngrok for local, or your deployed domain).</li>
<li>Follow instructions in <code>docs/SLACK_CONNECT</code> to create and install the Slack app from the manifest.</li>
<li>Set <code>SLACK_TOKEN</code> and <code>SLACK_SIGNING_SECRET</code>, then restart Dash.</li>
</ol>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/dash-in-slack.mp4">Your browser does not support the video tag.</video>
<h2>Adding Your Own Data</h2>
<p>Once you have Dash running, making it your own is straightforward. Replace the sample dataset with your data and give Dash the context it needs.</p>
<h3>1. Load your tables into the <code>public</code> schema</h3>
<p>Use whatever pipeline you already have. <code>pg_dump</code>, a Python script, dbt, Airbyte. Dash reads from <code>public</code> and never writes to it. You can use your existing workflow orchestration tools (Airflow, Dagster), or use Dash's built-in scheduler.</p>
<h3>2. Add table knowledge</h3>
<p>For each table, create a JSON file in <code>knowledge/tables/</code>:</p>
<pre class="language-json"><code class="language-json"><span class="token punctuation">{</span>
  <span class="token property">"table_name"</span><span class="token operator">:</span> <span class="token string">"customers"</span><span class="token punctuation">,</span>
  <span class="token property">"table_description"</span><span class="token operator">:</span> <span class="token string">"B2B SaaS customer accounts with company info and lifecycle status"</span><span class="token punctuation">,</span>
  <span class="token property">"use_cases"</span><span class="token operator">:</span> <span class="token punctuation">[</span><span class="token string">"Churn analysis"</span><span class="token punctuation">,</span> <span class="token string">"Cohort segmentation"</span><span class="token punctuation">,</span> <span class="token string">"Acquisition reporting"</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
  <span class="token property">"data_quality_notes"</span><span class="token operator">:</span> <span class="token punctuation">[</span>
    <span class="token string">"signup_date is DATE (not TIMESTAMP) — no time component"</span><span class="token punctuation">,</span>
    <span class="token string">"status values: active, churned, trial"</span><span class="token punctuation">,</span>
    <span class="token string">"company_size is self-reported"</span>
  <span class="token punctuation">]</span><span class="token punctuation">,</span>
  <span class="token property">"table_columns"</span><span class="token operator">:</span> <span class="token punctuation">[</span>
    <span class="token punctuation">{</span><span class="token property">"name"</span><span class="token operator">:</span> <span class="token string">"id"</span><span class="token punctuation">,</span> <span class="token property">"type"</span><span class="token operator">:</span> <span class="token string">"SERIAL"</span><span class="token punctuation">,</span> <span class="token property">"description"</span><span class="token operator">:</span> <span class="token string">"Primary key"</span><span class="token punctuation">}</span><span class="token punctuation">,</span>
    <span class="token punctuation">{</span><span class="token property">"name"</span><span class="token operator">:</span> <span class="token string">"company_name"</span><span class="token punctuation">,</span> <span class="token property">"type"</span><span class="token operator">:</span> <span class="token string">"TEXT"</span><span class="token punctuation">,</span> <span class="token property">"description"</span><span class="token operator">:</span> <span class="token string">"Company name"</span><span class="token punctuation">}</span><span class="token punctuation">,</span>
    <span class="token punctuation">{</span><span class="token property">"name"</span><span class="token operator">:</span> <span class="token string">"status"</span><span class="token punctuation">,</span> <span class="token property">"type"</span><span class="token operator">:</span> <span class="token string">"TEXT"</span><span class="token punctuation">,</span> <span class="token property">"description"</span><span class="token operator">:</span> <span class="token string">"Current status: active, churned, trial"</span><span class="token punctuation">}</span>
  <span class="token punctuation">]</span>
<span class="token punctuation">}</span>
</code></pre>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>This is the single highest-leverage thing you can do. The better your knowledge, the better Dash performs.</p></div></div></div></blockquote>
<h3>3. Add validated queries</h3>
<p>For your most common questions, write the SQL that gives the correct answer and save it in <code>knowledge/queries/</code>:</p>
<pre class="language-sql"><code class="language-sql"><span class="token comment">-- &lt;query current_mrr&gt;</span>
<span class="token comment">-- &lt;description&gt;Current total MRR from active subscriptions&lt;/description&gt;</span>
<span class="token comment">-- &lt;query&gt;</span>
<span class="token keyword">SELECT</span>
    <span class="token function">SUM</span><span class="token punctuation">(</span>mrr<span class="token punctuation">)</span> <span class="token keyword">AS</span> total_mrr<span class="token punctuation">,</span>
    <span class="token function">COUNT</span><span class="token punctuation">(</span><span class="token operator">*</span><span class="token punctuation">)</span> <span class="token keyword">AS</span> active_subscriptions
<span class="token keyword">FROM</span> subscriptions
<span class="token keyword">WHERE</span> <span class="token keyword">status</span> <span class="token operator">=</span> <span class="token string">'active'</span><span class="token punctuation">;</span>
<span class="token comment">-- &lt;/query&gt;</span>
</code></pre>
<p>This is the easiest way to make sure Dash uses your internal semantics for answering routine questions. Your job is to deliver the best work-with-your-data experience for your team. This makes it possible.</p>
<h3>4. Add business rules</h3>
<p>Document your metrics, definitions, and gotchas in <code>knowledge/business/</code>:</p>
<pre class="language-json"><code class="language-json"><span class="token punctuation">{</span>
  <span class="token property">"metrics"</span><span class="token operator">:</span> <span class="token punctuation">[</span>
    <span class="token punctuation">{</span>
      <span class="token property">"name"</span><span class="token operator">:</span> <span class="token string">"MRR"</span><span class="token punctuation">,</span>
      <span class="token property">"definition"</span><span class="token operator">:</span> <span class="token string">"Sum of active subscriptions excluding trials"</span><span class="token punctuation">,</span>
      <span class="token property">"calculation"</span><span class="token operator">:</span> <span class="token string">"SUM(mrr) FROM subscriptions WHERE status = 'active'"</span>
    <span class="token punctuation">}</span>
  <span class="token punctuation">]</span><span class="token punctuation">,</span>
  <span class="token property">"common_gotchas"</span><span class="token operator">:</span> <span class="token punctuation">[</span>
    <span class="token punctuation">{</span>
      <span class="token property">"issue"</span><span class="token operator">:</span> <span class="token string">"Active subscription detection"</span><span class="token punctuation">,</span>
      <span class="token property">"solution"</span><span class="token operator">:</span> <span class="token string">"Filter on ended_at IS NULL, not status column"</span>
    <span class="token punctuation">}</span>
  <span class="token punctuation">]</span>
<span class="token punctuation">}</span>
</code></pre>
<p>Helpful context for Dash. You can skip if it's too much work up front.</p>
<h3>5. Load knowledge</h3>
<pre class="language-bash"><code class="language-bash">python scripts/load_knowledge.py             <span class="token comment"># Upsert changes</span>
python scripts/load_knowledge.py --recreate  <span class="token comment"># Fresh start</span>
</code></pre>
<h2>Scheduled Tasks</h2>
<p>Dash ships with a built-in scheduler. You can schedule any type of task that your container can handle.</p>
<p>Out of the box, Dash comes with a pre-built schedule that re-indexes your knowledge base every night at 4am UTC:</p>
<pre class="language-python"><code class="language-python">mgr<span class="token punctuation">.</span>create<span class="token punctuation">(</span>
    name<span class="token operator">=</span><span class="token string">"knowledge-refresh"</span><span class="token punctuation">,</span>
    cron<span class="token operator">=</span><span class="token string">"0 4 * * *"</span><span class="token punctuation">,</span>
    endpoint<span class="token operator">=</span><span class="token string">"/knowledge/reload"</span><span class="token punctuation">,</span>
    payload<span class="token operator">=</span><span class="token punctuation">{</span><span class="token punctuation">}</span><span class="token punctuation">,</span>
    timezone<span class="token operator">=</span><span class="token string">"UTC"</span><span class="token punctuation">,</span>
    description<span class="token operator">=</span><span class="token string">"Daily knowledge file re-index"</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<p>Same pattern for anything else: daily metric summaries posted to Slack, anomaly detection runs, weekly email digests, automated data quality checks. Register a schedule, point it at an endpoint, Dash handles the rest.</p>
<p>The best agents are proactive. Scheduled tasks are the first step in that direction.</p>
<h2>Run Evals</h2>
<p>Dash ships with five eval categories:</p>
<ul>
<li><strong>Accuracy:</strong> correct data and meaningful insights</li>
<li><strong>Routing:</strong> team routes to the correct agent</li>
<li><strong>Security:</strong> no credential or secret leaks</li>
<li><strong>Governance:</strong> refuses destructive SQL operations</li>
<li><strong>Boundaries:</strong> schema access boundaries respected</li>
</ul>
<pre class="language-bash"><code class="language-bash">python -m evals                      <span class="token comment"># Run all</span>
python -m evals --category accuracy  <span class="token comment"># Run one category</span>
python -m evals --verbose            <span class="token comment"># Show response details</span>
</code></pre>
<h2>Deploy to Production</h2>
<p>You can deploy Dash to Railway with one command:</p>
<pre class="language-bash"><code class="language-bash"><span class="token function">cp</span> example.env .env.production
<span class="token comment"># Edit .env.production — set OPENAI_API_KEY</span>

railway login
./scripts/railway_up.sh
</code></pre>
<p>Railway is fine for getting started. Eventually you'd want it wherever your existing data infrastructure lives. Everything is containerized so deployment should be straightforward. Be mindful of egress costs.</p>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-sky-500 dark:bg-sky-400"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>Production requires a <code>JWT_VERIFICATION_KEY</code> from os.agno.com for RBAC. It would be insane to expose Dash on a public endpoint.</p></div></div></div></blockquote>
<h2>What's Next</h2>
<p>Dash is built with <a target="_blank" rel="noopener noreferrer" class="" href="https://www.ashpreetbedi.com/systems-engineering">systems engineering principles</a>. Five layers: agent, data, security, interface, infrastructure. Each layer affects the others. Design them together and the system compounds.</p>
<p>If there's interest, I'll do deep dives on each layer:</p>
<ul>
<li><strong>Agent Engineering:</strong> The business logic. Model, instructions, tools, knowledge, and the self-learning loop.</li>
<li><strong>Data Engineering:</strong> The context layer. Memory, knowledge, learnings, storage. Why the data layer is the most underinvested part of the stack.</li>
<li><strong>Security Engineering:</strong> Auth, RBAC, governance, data isolation, and audit trails designed into the system as core primitives.</li>
<li><strong>Interface Engineering:</strong> Turning an agent into a product. REST APIs, web UIs, Slack, MCP, and how one agent serves multiple surfaces.</li>
<li><strong>Infrastructure Engineering:</strong> How to deploy and scale Dash. Containers, deployment, scheduling.</li>
</ul>
<h2>TLDR</h2>
<p>Every company with 30+ people should have an internal data agent. Dash is a free, open-source, self-learning data system made of 3 agents. It uses curated knowledge and continuous learning to get better with every query. Three agents (Leader, Analyst, Engineer) share knowledge and build on each other's work. Security is enforced by the system: read-only connections, schema-level isolation, eval-tested boundaries. Runs in your cloud, lives in Slack. Clone it, run <code>docker compose up</code>, and have the entire system running in minutes.</p>
<hr>
<ul>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/dash">GitHub: agno-agi/dash</a>
</li>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://openai.com/index/inside-our-in-house-data-agent/">OpenAI's data agent</a>
</li>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://www.ashpreetbedi.com/systems-engineering">Systems Engineering</a>
</li>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://www.ashpreetbedi.com/gpu-poor-continuous-learning">GPU Poor Continuous Learning</a>
</li>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://www.ashpreetbedi.com/dash">Dash v1</a>
</li>
</ul>
<p>Built with <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno">Agno</a>.</p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[The Programming Language for Agentic Software]]></title>
            <link>https://ashpreetbedi.com/language-for-agents</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/language-for-agents</guid>
            <pubDate>Wed, 18 Feb 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>Every era of computing develops its own programming language.</p>
<p>The mainframe era had COBOL and Fortran. The systems era had C. The web era had JavaScript and Python. Each emerged for the same reason, the previous generation could no longer express the new abstraction.</p>
<p>We are now in the agentic era.</p>
<p>Software is no longer just executing predefined instructions. It is reasoning over context, calling tools, retrieving knowledge, learning from past runs, and making decisions at runtime.</p>
<p>When the contract of software changes, the language must change too.</p>
<h2>What makes a programming language?</h2>
<p>A programming language is made of three things:</p>
<ol>
<li>Primitives to think and build with.</li>
<li>An engine to execute those primitives.</li>
<li>A runtime that governs memory, I/O, permissions, and interaction with the outside world.</li>
</ol>
<p>An SDK alone is not a programming language. A collection of utilities is not a programming language. Without an execution engine and a runtime that enforces behavior, you have a library, not a language.</p>
<p>Python gives you lists, functions, and classes. Its interpreter runs them. Its runtime manages memory, exceptions, and interfaces with the operating system.</p>
<p>React gives you components and state. Its reconciler computes updates. The browser handles rendering and events.</p>
<p>Applying this to agentic systems:</p>
<ul>
<li><a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno">Agno</a> gives you agents, teams, workflows, memory, knowledge, tools, guardrails, and approval flows.</li>
<li>The Engine runs them: model calls, tool execution, context construction, and iteration.</li>
<li>AgentOS, the production runtime, governs execution and interfaces with the outside world via an API: streaming, request-level isolation, authentication, RBAC, monitoring, background execution.</li>
</ul>
<p>The runtime is stateless. Sessions, memory, state and traces persist in your database. Permissions are enforced at request boundaries.</p>
<p>Agno provides the SDK + Engine + Runtime for agentic software.</p>
<h2>Agents are the new programs</h2>
<p>Traditional applications are collections of deterministic programs. Every path is written in advance. The system does exactly what the developer specified.</p>
<p>Agents change that.</p>
<p>An agent reasons over context. It chooses tools dynamically. It retrieves knowledge. It remembers previous runs. It decides which path to take at runtime.</p>
<p>This is still software, but the path between input and output is no longer fixed.</p>
<p>This does not mean deterministic systems disappear. For many workloads, static pipelines are faster, cheaper, and more reliable.</p>
<p>But when the system must pause, reason, retrieve, and adapt dynamically, predefined control flow breaks down.</p>
<p>For decades, the contract was simple:</p>
<blockquote>
<p>Same input, same output.</p>
</blockquote>
<p>Agentic software breaks that contract.</p>
<p>The same input can produce different outputs depending on memory, context, retrieval, and prior state. If execution is dynamic, the language must express that natively.</p>
<h2>Agentic software needs a new contract</h2>
<p>Agentic software requires new capabilities built into its programming language:</p>
<h3>1. A new interaction model</h3>
<p>Static software receives a request and returns a response.</p>
<p>Agentic software streams reasoning, tool calls, intermediate results, and pivots in real time. The execution path can change mid run, or pause for days. The system may retrieve knowledge halfway through and completely redirect its reasoning.</p>
<p>Streaming and iteration are the default and the language for agentic software must treat them as first class behavior.</p>
<h3>2. A new governance model</h3>
<p>Traditional systems execute predefined decisions within rules written in advance. Code does not decide whether to send an email or issue a refund. It simply follows instructions.</p>
<p>Agents make decisions, and not all decisions are equal.</p>
<p>Some actions are low risk: summarizing text or searching documentation.
<strong>Some require user approval</strong>: sending emails or booking travel.
<strong>Some require admin approval</strong>: issuing refunds, deleting records, changing permissions.</p>
<p>Without runtime-enforced approval boundaries, an agent that can draft an email can also execute a payment. The difference must be enforced by the runtime, not prompt engineering.</p>
<p>Governance must be part of the agent definition itself and the runtime must enforce it.</p>
<h3>3. A new trust model</h3>
<p>Static systems are trusted because every path is written in advance.</p>
<p>Agents introduce probabilistic reasoning into the execution path.</p>
<p>If guardrails and evaluation run outside the runtime, they are advisory rather than enforceable. Unsafe output can be produced before policy checks intervene.</p>
<p>Trust must therefore be part of the runtime semantics: guardrails, evaluation, logging, pre and post-response checks integrated into execution.</p>
<p>Interaction. Governance. Trust.</p>
<p>These are language-level concerns in the agentic era.</p>
<h2>What this looks like in practice</h2>
<p>Here is a lightweight coding agent that writes, reviews, and iterates on code. It remembers project conventions, retrieves knowledge, learns from past runs, and operates within explicit governance boundaries.</p>
<p>This example is intentionally minimal but production-capable. It has persistence, memory, learning, and controlled tool execution.</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">from</span> agno<span class="token punctuation">.</span>agent <span class="token keyword">import</span> Agent
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>db<span class="token punctuation">.</span>sqlite <span class="token keyword">import</span> SqliteDb
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>learn <span class="token keyword">import</span> LearnedKnowledgeConfig<span class="token punctuation">,</span> LearningMachine<span class="token punctuation">,</span> LearningMode
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>models<span class="token punctuation">.</span>openai <span class="token keyword">import</span> OpenAIResponses
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>tools<span class="token punctuation">.</span>coding <span class="token keyword">import</span> CodingTools
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>tools<span class="token punctuation">.</span>reasoning <span class="token keyword">import</span> ReasoningTools

gcode <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    name<span class="token operator">=</span><span class="token string">"Gcode"</span><span class="token punctuation">,</span>
    model<span class="token operator">=</span>OpenAIResponses<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"gpt-5.2"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    db<span class="token operator">=</span>SqliteDb<span class="token punctuation">(</span>db_file<span class="token operator">=</span><span class="token string">"agno.db"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    instructions<span class="token operator">=</span>instructions<span class="token punctuation">,</span>

    <span class="token comment"># Knowledge: searchable long-term memory</span>
    knowledge<span class="token operator">=</span>gcode_knowledge<span class="token punctuation">,</span>
    search_knowledge<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>

    <span class="token comment"># Learning: extract and store learnings over time</span>
    learning<span class="token operator">=</span>LearningMachine<span class="token punctuation">(</span>
        knowledge<span class="token operator">=</span>gcode_learnings<span class="token punctuation">,</span>
        learned_knowledge<span class="token operator">=</span>LearnedKnowledgeConfig<span class="token punctuation">(</span>mode<span class="token operator">=</span>LearningMode<span class="token punctuation">.</span>AGENTIC<span class="token punctuation">)</span><span class="token punctuation">,</span>
    <span class="token punctuation">)</span><span class="token punctuation">,</span>

    <span class="token comment"># Tools: controlled extensions</span>
    tools<span class="token operator">=</span><span class="token punctuation">[</span>CodingTools<span class="token punctuation">(</span>base_dir<span class="token operator">=</span>workspace<span class="token punctuation">,</span> <span class="token builtin">all</span><span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span><span class="token punctuation">,</span> ReasoningTools<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">,</span>

    <span class="token comment"># Memory: learn user preferences</span>
    enable_agentic_memory<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>

    <span class="token comment"># Context: include prior runs</span>
    add_history_to_context<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
    num_history_runs<span class="token operator">=</span><span class="token number">10</span><span class="token punctuation">,</span>
    markdown<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<p>Notice what is being defined:</p>
<ul>
<li>Knowledge as a first class primitive</li>
<li>Learning as a built in capability</li>
<li>Tools as controlled extensions</li>
<li>Memory and historical context as defaults</li>
<li>A runtime that governs how the system executes</li>
</ul>
<p>These are not utilities or third party integrations. They are the vocabulary of the agent and enforced by the runtime and execution layer.</p>
<p>That is what a programming language does. It gives you the right primitives for the era you are building in. You define the behavior. The language enforces it.</p>
<h2>Every era gets the language it needs</h2>
<p>COBOL abstracted business logic away from assembly. C abstracted system engineering without hiding it. Python abstracted memory management and low level primitives to accelerate iteration.</p>
<p>Each language captured the dominant abstraction of its era.</p>
<p>The agentic era introduces a new abstraction: systems that reason, remember, and decide at runtime.</p>
<p>The contract has changed.
The primitives have changed.
The execution model has changed.</p>
<p>The language must change too. That language is <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno">Agno</a>.</p>
<blockquote>
<p>There are many that argue that because Agno is written in Python, it cannot be a programming language.</p>
<p>If you wish to make an apple pie from scratch, you must first invent the universe.
— Carl Sagan</p>
</blockquote>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Build Your Own Multi-Agent System]]></title>
            <link>https://ashpreetbedi.com/multi-agent-system-railway</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/multi-agent-system-railway</guid>
            <pubDate>Thu, 29 Jan 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>Instead of a hello world tutorial, let me show you how to build a live multi-agent system. We'll run it locally on Docker and deploy to production on <a target="_blank" rel="noopener noreferrer" class="" href="https://railway.com">Railway</a>.</p>
<p>This is a production-grade system that includes:</p>
<table><thead><tr><th>Feature</th><th>Description</th></tr></thead><tbody><tr><td><strong>Learning</strong></td><td>Agents remember and improve over time</td></tr><tr><td><strong>Persistence</strong></td><td>State, sessions, and memory backed by PostgreSQL</td></tr><tr><td><strong>Agentic RAG</strong></td><td>Knowledge retrieval that knows when and how to search</td></tr><tr><td><strong>MCP Tools</strong></td><td>Connect to external services via Model Context Protocol</td></tr><tr><td><strong>Monitoring</strong></td><td>Full visibility via the AgentOS control plane</td></tr></tbody></table>
<p>You'll also learn how to extend it with your own agents.</p>
<p>5 minute read. Running locally in 5. Deployed to production in 20.</p>
<h2>The Agents</h2>
<p>We'll build three agents, each demonstrating a different pattern:</p>
<ul>
<li><strong>Pal</strong> - AI-powered second brain. Captures notes, bookmarks, people, meetings. Researches the web. Learns over time.</li>
<li><strong>Knowledge Agent</strong> - Answers questions from a knowledge base.</li>
<li><strong>MCP Agent</strong> - Connects to external services via MCP.</li>
</ul>
<p>Each agent can be extended to fit your needs.</p>
<h2>Run Locally (5 minutes)</h2>
<h3>Prerequisites</h3>
<ul>
<li>Install <a target="_blank" rel="noopener noreferrer" class="" href="https://www.docker.com/products/docker-desktop">Docker Desktop</a></li>
<li>Get an <a target="_blank" rel="noopener noreferrer" class="" href="https://platform.openai.com/api-keys">OpenAI API key</a></li>
</ul>
<h3>Setup</h3>
<p>Clone the repo and export your OpenAI API key:</p>
<pre class="language-bash"><code class="language-bash"><span class="token function">git</span> clone <span class="token punctuation">\</span>
    https://github.com/agno-agi/agentos-railway-template.git <span class="token punctuation">\</span>
    agentos-railway

<span class="token builtin class-name">cd</span> agentos-railway

<span class="token builtin class-name">export</span> <span class="token assign-left variable">OPENAI_API_KEY</span><span class="token operator">=</span><span class="token string">"sk-***"</span>
</code></pre>
<p>Start the application (API + Database):</p>
<pre class="language-bash"><code class="language-bash"><span class="token function">docker</span> compose up -d --build
</code></pre>
<p>That's it. Your system is running. Here's how it looks:</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/agentos-local-setup.mp4">Your browser does not support the video tag.</video>
<h3>Connect to the UI</h3>
<ol>
<li>Open <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a></li>
<li>Click <strong>Add OS</strong> → <strong>Local</strong></li>
<li>Enter <code>http://localhost:8000</code> as the URL</li>
</ol>
<p>Now chat with Pal:</p>
<pre class="language-shell"><code class="language-shell"><span class="token operator">&gt;</span> Note: decided to use Postgres <span class="token keyword">for</span> the new project - better JSON support

<span class="token operator">&gt;</span> Research event sourcing patterns and save the key findings

<span class="token operator">&gt;</span> What <span class="token keyword">do</span> I know about event sourcing?
</code></pre>
<h2>Deploy to Production (10 minutes)</h2>
<p>I've made it easy to deploy to Railway - just login and run a script.</p>
<h3>Prerequisites</h3>
<ul>
<li>Install the <a target="_blank" rel="noopener noreferrer" class="" href="https://docs.railway.com/guides/cli">Railway CLI</a></li>
</ul>
<h3>Deploy</h3>
<p>Login to Railway and run the deploy script:</p>
<pre class="language-bash"><code class="language-bash">railway login

./scripts/railway_up.sh
</code></pre>
<p>The script provisions PostgreSQL, configures environment variables, and deploys your system. Give it a few minutes for the services to spin up.</p>
<h3>Connect to the UI</h3>
<ol>
<li>Open <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a></li>
<li>Click <strong>Add OS</strong> → <strong>Live</strong></li>
<li>Enter your Railway domain</li>
</ol>
<p>You now have a production multi-agent system. Watch it go live in ~2 mins:</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/agentos-railway-deploy.mp4">Your browser does not support the video tag.</video>
<h2>What's Included</h2>
<h3>Pal (Personal Agent that Learns)</h3>
<p>Your AI-powered second brain. Captures notes, bookmarks, people, meetings. Researches the web and saves findings. Learns from errors so it doesn't repeat them.</p>
<p>I wrote more about Pal here: <a target="_blank" rel="noopener noreferrer" class="" href="https://x.com/ashpreetbedi/status/2016702682925334818">Building Pal: Personal Agent that Learns</a></p>
<h3>Knowledge Agent (Agentic RAG)</h3>
<p>Store any type of docs in a vector store, chat with it using Agentic RAG.</p>
<pre class="language-python"><code class="language-python">knowledge_agent <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    model<span class="token operator">=</span>OpenAIResponses<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"gpt-5.2"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    knowledge<span class="token operator">=</span>knowledge<span class="token punctuation">,</span>
    search_knowledge<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<h3>MCP Agent (MCP Tools)</h3>
<p>Connects to external tools via the Model Context Protocol. Point it at any MCP server and it gets access to those tools.</p>
<pre class="language-python"><code class="language-python">mcp_agent <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    model<span class="token operator">=</span>OpenAIResponses<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"gpt-5.2"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    tools<span class="token operator">=</span><span class="token punctuation">[</span>MCPTools<span class="token punctuation">(</span>url<span class="token operator">=</span><span class="token string">"https://docs.agno.com/mcp"</span><span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<h2>Create Your Own Agent</h2>
<p>Now let's add a custom agent to the system. We'll build a research agent that uses the <a target="_blank" rel="noopener noreferrer" class="" href="https://exa.ai">Exa</a> MCP server.</p>
<p>Create <code>agents/research_agent.py</code>:</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">from</span> agno<span class="token punctuation">.</span>agent <span class="token keyword">import</span> Agent
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>models<span class="token punctuation">.</span>openai <span class="token keyword">import</span> OpenAIResponses
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>tools<span class="token punctuation">.</span>mcp <span class="token keyword">import</span> MCPTools

<span class="token keyword">from</span> db <span class="token keyword">import</span> get_postgres_db

<span class="token comment"># Exa MCP for research</span>
EXA_MCP_URL <span class="token operator">=</span> <span class="token punctuation">(</span>
    <span class="token string-interpolation"><span class="token string">f"https://mcp.exa.ai/mcp?tools="</span></span>
    <span class="token string">"web_search_exa,company_research_exa,people_search_exa"</span>
<span class="token punctuation">)</span>

research_agent <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    <span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"research-agent"</span><span class="token punctuation">,</span>
    name<span class="token operator">=</span><span class="token string">"Research Agent"</span><span class="token punctuation">,</span>
    model<span class="token operator">=</span>OpenAIResponses<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"gpt-5.2"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    db<span class="token operator">=</span>get_postgres_db<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    tools<span class="token operator">=</span><span class="token punctuation">[</span>MCPTools<span class="token punctuation">(</span>url<span class="token operator">=</span>EXA_MCP_URL<span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
    instructions<span class="token operator">=</span><span class="token triple-quoted-string string">"""\
You are a research agent. You help users find information about:
- Companies and startups
- People and their backgrounds
- Topics and trends

Be thorough but concise. Cite your sources.
"""</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<p>Register it in <code>app/main.py</code>:</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">from</span> agents<span class="token punctuation">.</span>research_agent <span class="token keyword">import</span> research_agent

agent_os <span class="token operator">=</span> AgentOS<span class="token punctuation">(</span>
    agents<span class="token operator">=</span><span class="token punctuation">[</span>pal<span class="token punctuation">,</span> knowledge_agent<span class="token punctuation">,</span> mcp_agent<span class="token punctuation">,</span> research_agent<span class="token punctuation">]</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<p>Your agent is now part of the system. Chat with it:</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/agentos-research-agent.mp4">Your browser does not support the video tag.</video>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>If the agent doesn't show up, press refresh on the UI (top right corner) or restart containers with <code>docker compose restart</code>.</p></div></div></div></blockquote>
<h2>Wrapping Up</h2>
<p>You now have a live multi-agent system with:</p>
<table><thead><tr><th>Feature</th><th>Description</th></tr></thead><tbody><tr><td><strong>Learning</strong></td><td>Agents that remember and improve over time</td></tr><tr><td><strong>Persistence</strong></td><td>PostgreSQL for storing agent sessions, state, and memory</td></tr><tr><td><strong>Research</strong></td><td>Web search, company lookup, people search via Exa</td></tr><tr><td><strong>Monitoring</strong></td><td>Full visibility via the AgentOS control plane</td></tr><tr><td><strong>Extensibility</strong></td><td>Add agents, tools, and integrations as needed</td></tr></tbody></table>
<h2>What's Next</h2>
<ul>
<li><strong>Build more agents</strong> - Add specialized <a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/agents">agents</a> for your use case</li>
<li><strong>Add tools</strong> - Extend your agents with <a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/tools/toolkits">100+ toolkits</a></li>
<li><strong>Go multi-agent</strong> - Create multi-agent <a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/teams">teams</a> and <a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/workflows">workflows</a></li>
<li><strong>Go multi-channel</strong> - Expose your agents via Slack, Discord, WhatsApp</li>
<li><strong>Build an AI product</strong> - From 2-person startups to Fortune 500 companies, AgentOS is the foundation for agentic products</li>
</ul>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>The system is yours. You have a head start - make it count.</p></div></div></div></blockquote>
<hr>
<h2>Learn More</h2>
<ul>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agentos-railway-template">GitHub repo</a>
</li>
<li>
<a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com">Agno documentation</a>
</li>
</ul>
<p>Built with <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno">Agno</a>. Give it a ⭐️</p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Evals Don't Give You a Working Product]]></title>
            <link>https://ashpreetbedi.com/evals-not-enough</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/evals-not-enough</guid>
            <pubDate>Sat, 10 Jan 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>Evals are the holy grail of AI engineering. Or so we've been told.</p>
<p>Two years. Billions in VC funding. Thousands of blog posts about "production-ready agents." An entire industry built around evaluation frameworks, observability platforms, and benchmarks.</p>
<p>The result?</p>
<ul>
<li>
<p><strong>11% of organizations have agents in production.</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://www.deloitte.com/us/en/insights/topics/technology-management/tech-trends/2026/agentic-ai-strategy.html">[Deloitte]</a></p>
</li>
<li>
<p><strong>40%+ of agentic AI projects will be cancelled by 2027.</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://www.gartner.com/en/newsroom/press-releases/2024-10-22-gartner-says-over-40-percent-of-agentic-ai-projects-will-be-abandoned-by-2027">[Gartner]</a></p>
</li>
<li>
<p><strong>80%+ never reach meaningful production.</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://www.rand.org/pubs/research_reports/RRA2680-1.html">[RAND]</a></p>
</li>
</ul>
<p>If evals were the answer, these numbers would be different.</p>
<p>Here's what I've learned after two years of shipping agents: <strong>passing evals ≠ working product.</strong> You can have a green test suite and a broken product. You can hit 95% on your benchmark and watch your agent choke the moment a real user touches it.</p>
<p>Evals don't get you to production. A working product does.</p>
<h2>The Pitch vs. The Reality</h2>
<p>Here's what the eval-industrial complex told us:</p>
<blockquote>
<p>"Evals are the key to production-ready agents" — <a target="_blank" rel="noopener noreferrer" class="" href="https://www.databricks.com/blog/key-production-ai-agents-evaluations">Databricks</a></p>
</blockquote>
<p>Here's what actually happens:</p>
<p>You build an agent in a Python script. It works. You run your eval suite. Green lights everywhere. You demo it to stakeholders. They love it. Then you try to ship it.</p>
<p>Everything falls apart.</p>
<h2>What Evals Don't Test</h2>
<p>Your eval suite said the agent was ready. Here's what it missed:</p>
<p><strong>Your agent isn't a function — it's a process.</strong> A single response might take 30 seconds. Or 3 minutes. Or 10 minutes if it's doing research. Traditional servers handle stateless request-response cycles in milliseconds. Your agent thinks, waits, calls tools, thinks again. Try fitting that into a Lambda with a 15-second timeout.</p>
<p><strong>State breaks at scale.</strong> Works great with 1 user on 1 container. Add more users? State bleeds across sessions. Add more containers? State disappears entirely. Store it in memory? Gone when the process dies. Store it in a database? Now you're building infrastructure you didn't plan for.</p>
<p><strong>Streaming is harder than it looks.</strong> In your notebook, responses just appeared. In production, users stare at a blank screen for 8 seconds wondering if the app crashed. You try SSE. Then WebSockets. Then you realize you need durable streams that survive network hiccups, handle backpressure, and resume gracefully after disconnects.</p>
<p><strong>The real world doesn't mock.</strong> Your agent calls an external API. In testing, mocks returned clean data every time. In production, the API times out. Returns malformed JSON. Hits rate limits. Requires re-authentication mid-session. Your agent chokes. Your eval suite never saw it coming.</p>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>Agents fail because of an inadequate runtime, not intelligence. Evals don't measure any of it.</p></div></div></div></blockquote>
<p>We've been obsessing over the brain while ignoring the nervous system.</p>
<h2>The Trap: Evals Too Early</h2>
<p>Here's the thing that really kills projects: writing evals before you have a working product.</p>
<p>Every hour spent writing evals is an hour not spent learning what your product actually needs. You're locking yourself into test cases for a system that doesn't exist yet.</p>
<p>The agent you're building now? It's not the one that's going to ship. It's going to be the second iteration. Or the fifth. The eval suite you wrote for version one is useless for version three. Worse than useless — it's weight you're dragging around.</p>
<p>The eval-industrial complex sold you on this idea that evals-first is disciplined. It's not.</p>
<p>The right sequence:</p>
<ol>
<li>Build something that runs</li>
<li>Get it in front of real users (internal users are fine)</li>
<li>Learn what breaks, what matters, what "good" actually looks like</li>
<li><em>Then</em> write evals to lock in that understanding</li>
</ol>
<p>You can't evaluate what you can't run.</p>
<h2>What Evals Are Actually Good For</h2>
<p>I'm not saying evals are useless. They're critical — for model providers shipping foundation models. If you're training GPT-5, you need benchmarks. Even for AI engineers building products on top of those models, evals help with:</p>
<ul>
<li>Catching regressions after you change something</li>
<li>Comparing model versions</li>
<li>Compliance checkboxes</li>
</ul>
<p>That's it. They won't help you ship. They won't help you scale. They won't help you handle the thousand edge cases that only appear in production.</p>
<h2>What Actually Gets You to Production</h2>
<p>The market says: <strong>Evals → Observability → Production.</strong></p>
<p>This is backwards. Here's what actually works:</p>
<p><strong>Runtime → Production → (Evals + Observability)</strong></p>
<p>The foundation comes first. Everything else is a support layer.</p>
<p><strong>The foundation:</strong></p>
<ul>
<li>
<p><strong>A runtime that handles the weird stuff.</strong> Concurrent users. Failure recovery. Long-running stateful processes that survive container restarts. Your agent isn't a microservice — stop treating it like one.</p>
</li>
<li>
<p><strong>State management that doesn't disappear.</strong> Sessions that survive crashes. Context that carries across conversations. Memory that doesn't evaporate when Kubernetes decides to reschedule your pod.</p>
</li>
<li>
<p><strong>Storage that lives with the agent.</strong> The agent's data — sessions, memory, knowledge — stored where the agent runs. In your cloud. Under your control. Send it to a third-party service and you've lost control of your product's brain.</p>
</li>
<li>
<p><strong>Infrastructure you own.</strong> Your environment. Your data. Your competitive advantage.</p>
</li>
</ul>
<p><strong>The support layer (after you're running):</strong></p>
<ul>
<li><strong>Observability</strong> for real production behavior — not synthetic test traces.</li>
<li><strong>Evals</strong> to catch regressions — run them in CI, keep them lean.</li>
<li><strong>Tracing</strong> to debug when things go wrong.</li>
</ul>
<p>The support layer matters. But without the foundation, you're just testing in a notebook.</p>
<h2>The Questions That Actually Matter</h2>
<p>You have a working agent in a Python script. Great. Now answer these:</p>
<ul>
<li>Where will it run?</li>
<li>Can it handle 100 concurrent users? 1,000?</li>
<li>What happens when a container crashes mid-conversation?</li>
<li>Is streaming smooth or do users watch a loading spinner for 10 seconds?</li>
<li>Where does the agent's memory live? Who owns it?</li>
<li>How do you deploy updates without breaking active sessions?</li>
</ul>
<p>Evals don't answer any of these questions. The runtime does.</p>
<h2>The Path Forward</h2>
<p>I built <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno">Agno</a> because I got tired of watching good agents die in the gap between "works in a notebook" and "runs in production."</p>
<p>Agno is a runtime for agents. It handles the stuff evals can't test:</p>
<ul>
<li><strong>Concurrent execution</strong> — thousands of users, isolated state</li>
<li><strong>Persistent storage</strong> — sessions survive crashes, memory persists across conversations</li>
<li><strong>Streaming that works</strong> — SSE out of the box, handles disconnects gracefully</li>
<li><strong>Your infrastructure</strong> — runs in your cloud, data never leaves your environment</li>
</ul>
<p>The eval-industrial complex had their shot. Two years. Billions in funding. The production numbers haven't moved.</p>
<p>Maybe it's time to focus on actually shipping.</p>
<h2>Want to build with Agno?</h2>
<ul>
<li><strong>GitHub:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/gh">agno.link/gh</a></li>
<li><strong>Documentation:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/docs">agno.link/docs</a></li>
<li><strong>AgentOS:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a></li>
</ul>
<span class="text-teal-400">Production means a working product deployed to your cloud — not a green eval suite running on your laptop.</span>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Learning Machines: Technical Design]]></title>
            <link>https://ashpreetbedi.com/lm-technical-design</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/lm-technical-design</guid>
            <pubDate>Thu, 08 Jan 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p><strong>On Monday I introduced <a target="_blank" rel="noopener noreferrer" class="" href="/learning-machines-v0">Learning Machines</a> and yesterday I shared that it's finally working. Today I'll show you how it works under the hood.</strong></p>
<h2>First, Let's Recap</h2>
<p>After reading hundreds of papers on agentic memory and trying out every possible tool, I came to the simple conclusion that maybe we're looking at memory wrong.</p>
<p>Memory is just... learning. Learning about the user, the task at hand, learning insights and patterns, learning from decisions - good and bad, the feedback received. Learning from every interaction. Everything else is <strong>integration</strong> (how the agent uses these learnings) and <strong>curation</strong> (decay, pruning, deduplication).</p>
<p>So I built <strong>Learning Machines</strong>: A system that helps agents continuously learn from every interaction.</p>
<p>I started working on it dec 31, and got a basic working version yesterday. Here's the PR for those interested: <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/pull/5897">learning-machine-v0</a></p>
<p>Now let's dig into the technical details.</p>
<h2>The Learning Protocol</h2>
<p>The key behind it all is the <strong>Learning Protocol</strong>. It's a simple interface for building <strong>Learning Stores</strong> -- user profiles, session context, learned knowledge, entity memory, etc.</p>
<p>Let's take a look at the protocol:</p>
<pre class="language-python"><code class="language-python"><span class="token decorator annotation punctuation">@runtime_checkable</span>
<span class="token keyword">class</span> <span class="token class-name">LearningStore</span><span class="token punctuation">(</span>Protocol<span class="token punctuation">)</span><span class="token punctuation">:</span>
    <span class="token triple-quoted-string string">"""Protocol that all learning stores must implement."""</span>

    <span class="token decorator annotation punctuation">@property</span>
    <span class="token keyword">def</span> <span class="token function">learning_type</span><span class="token punctuation">(</span>self<span class="token punctuation">)</span> <span class="token operator">-</span><span class="token operator">&gt;</span> <span class="token builtin">str</span><span class="token punctuation">:</span>
        <span class="token triple-quoted-string string">"""Unique identifier (e.g., 'user_profile')."""</span>
        <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>

    <span class="token keyword">def</span> <span class="token function">recall</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> <span class="token operator">**</span>context<span class="token punctuation">)</span> <span class="token operator">-</span><span class="token operator">&gt;</span> Optional<span class="token punctuation">[</span>Any<span class="token punctuation">]</span><span class="token punctuation">:</span>
        <span class="token triple-quoted-string string">"""Retrieve learnings from storage."""</span>
        <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>

    <span class="token keyword">def</span> <span class="token function">process</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> messages<span class="token punctuation">:</span> List<span class="token punctuation">[</span>Any<span class="token punctuation">]</span><span class="token punctuation">,</span> <span class="token operator">**</span>context<span class="token punctuation">)</span> <span class="token operator">-</span><span class="token operator">&gt;</span> <span class="token boolean">None</span><span class="token punctuation">:</span>
        <span class="token triple-quoted-string string">"""Extract and save learnings from conversation."""</span>
        <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>

    <span class="token keyword">def</span> <span class="token function">build_context</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> data<span class="token punctuation">:</span> Any<span class="token punctuation">)</span> <span class="token operator">-</span><span class="token operator">&gt;</span> <span class="token builtin">str</span><span class="token punctuation">:</span>
        <span class="token triple-quoted-string string">"""Build context string for agent's system prompt."""</span>
        <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>

    <span class="token keyword">def</span> <span class="token function">get_tools</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> <span class="token operator">**</span>context<span class="token punctuation">)</span> <span class="token operator">-</span><span class="token operator">&gt;</span> List<span class="token punctuation">[</span>Callable<span class="token punctuation">]</span><span class="token punctuation">:</span>
        <span class="token triple-quoted-string string">"""Get tools to expose to agent."""</span>
        <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>
</code></pre>
<p>Five functions. Everything else is optional.</p>
<p><strong>Why this matters:</strong> You can build your own learning store in ~50 lines. Most memory systems are thousands of lines of config. This is ~50. Legal docs. Medical records. Codebases. Sales pipelines. Whatever your domain needs.</p>
<p>You can even build personalized LearningStores for your writing styles, for your daily to-do's, for your emails, for  your shopping lists. The real value of this approach is its extensibility.</p>
<h2>The Learning Machine</h2>
<p>The protocol lets you build stores. But stores need to plug into the agent somehow. That's what <strong>LearningMachine</strong> does.</p>
<pre><code>User Message ──────► Recall from Stores ◄────────┐
                            │                    │
                            ▼                    │
                      Build Context              │
                            │                    │
                            ▼                    │ LearningMachine
                Agent Responds (with tools)      │
                            │                    │
                            ▼                    │
                   Extract &amp; Process             │
                            │                    │
                            ▼                    │
              Update Stores (agent learns) ──────┴──► Periodic Curation
</code></pre>
<p>Recall → Build context → Run agent → Extract → Store. That's the loop.</p>
<h2>Developer Experience</h2>
<p>Three levels of complexity:</p>
<h3>Dead Simple</h3>
<pre class="language-python"><code class="language-python">agent <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    model<span class="token operator">=</span>model<span class="token punctuation">,</span>
    db<span class="token operator">=</span>db<span class="token punctuation">,</span>
    learning<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>  <span class="token comment"># Enables user_profile in BACKGROUND mode</span>
<span class="token punctuation">)</span>
</code></pre>
<h3>Pick What You Want</h3>
<pre class="language-python"><code class="language-python">agent <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    model<span class="token operator">=</span>model<span class="token punctuation">,</span>
    db<span class="token operator">=</span>db<span class="token punctuation">,</span>
    learning<span class="token operator">=</span>LearningMachine<span class="token punctuation">(</span>
        user_profile<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
        session_context<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
        learned_knowledge<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
        entity_memory<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
    <span class="token punctuation">)</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<h3>Full Control</h3>
<pre class="language-python"><code class="language-python">agent <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    model<span class="token operator">=</span>model<span class="token punctuation">,</span>
    db<span class="token operator">=</span>db<span class="token punctuation">,</span>
    learning<span class="token operator">=</span>LearningMachine<span class="token punctuation">(</span>
        user_profile<span class="token operator">=</span>UserProfileConfig<span class="token punctuation">(</span>
            mode<span class="token operator">=</span>LearningMode<span class="token punctuation">.</span>AGENTIC<span class="token punctuation">,</span>
        <span class="token punctuation">)</span><span class="token punctuation">,</span>
        session_context<span class="token operator">=</span>SessionContextConfig<span class="token punctuation">(</span>
            enable_planning<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
        <span class="token punctuation">)</span><span class="token punctuation">,</span>
        learned_knowledge<span class="token operator">=</span>LearnedKnowledgeConfig<span class="token punctuation">(</span>
            mode<span class="token operator">=</span>LearningMode<span class="token punctuation">.</span>PROPOSE<span class="token punctuation">,</span>
            namespace<span class="token operator">=</span><span class="token string">"engineering"</span><span class="token punctuation">,</span>
        <span class="token punctuation">)</span><span class="token punctuation">,</span>
        entity_memory<span class="token operator">=</span>EntityMemoryConfig<span class="token punctuation">(</span>
            mode<span class="token operator">=</span>LearningMode<span class="token punctuation">.</span>BACKGROUND<span class="token punctuation">,</span>
        <span class="token punctuation">)</span><span class="token punctuation">,</span>
    <span class="token punctuation">)</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<h2>Build Your Own Learning Store</h2>
<p>This is the win. Implement the protocol, plug it in:</p>
<pre class="language-python"><code class="language-python"><span class="token decorator annotation punctuation">@dataclass</span>
<span class="token keyword">class</span> <span class="token class-name">ProjectContextStore</span><span class="token punctuation">:</span>
    <span class="token triple-quoted-string string">"""Custom store for project-specific context."""</span>

    <span class="token decorator annotation punctuation">@property</span>
    <span class="token keyword">def</span> <span class="token function">learning_type</span><span class="token punctuation">(</span>self<span class="token punctuation">)</span> <span class="token operator">-</span><span class="token operator">&gt;</span> <span class="token builtin">str</span><span class="token punctuation">:</span>
        <span class="token keyword">return</span> <span class="token string">"project_context"</span>

    <span class="token keyword">def</span> <span class="token function">recall</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> project_id<span class="token punctuation">:</span> <span class="token builtin">str</span><span class="token punctuation">,</span> <span class="token operator">**</span>kwargs<span class="token punctuation">)</span> <span class="token operator">-</span><span class="token operator">&gt;</span> Optional<span class="token punctuation">[</span>ProjectContext<span class="token punctuation">]</span><span class="token punctuation">:</span>
        <span class="token comment"># Retrieve from your storage</span>
        <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>

    <span class="token keyword">def</span> <span class="token function">process</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> messages<span class="token punctuation">:</span> List<span class="token punctuation">[</span>Any<span class="token punctuation">]</span><span class="token punctuation">,</span> project_id<span class="token punctuation">:</span> <span class="token builtin">str</span><span class="token punctuation">,</span> <span class="token operator">**</span>kwargs<span class="token punctuation">)</span> <span class="token operator">-</span><span class="token operator">&gt;</span> <span class="token boolean">None</span><span class="token punctuation">:</span>
        <span class="token comment"># Extract and save</span>
        <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>

    <span class="token keyword">def</span> <span class="token function">build_context</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> data<span class="token punctuation">:</span> Any<span class="token punctuation">)</span> <span class="token operator">-</span><span class="token operator">&gt;</span> <span class="token builtin">str</span><span class="token punctuation">:</span>
        <span class="token keyword">if</span> <span class="token keyword">not</span> data<span class="token punctuation">:</span>
            <span class="token keyword">return</span> <span class="token string">""</span>
        <span class="token keyword">return</span> <span class="token string-interpolation"><span class="token string">f"&lt;project_context&gt;\n</span><span class="token interpolation"><span class="token punctuation">{</span>data<span class="token punctuation">.</span>summary<span class="token punctuation">}</span></span><span class="token string">\n&lt;/project_context&gt;"</span></span>

    <span class="token keyword">def</span> <span class="token function">get_tools</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> <span class="token operator">**</span>kwargs<span class="token punctuation">)</span> <span class="token operator">-</span><span class="token operator">&gt;</span> List<span class="token punctuation">[</span>Callable<span class="token punctuation">]</span><span class="token punctuation">:</span>
        <span class="token keyword">return</span> <span class="token punctuation">[</span><span class="token punctuation">]</span>  <span class="token comment"># Or return tools for agentic mode</span>

<span class="token comment"># Plug it in</span>
learning <span class="token operator">=</span> LearningMachine<span class="token punctuation">(</span>
    custom_stores<span class="token operator">=</span><span class="token punctuation">{</span>
        <span class="token string">"project"</span><span class="token punctuation">:</span> ProjectContextStore<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    <span class="token punctuation">}</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p><strong>~50 lines. 5 functions. Your domain, your rules.</strong> Build a Learning Store for legal docs, medical records, codebases, sales pipelines. This is the whole point behind the Learning Machine.</p></div></div></div></blockquote>
<h2>Built-in Stores</h2>
<p>Phase 1 includes four stores:</p>
<table><thead><tr><th>Store</th><th>What It Captures</th><th>Scope</th><th>Storage</th></tr></thead><tbody><tr><td><strong>User Profile</strong></td><td>Name, work context, preferences, communication style</td><td>Per user (<code>user_id</code>)</td><td>Database (direct lookup)</td></tr><tr><td><strong>Session Context</strong></td><td>Summary of conversation, goal, plan steps, progress</td><td>Per session (<code>session_id</code>)</td><td>Database (direct lookup)</td></tr><tr><td><strong>Learned Knowledge</strong></td><td>Insights, patterns, best practices. Things that apply across users</td><td>Configurable namespace</td><td>Knowledge base (vector search)</td></tr><tr><td><strong>Entity Memory</strong></td><td>Facts, events, and relationships about external things — companies, people, projects</td><td>Configurable namespace</td><td>Database (direct lookup + search)</td></tr></tbody></table>
<h2>Key Design Decisions</h2>
<h3>Learning Modes</h3>
<p>Different use cases need different extraction modes.</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">class</span> <span class="token class-name">LearningMode</span><span class="token punctuation">(</span>Enum<span class="token punctuation">)</span><span class="token punctuation">:</span>
    BACKGROUND <span class="token operator">=</span> <span class="token string">"background"</span>   <span class="token comment"># Automatic extraction after each conversation</span>
    AGENTIC <span class="token operator">=</span> <span class="token string">"agentic"</span>         <span class="token comment"># Agent decides via tools</span>
    PROPOSE <span class="token operator">=</span> <span class="token string">"propose"</span>         <span class="token comment"># Agent proposes, user confirms</span>
    HITL <span class="token operator">=</span> <span class="token string">"hitl"</span>               <span class="token comment"># Human-in-the-loop approval (future)</span>
</code></pre>
<p><strong>BACKGROUND</strong> is invisible. The user never sees extraction happening. This is what makes Claude's memory feel natural.</p>
<p><strong>AGENTIC</strong> gives control. The agent decides what's worth remembering. You can see the tool calls. Less noise, more transparency.</p>
<p><strong>PROPOSE</strong> is for medium-stakes learning. Agent suggests, human approves. Good for shared knowledge bases where bad data spreads.</p>
<p><strong>HITL</strong> is for the highest-stakes learning. Explicit human approval required.</p>
<h3>Namespace Scoping</h3>
<p>Some learnings should be private. Some should be shared. Namespaces enable this.</p>
<pre class="language-python"><code class="language-python"><span class="token comment"># Private to this user</span>
LearnedKnowledgeConfig<span class="token punctuation">(</span>namespace<span class="token operator">=</span><span class="token string">"user"</span><span class="token punctuation">)</span>

<span class="token comment"># Shared within engineering team</span>
LearnedKnowledgeConfig<span class="token punctuation">(</span>namespace<span class="token operator">=</span><span class="token string">"engineering"</span><span class="token punctuation">)</span>

<span class="token comment"># Shared with everyone</span>
LearnedKnowledgeConfig<span class="token punctuation">(</span>namespace<span class="token operator">=</span><span class="token string">"global"</span><span class="token punctuation">)</span>
</code></pre>
<p>This is what enables cross-user learning. This is what made yesterday's experiment work — Alice's insight helped Bob because they shared a namespace.</p>
<h3>Entity Memory: Three-Tier Memory System</h3>
<p>Entities (people, companies, projects) hold different types of information:</p>
<ul>
<li><strong>Facts</strong>: Semantic knowledge ("Uses PostgreSQL", "Based in London")</li>
<li><strong>Events</strong>: Episodic memories ("Launched v2 on Jan 15", "Raised Series A")</li>
<li><strong>Relationships</strong>: Graph connections ("Bob is CEO of Acme", "Acme acquired StartupX")</li>
</ul>
<p>Flat list doesn't work. You need to query "what do we know about Acme" differently than "what happened with Acme."</p>
<h2>What's Next</h2>
<table><thead><tr><th>Phase</th><th>What's Included</th><th>Status</th></tr></thead><tbody><tr><td><strong>Phase 1</strong></td><td>Learning Protocol, Learning Machine + 4 Learning Stores</td><td>Built, currently testing and fixing bugs</td></tr><tr><td><strong>Phase 2</strong></td><td>Decision Logs and Behavioral Feedback. Agents that know <em>why</em> they did what they did, and <em>what worked</em></td><td>Planned</td></tr><tr><td><strong>Phase 3</strong></td><td>Self-Improvement</td><td>Planned</td></tr></tbody></table>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p><strong>Phase 3 is the endgame.</strong> Agents that analyze their own failures and propose: "I should stop doing X." Human approves. Agent evolves. No retraining. No fine-tuning. Just learning.</p></div></div></div></blockquote>
<p>Want to dig in? Here's the PR: <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/pull/5897">learning-machine-v0</a></p>
<p>Memory was step one. Learning is what comes next.</p>
<p>If you enjoyed reading this, <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno">checkout Agno on GitHub</a>.</p>
<p>Questions or feedback? Reach out on <a target="_blank" rel="noopener noreferrer" class="" href="https://x.com/ashpreetbedi">X</a>.</p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Learning Machines: Why AI Memory Hasn't Been Solved (Yet)]]></title>
            <link>https://ashpreetbedi.com/learning-machines-v0</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/learning-machines-v0</guid>
            <pubDate>Wed, 07 Jan 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p><strong>Every AI memory tool I've used is missing something.</strong></p>
<p>After reading hundreds (maybe thousands) of opinions, posts, and papers on agentic memory, I've come to three conclusions.</p>
<p><strong>1. No one has it figured out.</strong></p>
<p>Claude has the most impressive memory system I've seen. It feels natural. It never shouts. It knows what to reveal and when.</p>
<p>But we haven't figured out how to give developers the same capability for their own agents. The tools we have are... not there.</p>
<p><strong>2. Maybe we're looking at it wrong.</strong></p>
<p>Maybe memory is the wrong framing. <strong>What agents are really doing is learning.</strong> Learning about the user, the task at hand, learning insights and patterns, learning from decisions - good and bad, the feedback received. Learning from every interaction.</p>
<p>Everyone's rushing to build memory extraction systems — pull out facts, store them in a vector (or graph 🙄) database, retrieve them using complex mechanisms. But that's only half the problem.</p>
<p>But the hard part is integration: When does the learning happen? Before the response? After? In parallel? Is it automatic or does the agent control it? And critically — how do you teach the agent to use that information properly? Integration is what makes the system work.</p>
<p>You can't just tell an agent "you know XYZ about the user". You need to teach it how to use that knowledge. How to learn from it. How to prioritize it. How to act like a partner, a colleague, a companion who genuinely knows you — not a machine reciting facts from a database.</p>
<p><strong>3. User memory is only part of the story.</strong></p>
<p>User profiles and conversation summaries are just two types of learnings. But what about patterns and insights that worked? The entities involved - companies, people, projects? The decisions made and why? The feedback received? How should the agent use all these learnings to improve itself?</p>
<p>These aren't separate systems. They're all forms of learning.</p>
<hr>
<h2>Memory is Learning</h2>
<p>This realization led me to build something different: the Learning Machine, a unified learning system that helps agents continuously integrate information from their environment.</p>
<p>Here's the difference:</p>
<pre><code>Traditional "Memory":
Message → Extract → Store → Retrieve → Dump into Prompt → Repeat

Learning Machine:
User Message ──────► Recall from Stores ◄────────┐
                            │                    │
                            ▼                    │
                      Build Context              │
                            │                    │
                            ▼                    │ LearningMachine
                Agent Responds (with tools)      │
                            │                    │
                            ▼                    │
                   Extract &amp; Process             │
                            │                    │
                            ▼                    │
              Update Stores (agent learns) ──────┴──► Periodic Curation
</code></pre>
<p>The agent isn't just <strong>fed</strong> memories. It participates in learning, curating what it learns, and integrating that knowledge back into every response.</p>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p><strong>The goal: an agent on interaction 1000 is fundamentally better than it was on interaction 1 — across the board, not just with the same user.</strong></p></div></div></div></blockquote>
<hr>
<h2>What It Looks Like in Action</h2>
<p>A new employee on their first day asks: "I'm starting work on the cloud migration project. What should I know?"</p>
<p>The agent responds with full context, even though it's never talked to this person before. It knows Acme is migrating from AWS to GCP. It knows Alex (CTO) is leading it. It knows Phase 2 is the most compute-heavy. It shares migration patterns from similar past projects. It knows that the pricing is changing next quarter.</p>
<p><strong>How?</strong> Three types of learning from past interactions:</p>
<pre><code>Session 1 (Alex, CTO):
"I'm Alex, CTO at Acme. We're migrating from AWS to GCP and
I need help planning the timeline."

→ User Profile captures: Alex, CTO, involved in planning discussions
→ Entity Memory captures: Acme (company), AWS→GCP migration (project)
→ Session Context: Goal is migration timeline planning
</code></pre>
<pre><code>Session 2 (next day, same user, different session):
"Just heard GCP is changing their pricing next quarter.
How does that affect our migration?"

→ Agent recalls: Acme, AWS→GCP migration, Alex is CTO, 3-phase timeline
→ Agent responds: "That could impact your timeline. Last time we mapped
   out a 3-phase approach with Phase 2 being the most compute-heavy.
   Want me to model the cost implications for each phase?"
</code></pre>
<pre><code>Session 3 (different user, same org namespace):
"I just joined to help with the Acme cloud project. What should I know?"

→ Entity Memory: "Acme is migrating AWS to GCP. Alex (CTO) is leading it."
→ Learned Knowledge: Shares migration patterns from past projects
→ Agent responds with full context — even though it never talked to this user
</code></pre>
<p>Three sessions. Three types of learning. Cross-user knowledge sharing.</p>
<p>This is possible. Today.</p>
<hr>
<h2>The Architecture: Learning Stores</h2>
<p>The key innovation behind the Learning Machine is the <strong>learning protocol</strong> and <strong>learning stores</strong>. The protocol defines how stores capture, process, and integrate knowledge. Each store is configured independently. Mix and match as needed. The Learning Machine orchestrates it all.</p>
<p>These are the stores I'm working on:</p>
<table><thead><tr><th>Store</th><th>What It Captures</th><th>Scope</th></tr></thead><tbody><tr><td><strong>User Profile</strong></td><td>Preferences, memories, personal context</td><td>Per user</td></tr><tr><td><strong>Session Context</strong></td><td>Goal, plan, progress, summary</td><td>Per session</td></tr><tr><td><strong>Entity Memory</strong></td><td>Facts, events, relationships about external things</td><td>Configurable</td></tr><tr><td><strong>Learned Knowledge</strong></td><td>Insights, patterns, best practices</td><td>Configurable</td></tr><tr><td><strong>Decision Logs</strong></td><td>Why decisions were made</td><td>Configurable</td></tr><tr><td><strong>Behavioral Feedback</strong></td><td>What worked, what didn't</td><td>Per agent</td></tr><tr><td><strong>Self-Improvement</strong></td><td>Evolved instructions</td><td>Per agent</td></tr></tbody></table>
<h3>Show Me Some Code</h3>
<p>One agent. Four learning stores. Configured independently. Orchestrated by the Learning Machine.</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">from</span> agno<span class="token punctuation">.</span>agent <span class="token keyword">import</span> Agent
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>db<span class="token punctuation">.</span>postgres <span class="token keyword">import</span> PostgresDb
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>models<span class="token punctuation">.</span>openai <span class="token keyword">import</span> OpenAIResponses

agent <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    model<span class="token operator">=</span>OpenAIResponses<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"gpt-5.2"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    db<span class="token operator">=</span>PostgresDb<span class="token punctuation">(</span>db_url<span class="token operator">=</span><span class="token string">"postgresql://..."</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    learning<span class="token operator">=</span>LearningMachine<span class="token punctuation">(</span>
        knowledge<span class="token operator">=</span>my_vector_store<span class="token punctuation">,</span>  <span class="token comment"># or graph if that's your thing</span>
        user_profile<span class="token operator">=</span>UserProfileConfig<span class="token punctuation">(</span>
            mode<span class="token operator">=</span>LearningMode<span class="token punctuation">.</span>BACKGROUND<span class="token punctuation">,</span>
            enable_agent_tools<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
        <span class="token punctuation">)</span><span class="token punctuation">,</span>
        session_context<span class="token operator">=</span>SessionContextConfig<span class="token punctuation">(</span>
            enable_planning<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
        <span class="token punctuation">)</span><span class="token punctuation">,</span>
        learned_knowledge<span class="token operator">=</span>LearnedKnowledgeConfig<span class="token punctuation">(</span>
            mode<span class="token operator">=</span>LearningMode<span class="token punctuation">.</span>PROPOSE<span class="token punctuation">,</span>
        <span class="token punctuation">)</span><span class="token punctuation">,</span>
        entity_memory<span class="token operator">=</span>EntityMemoryConfig<span class="token punctuation">(</span>
            mode<span class="token operator">=</span>LearningMode<span class="token punctuation">.</span>BACKGROUND<span class="token punctuation">,</span>
        <span class="token punctuation">)</span><span class="token punctuation">,</span>
    <span class="token punctuation">)</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p><strong>The best part?</strong> You can build custom learning stores by extending the LearningStore protocol. Need project context? Build a <code>ProjectContextStore</code>. Need to track accounts? Build an <code>AccountStore</code>.</p></div></div></div></blockquote>
<hr>
<h2>Taking Inspiration from Claude</h2>
<p>Claude's memory feels magical. It's natural, contextual, never announces "saving to memory". It just <strong>knows</strong> you.</p>
<p>But here's the thing: <strong>you can't build with it.</strong> Claude's memory is a consumer product feature. The API gives you nothing. If you want learning for your agents, you're on your own. Enter Learning Machine.</p>
<p>Here's what Claude does well, and what Learning Machine adds:</p>
<p><strong>Claude feels natural.</strong> It never announces "saving to memory". So does Learning Machine. We inject context based on each store and control how the agent learns from it. No fact dumps.</p>
<p><strong>Claude learns about its users over time.</strong> Preferences, history, personal context. So does Learning Machine. But we also add sessions, entities, patterns, and decisions. The full picture, not just the user.</p>
<p><strong>Claude is scoped to a single user.</strong> Makes sense for a consumer product. Learning Machine adds namespace scoping: keep it private to a user, share across a team, or make it global. You control the boundaries.</p>
<p><strong>Claude has fixed memory types.</strong> You can't change how it works. Learning Machine is extensible via protocol. Build your own stores for whatever your domain needs.</p>
<p><strong>Claude is a closed system.</strong> Its memory lives inside Claude. Learning Machine is open source, fully customizable, and yours to extend.</p>
<p>I studied what makes Claude's memory feel good. Then built something you can actually use and extend.</p>
<h2>What This Unlocks</h2>
<p>Here's what's possible when agents learn across users, sessions, and time:</p>
<ul>
<li>A <strong>support agent</strong> where ticket #1,000 gets resolved better and faster — because it learned from tickets #1-999.</li>
<li>A <strong>customer success agent</strong> that remembers every account's stack, contracts, and conversations — across your entire team.</li>
<li>A <strong>healthcare agent</strong> that knows your full history — not just what's in today's chart, but every conversation (with different doctors), symptom, and concern you've ever mentioned.</li>
<li>A <strong>financial advisor</strong> that remembers your risk tolerance, goals, and every "what if" scenario you've ever explored — across years of conversations.</li>
<li>An <strong>agent that rewrites itself</strong> — analyzing its failures and proposing: "I should stop doing X."</li>
</ul>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>That last one is the endgame. Agents that learn from their own mistakes and rewrite their own instructions. Human approves. Agent evolves. Continuous improvement.</p></div></div></div></blockquote>
<hr>
<h2>Current Status</h2>
<p>Learning Machine is part of <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno">Agno</a> and I'm in the final stages of testing Phase 1. Here's where things stand:</p>
<table><thead><tr><th>Phase</th><th>What's Included</th><th>Status</th></tr></thead><tbody><tr><td><strong>Phase 1</strong></td><td>User Profile, Session Context, Entity Memory, Learned Knowledge</td><td>Built, testing now</td></tr><tr><td><strong>Phase 2</strong></td><td>Decision Logs, Behavioral Feedback</td><td>Planned</td></tr><tr><td><strong>Phase 3</strong></td><td>Self-Improvement</td><td>Planned</td></tr></tbody></table>
<p>If you're eager to dig in, here's the PR: <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno/pull/5897">learning-machine-v0</a></p>
<p>Want to get involved? DM me if you're interested in learning more or helping out.</p>
<hr>
<p>Memory was never the goal. Learning was.</p>
<p>If you enjoyed reading this, <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno">checkout Agno on GitHub</a>.</p>
<p>Questions or feedback? Reach out on <a target="_blank" rel="noopener noreferrer" class="" href="https://x.com/ashpreetbedi">X</a>.</p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Memory: How Agents Learn]]></title>
            <link>https://ashpreetbedi.com/memory</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/memory</guid>
            <pubDate>Mon, 22 Dec 2025 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>It's almost 2026. Agents can follow complex instructions, use dozens of tools, and work autonomously for hours. But ask them the same question twice and they start from scratch. They don't remember what worked, what failed, or what they figured out along the way.</p>
<p><strong>What makes ChatGPT and Claude great personal assistants? Memory.</strong></p>
<p>Here's the dirty secret: when building agents with the API, we've made them capable, but we haven't yet figured out how to make them learn.</p>
<h2>Table of Contents</h2>
<ol>
<li>What is memory</li>
<li>How memory enables learning</li>
<li>Three patterns (with code)</li>
<li>Video demo</li>
<li>What makes a good learning</li>
<li>Get started</li>
</ol>
<blockquote>
<p>Wanna jump straight to the code? <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/getting-started">Here you go</a>. Cookbooks 2, 4 and 7 are what you're looking for.</p>
</blockquote>
<h2>1. What is memory?</h2>
<p>"Memory" gets thrown around loosely. Chat history? Context window? Vector database? Let's be precise.</p>
<p>There are three types of memory that matter for agents:</p>
<h3>Session Memory</h3>
<p>The conversation context. What was said five messages ago. This is a solved problem: store messages in a database, retrieve them before every response, add them to the context.</p>
<p>Session memory is useful but limited. It disappears when the conversation ends. It's not really memory, it's just context.</p>
<h3>User Memory</h3>
<p>Facts about a <strong>specific user</strong> that persist across sessions. Preferences, goals, constraints.</p>
<p>When a user says "I'm interested in AI stocks and have moderate risk tolerance", that's worth remembering, not just for this conversation, but for every future conversation with that user.</p>
<p>This is powerful, but it's still not learning. User memory is about <strong>recall</strong>, not <strong>improvement</strong>.</p>
<h3>Learned Memory</h3>
<p>This is where knowledge gets built. As agents interact with the world, they discover insights that apply <em>generally</em>, not just to one user, but to anyone asking similar questions.</p>
<p>When your finance agent discovers that "when comparing ETFs, check both expense ratio AND tracking error", this insight is worth saving, not just because one user asked, but because it makes the agent better at ETF comparisons for everyone.</p>
<p>Here's the beauty: <strong>knowledge compounds</strong>. The more the agent learns, the better it gets. And unlike weight updates, this knowledge is tangible: you can inspect it, edit it, delete it. No retraining required.</p>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p><strong>If you're building agents without learned memory, you're leaving performance on the table.</strong></p></div></div></div></blockquote>
<h2>2. How memory enables learning</h2>
<p>Here's the core insight: <strong>learning is remembering what worked</strong>.</p>
<p>Without memory, agents are stateless. Every session is day one:</p>
<table><thead><tr><th>Without Memory</th><th>With Memory</th></tr></thead><tbody><tr><td>Re-discovers the same patterns</td><td>Searches prior learnings before acting</td></tr><tr><td>Repeats the same mistakes</td><td>Applies insights from past sessions</td></tr><tr><td>Re-asks the same questions</td><td>Builds domain knowledge over time</td></tr><tr><td>Can't build on prior success</td><td>Gets better the more you use it</td></tr></tbody></table>
<p>The best part: <strong>the model doesn't need to get better for the system to improve</strong>. Learning happens in retrieval, not in weights. And as models improve, your system improves too — for free.</p>
<p>I call this <strong>GPU Poor Continuous Learning</strong>: continuous improvement without fine-tuning, retraining, or any of the infrastructure traditionally required for model updates. Just a knowledge base that grows smarter over time.</p>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>The model doesn't get smarter. The system gets smarter.</p></div></div></div></blockquote>
<h2>3. Three patterns for agent memory</h2>
<p>Let me show you how to implement the three patterns, with a bonus at the end.</p>
<h3>Pattern 1: Session Memory</h3>
<p>Store messages in a database, retrieve them before every response, add them to the context. Agno gives you this out of the box — just give your agent a database.</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">from</span> agno<span class="token punctuation">.</span>agent <span class="token keyword">import</span> Agent
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>db<span class="token punctuation">.</span>sqlite <span class="token keyword">import</span> SqliteDb
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>models<span class="token punctuation">.</span>google <span class="token keyword">import</span> Gemini
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>tools<span class="token punctuation">.</span>yfinance <span class="token keyword">import</span> YFinanceTools

agent_db <span class="token operator">=</span> SqliteDb<span class="token punctuation">(</span>db_file<span class="token operator">=</span><span class="token string">"tmp/agents.db"</span><span class="token punctuation">)</span>

agent <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    model<span class="token operator">=</span>Gemini<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"gemini-3-flash-preview"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    tools<span class="token operator">=</span><span class="token punctuation">[</span>YFinanceTools<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
    db<span class="token operator">=</span>agent_db<span class="token punctuation">,</span>
    add_history_to_context<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
    num_history_runs<span class="token operator">=</span><span class="token number">5</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>

<span class="token keyword">if</span> __name__ <span class="token operator">==</span> <span class="token string">"__main__"</span><span class="token punctuation">:</span>
    session_id <span class="token operator">=</span> <span class="token string">"finance-session"</span>

    <span class="token comment"># Turn 1: Analyze a stock</span>
    agent<span class="token punctuation">.</span>print_response<span class="token punctuation">(</span><span class="token string">"Quick investment brief on NVIDIA"</span><span class="token punctuation">,</span> session_id<span class="token operator">=</span>session_id<span class="token punctuation">)</span>

    <span class="token comment"># Turn 2: Agent remembers NVDA from turn 1</span>
    agent<span class="token punctuation">.</span>print_response<span class="token punctuation">(</span><span class="token string">"Compare that to Tesla"</span><span class="token punctuation">,</span> session_id<span class="token operator">=</span>session_id<span class="token punctuation">)</span>

    <span class="token comment"># Turn 3: Recommendation based on full conversation</span>
    agent<span class="token punctuation">.</span>print_response<span class="token punctuation">(</span><span class="token string">"Which looks like the better investment?"</span><span class="token punctuation">,</span> session_id<span class="token operator">=</span>session_id<span class="token punctuation">)</span>
</code></pre>
<p>Use a consistent <code>session_id</code> to persist conversation across runs.</p>
<h3>Pattern 2: User Memory</h3>
<p>Remember facts about the user across sessions. The <code>MemoryManager</code> extracts preferences automatically and stores them in the database.</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">from</span> agno<span class="token punctuation">.</span>agent <span class="token keyword">import</span> Agent
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>memory <span class="token keyword">import</span> MemoryManager
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>models<span class="token punctuation">.</span>google <span class="token keyword">import</span> Gemini
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>db<span class="token punctuation">.</span>sqlite <span class="token keyword">import</span> SqliteDb

agent_db <span class="token operator">=</span> SqliteDb<span class="token punctuation">(</span>db_file<span class="token operator">=</span><span class="token string">"tmp/agents.db"</span><span class="token punctuation">)</span>

memory_manager <span class="token operator">=</span> MemoryManager<span class="token punctuation">(</span>
    model<span class="token operator">=</span>Gemini<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"gemini-3-flash-preview"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    db<span class="token operator">=</span>agent_db<span class="token punctuation">,</span>
<span class="token punctuation">)</span>

agent <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    model<span class="token operator">=</span>Gemini<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"gemini-3-flash-preview"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    memory_manager<span class="token operator">=</span>memory_manager<span class="token punctuation">,</span>
    enable_user_memory<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>

<span class="token comment"># First conversation — preferences extracted and stored</span>
agent<span class="token punctuation">.</span>print_response<span class="token punctuation">(</span>
    <span class="token string">"I'm interested in AI stocks. My risk tolerance is moderate."</span><span class="token punctuation">,</span>
    user_id<span class="token operator">=</span><span class="token string">"investor@example.com"</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>

<span class="token comment"># Later conversation — agent remembers</span>
agent<span class="token punctuation">.</span>print_response<span class="token punctuation">(</span>
    <span class="token string">"What stocks would you recommend for me?"</span><span class="token punctuation">,</span>
    user_id<span class="token operator">=</span><span class="token string">"investor@example.com"</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<p><code>enable_user_memory=True</code> runs the <code>MemoryManager</code> in parallel with every run. Use <code>enable_agentic_memory=True</code> to let the agent decide when to store memories via tool calls. More efficient, doesn't run on every response.</p>
<h3>Pattern 3: Learned Memory</h3>
<p>Now let's add learned memory: insights that apply beyond just one user. The key is a custom tool that saves learnings to a knowledge base:</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">import</span> json
<span class="token keyword">from</span> datetime <span class="token keyword">import</span> datetime<span class="token punctuation">,</span> timezone

<span class="token keyword">from</span> agno<span class="token punctuation">.</span>agent <span class="token keyword">import</span> Agent
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>db<span class="token punctuation">.</span>sqlite <span class="token keyword">import</span> SqliteDb
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>knowledge <span class="token keyword">import</span> Knowledge
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>models<span class="token punctuation">.</span>google <span class="token keyword">import</span> Gemini
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>vectordb<span class="token punctuation">.</span>chroma <span class="token keyword">import</span> ChromaDb

agent_db <span class="token operator">=</span> SqliteDb<span class="token punctuation">(</span>db_file<span class="token operator">=</span><span class="token string">"tmp/agents.db"</span><span class="token punctuation">)</span>

learnings_kb <span class="token operator">=</span> Knowledge<span class="token punctuation">(</span>
    name<span class="token operator">=</span><span class="token string">"Agent Learnings"</span><span class="token punctuation">,</span>
    vector_db<span class="token operator">=</span>ChromaDb<span class="token punctuation">(</span>
        name<span class="token operator">=</span><span class="token string">"learnings"</span><span class="token punctuation">,</span>
        persistent_client<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
        search_type<span class="token operator">=</span>SearchType<span class="token punctuation">.</span>hybrid<span class="token punctuation">,</span>
    <span class="token punctuation">)</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>

<span class="token keyword">def</span> <span class="token function">save_learning</span><span class="token punctuation">(</span>title<span class="token punctuation">:</span> <span class="token builtin">str</span><span class="token punctuation">,</span> learning<span class="token punctuation">:</span> <span class="token builtin">str</span><span class="token punctuation">)</span> <span class="token operator">-</span><span class="token operator">&gt;</span> <span class="token builtin">str</span><span class="token punctuation">:</span>
    <span class="token triple-quoted-string string">"""
    Save a reusable insight to the knowledge base.

    Args:
        title: Short descriptive title
        learning: The insight — specific and actionable
    """</span>
    payload <span class="token operator">=</span> <span class="token punctuation">{</span>
        <span class="token string">"title"</span><span class="token punctuation">:</span> title<span class="token punctuation">.</span>strip<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
        <span class="token string">"learning"</span><span class="token punctuation">:</span> learning<span class="token punctuation">.</span>strip<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
        <span class="token string">"saved_at"</span><span class="token punctuation">:</span> datetime<span class="token punctuation">.</span>now<span class="token punctuation">(</span>timezone<span class="token punctuation">.</span>utc<span class="token punctuation">)</span><span class="token punctuation">.</span>isoformat<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    <span class="token punctuation">}</span>

    learnings_kb<span class="token punctuation">.</span>add_content<span class="token punctuation">(</span>
        name<span class="token operator">=</span>payload<span class="token punctuation">[</span><span class="token string">"title"</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
        text_content<span class="token operator">=</span>json<span class="token punctuation">.</span>dumps<span class="token punctuation">(</span>payload<span class="token punctuation">)</span><span class="token punctuation">,</span>
    <span class="token punctuation">)</span>

    <span class="token keyword">return</span> <span class="token string-interpolation"><span class="token string">f"Saved: '</span><span class="token interpolation"><span class="token punctuation">{</span>title<span class="token punctuation">}</span></span><span class="token string">'"</span></span>

agent <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    model<span class="token operator">=</span>Gemini<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"gemini-3-flash-preview"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    tools<span class="token operator">=</span><span class="token punctuation">[</span>save_learning<span class="token punctuation">]</span><span class="token punctuation">,</span>
    knowledge<span class="token operator">=</span>learnings_kb<span class="token punctuation">,</span>
    search_knowledge<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
    db<span class="token operator">=</span>agent_db<span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<p>The agent now has two capabilities:</p>
<ol>
<li><strong>Search first</strong> — Before answering, it searches for relevant prior learnings</li>
<li><strong>Save learnings</strong> — When it discovers something reusable, it saves it</li>
</ol>
<p>But how do you prevent the agent from saving garbage?</p>
<h3>Bonus: Human-in-the-Loop Gating</h3>
<p>The quality of your knowledge base determines the quality of learning. Garbage in, garbage out.</p>
<p>The solution: the agent proposes learnings, but only saves with explicit user approval.</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">from</span> agno<span class="token punctuation">.</span>tools <span class="token keyword">import</span> tool

<span class="token decorator annotation punctuation">@tool</span><span class="token punctuation">(</span>requires_confirmation<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span>
<span class="token keyword">def</span> <span class="token function">save_learning</span><span class="token punctuation">(</span>title<span class="token punctuation">:</span> <span class="token builtin">str</span><span class="token punctuation">,</span> learning<span class="token punctuation">:</span> <span class="token builtin">str</span><span class="token punctuation">)</span> <span class="token operator">-</span><span class="token operator">&gt;</span> <span class="token builtin">str</span><span class="token punctuation">:</span>
    <span class="token triple-quoted-string string">"""Save a reusable insight. Requires user confirmation."""</span>
    <span class="token comment"># ... same implementation</span>
</code></pre>
<p>Handle the confirmation flow:</p>
<pre class="language-python"><code class="language-python">run_response <span class="token operator">=</span> agent<span class="token punctuation">.</span>run<span class="token punctuation">(</span><span class="token string">"Analyze NVDA and save any insights"</span><span class="token punctuation">)</span>

<span class="token keyword">for</span> requirement <span class="token keyword">in</span> run_response<span class="token punctuation">.</span>active_requirements<span class="token punctuation">:</span>
    <span class="token keyword">if</span> requirement<span class="token punctuation">.</span>needs_confirmation<span class="token punctuation">:</span>
        <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"Tool: </span><span class="token interpolation"><span class="token punctuation">{</span>requirement<span class="token punctuation">.</span>tool_execution<span class="token punctuation">.</span>tool_name<span class="token punctuation">}</span></span><span class="token string">"</span></span><span class="token punctuation">)</span>
        <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f"Args: </span><span class="token interpolation"><span class="token punctuation">{</span>requirement<span class="token punctuation">.</span>tool_execution<span class="token punctuation">.</span>tool_args<span class="token punctuation">}</span></span><span class="token string">"</span></span><span class="token punctuation">)</span>

        <span class="token keyword">if</span> user_approves<span class="token punctuation">:</span>
            requirement<span class="token punctuation">.</span>confirm<span class="token punctuation">(</span><span class="token punctuation">)</span>
        <span class="token keyword">else</span><span class="token punctuation">:</span>
            requirement<span class="token punctuation">.</span>reject<span class="token punctuation">(</span><span class="token punctuation">)</span>

run_response <span class="token operator">=</span> agent<span class="token punctuation">.</span>continue_run<span class="token punctuation">(</span>
    run_id<span class="token operator">=</span>run_response<span class="token punctuation">.</span>run_id<span class="token punctuation">,</span>
    requirements<span class="token operator">=</span>run_response<span class="token punctuation">.</span>requirements<span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<p>The agent proposes, the human gates. High-signal knowledge only.</p>
<h2>5. Video demo</h2>
<p>Here's a video demo that starts by showcasing user memory, then learned memory with user confirmation.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/memory-how-agents-learn.mp4">Your browser does not support the video tag.</video>
<h2>5. What makes a good learning</h2>
<p>A learning is worth saving if it's:</p>
<ul>
<li><strong>Specific</strong>: "Tech P/E ratios typically range 20-35x" not "P/E varies"</li>
<li><strong>Actionable</strong>: Can be applied to future queries</li>
<li><strong>Generalizable</strong>: Useful beyond this one conversation</li>
</ul>
<p>Don't save: raw data, one-off facts, summaries, speculation.</p>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>Most queries should NOT produce a learning, and that's OK.</p></div></div></div></blockquote>
<h3>Where to store</h3>
<table><thead><tr><th>Memory Type</th><th>Key</th><th>Agno Component</th></tr></thead><tbody><tr><td>Session</td><td><code>session_id</code></td><td><code>SqliteDb</code>, <code>PostgresDb</code>, <code>MongoDB</code></td></tr><tr><td>User</td><td><code>user_id</code></td><td><code>MemoryManager</code> + Database</td></tr><tr><td>Learned</td><td><code>learning_id</code></td><td><code>Knowledge</code> + <code>ChromaDb</code>, <code>PgVector</code>, <code>Qdrant</code>, <code>Pinecone</code></td></tr></tbody></table>
<h3>Avoiding bloat</h3>
<p>The biggest mistake is storing too much. A bloated knowledge base hurts retrieval and makes the agent worse.</p>
<p>The upside: because learnings are stored explicitly (not in weights), they're auditable and reversible. Bad learning? Delete it. System immediately improves.</p>
<h2>6. Get started</h2>
<p>This blog comes with complete working code. Here are <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/getting-started">12 cookbooks</a> that take you from "what is an agent" to building agents with memory, knowledge, state, guardrails, and more. <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/getting-started">Link again for reference</a>.</p>
<table><thead><tr><th style="text-align:left">#</th><th style="text-align:left">Cookbook</th><th style="text-align:left">What You'll Learn</th></tr></thead><tbody><tr><td style="text-align:left">01</td><td style="text-align:left">Tools</td><td style="text-align:left">Give agents the ability to fetch real-time data</td></tr><tr><td style="text-align:left">02</td><td style="text-align:left">Storage</td><td style="text-align:left">Persist conversations across runs</td></tr><tr><td style="text-align:left">03</td><td style="text-align:left">Knowledge</td><td style="text-align:left">Load documents and search with hybrid retrieval</td></tr><tr><td style="text-align:left">04</td><td style="text-align:left">Custom Tools</td><td style="text-align:left">Write your own tools, add self-learning</td></tr><tr><td style="text-align:left">05</td><td style="text-align:left">Structured Output</td><td style="text-align:left">Return typed Pydantic objects</td></tr><tr><td style="text-align:left">06</td><td style="text-align:left">Typed I/O</td><td style="text-align:left">Full type safety on input and output</td></tr><tr><td style="text-align:left">07</td><td style="text-align:left">Memory</td><td style="text-align:left">Remember user preferences across sessions</td></tr><tr><td style="text-align:left">08</td><td style="text-align:left">State Management</td><td style="text-align:left">Track and persist structured state</td></tr><tr><td style="text-align:left">09</td><td style="text-align:left">Multi-Agent Teams</td><td style="text-align:left">Coordinate specialized agents</td></tr><tr><td style="text-align:left">10</td><td style="text-align:left">Workflows</td><td style="text-align:left">Sequential pipelines with predictable data flow</td></tr><tr><td style="text-align:left">11</td><td style="text-align:left">Guardrails</td><td style="text-align:left">Input validation, PII detection, prompt injection defense</td></tr><tr><td style="text-align:left">12</td><td style="text-align:left">Human in the Loop</td><td style="text-align:left">Require confirmation before sensitive actions</td></tr></tbody></table>
<p>Each builds on fundamentals, but you can jump to any one.</p>
<h3>Setup</h3>
<pre class="language-bash"><code class="language-bash"><span class="token function">git</span> clone https://github.com/agno-agi/agno.git
<span class="token builtin class-name">cd</span> agno

uv venv .getting-started --python <span class="token number">3.12</span>
<span class="token builtin class-name">source</span> .getting-started/bin/activate

uv pip <span class="token function">install</span> -r cookbook/00_getting_started/requirements.txt

<span class="token builtin class-name">export</span> <span class="token assign-left variable">GOOGLE_API_KEY</span><span class="token operator">=</span>your-google-api-key
</code></pre>
<h3>Run an example</h3>
<p>Each cookbook is self-contained:</p>
<pre class="language-bash"><code class="language-bash">python cookbook/00_getting_started/agent_with_tools.py
</code></pre>
<p>Want a visual interface? Agent OS gives you a web UI for chatting with agents, exploring sessions, and monitoring traces:</p>
<pre class="language-bash"><code class="language-bash">python cookbook/00_getting_started/run.py
</code></pre>
<p>Then visit <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a> and add <code>http://localhost:7777</code> as an endpoint.</p>
<h3>Swapping models</h3>
<p>These examples use Gemini 3 Flash by default — fast, reliable tool calling, cheap enough to experiment freely. But Agno is model-agnostic:</p>
<pre class="language-python"><code class="language-python"><span class="token comment"># Gemini (default)</span>
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>models<span class="token punctuation">.</span>google <span class="token keyword">import</span> Gemini
model <span class="token operator">=</span> Gemini<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"gemini-3-flash-preview"</span><span class="token punctuation">)</span>

<span class="token comment"># OpenAI</span>
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>models<span class="token punctuation">.</span>openai <span class="token keyword">import</span> OpenAIChat
model <span class="token operator">=</span> OpenAIChat<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"gpt-5.2"</span><span class="token punctuation">)</span>

<span class="token comment"># Anthropic</span>
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>models<span class="token punctuation">.</span>anthropic <span class="token keyword">import</span> Claude
model <span class="token operator">=</span> Claude<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"claude-sonnet-4-5"</span><span class="token punctuation">)</span>
</code></pre>
<p>One line change. Everything else stays the same.</p>
<hr>
<p>If you enjoyed reading this, <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agno">star Agno on GitHub</a>. It helps more than you'd think. Questions or feedback? Reach out on <a target="_blank" rel="noopener noreferrer" class="" href="https://x.com/ashpreetbedi">X</a>.</p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[GPU Poor Continuous Learning with Gemini 3]]></title>
            <link>https://ashpreetbedi.com/gpu-poor-continuous-learning</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/gpu-poor-continuous-learning</guid>
            <pubDate>Thu, 18 Dec 2025 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>Here's a pattern I've been using to make my agents better without fine-tuning or retraining. We'll use a simple system-level learning loop that's surprisingly effective.</p>
<h2>Table of Contents</h2>
<ol>
<li>The problem with disconnected sessions</li>
<li>What is "gpu-poor continuous learning"</li>
<li>Why Gemini 3 Flash</li>
<li>The learning loop</li>
<li>Demo</li>
<li>What we store (and what we don't)</li>
<li>How to run your own Self-Learning Agent</li>
<li>Why this pattern works</li>
</ol>
<h2>1. The problem with disconnected sessions</h2>
<p>Most agents run in independent sessions, disconnected from each other.</p>
<p>You ask a question. You get an answer. Tomorrow you ask a similar question and the agent starts from scratch. It doesn't remember what worked, what failed, or what it figured out along the way.</p>
<p>This is fine for simple tasks. But for anything complex—research, analysis, decision support—it means:</p>
<ul>
<li>Repeating the same reasoning patterns</li>
<li>Re-discovering the same gotchas</li>
<li>Never building on past success</li>
</ul>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>If your agent can't learn from its own experience, you're leaving performance on the table.</p></div></div></div></blockquote>
<h2>2. What is GPU Poor Continuous Learning</h2>
<p>Let me be precise about terminology, because "continuous learning" has a specific meaning in ML.</p>
<p><strong>Traditional continuous learning:</strong></p>
<ul>
<li>Model weights update over time</li>
<li>Requires compute (GPUs, TPUs)</li>
<li>Risk of catastrophic forgetting</li>
<li>Learning happens in parameters</li>
</ul>
<p><strong>What I'm doing (GPU Poor Continuous Learning):</strong></p>
<ul>
<li>Model stays completely frozen</li>
<li>Zero training compute</li>
<li>Learning happens in retrieval</li>
<li>Knowledge is auditable and reversible</li>
</ul>
<p>The model doesn't get smarter. The <strong>system</strong> gets smarter.</p>
<p>I call it "GPU Poor" because you get continuous improvement without any of the infrastructure traditionally required for model updates. It's poor man's continuous learning—and it works surprisingly well.</p>
<h2>3. Why Gemini 3 Flash</h2>
<p>I built this with <a target="_blank" rel="noopener noreferrer" class="" href="https://blog.google/technology/developers/build-with-gemini-3-flash/">Gemini 3 Flash</a>, which launched today. Here's why:</p>
<table><thead><tr><th>Factor</th><th>Gemini 3 Flash</th></tr></thead><tbody><tr><td><strong>Cost</strong></td><td>$0.50/1M input, $3/1M output</td></tr><tr><td><strong>Speed</strong></td><td>3x faster than 2.5 Pro</td></tr><tr><td><strong>Context</strong></td><td>1M tokens input</td></tr><tr><td><strong>Agentic coding</strong></td><td>78% SWE-bench (beats Gemini 3 Pro)</td></tr><tr><td><strong>Context caching</strong></td><td>90% cost reduction for repeated tokens</td></tr></tbody></table>
<p>For a self-learning agent, you want:</p>
<ol>
<li><strong>Low cost</strong> — You're making many calls per session</li>
<li><strong>Fast inference</strong> — Tight feedback loops matter</li>
<li><strong>Large context</strong> — Prior learnings need room alongside new data</li>
<li><strong>Strong tool use</strong> — The agent needs to reliably call save/retrieve functions</li>
</ol>
<p>Gemini 3 Flash hits all four. The 1M context window is especially useful—you can include substantial prior learnings without truncating.</p>
<h2>4. The learning loop</h2>
<p>Here's the core pattern:</p>
<pre class="language-text"><code class="language-text">                         Query
                           │
                           ▼
                   Search learnings
                           │
                           ▼
                       Research
                           │
                           ▼
                      Synthesize
                           │
                           ▼
                        Reflect
                           │
              ┌────── reusable? ──────┐
              │                       │
             Yes                      No
              │                       │
              ▼                       │
        Propose to user               │
              │                       │
       ┌── approved? ──┐              │
       │               │              │
      Yes              No             │
       │               │              │
       ▼               │              │
     Save              │              │
       │               │              │
       └───────────────┴──────────────┘
                       │
                       ▼
                    Answer
</code></pre>
<p>Key details:</p>
<ol>
<li>
<p><strong>Search first</strong> — The agent must explicitly search the knowledge base before doing anything else. This isn't automatic; it's enforced through instructions.</p>
</li>
<li>
<p><strong>Most queries won't produce a learning</strong> — This is expected. Learnings should be rare and high-signal, not routine.</p>
</li>
<li>
<p><strong>Human-in-the-loop gating</strong> — The agent proposes learnings, but only saves them with explicit approval. If the user declines, the agent moves on without re-proposing.</p>
</li>
</ol>
<h2>5. Demo</h2>
<p>Here's a demo of the agent in action.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/gpu-poor-learning-agent.mp4">Your browser does not support the video tag.</video>
<h2>6. What we store (and what we don't)</h2>
<p>The biggest mistake is storing too much.</p>
<p>A learning is worth saving if it is:</p>
<ul>
<li><strong>Specific</strong>: "When comparing ETFs, check expense ratio AND tracking error" not "Look at ETF metrics"</li>
<li><strong>Actionable</strong>: Can be directly applied in future similar queries</li>
<li><strong>Generalizable</strong>: Useful beyond this specific question</li>
</ul>
<p>Do not save: raw facts, one-off answers, summaries, speculation, or anything unlikely to recur.</p>
<p>Each learning is structured:</p>
<pre class="language-python"><code class="language-python"><span class="token punctuation">{</span>
    <span class="token string">"title"</span><span class="token punctuation">:</span> <span class="token string">"ETF comparison checklist"</span><span class="token punctuation">,</span>
    <span class="token string">"context"</span><span class="token punctuation">:</span> <span class="token string">"When comparing similar ETFs for investment decisions"</span><span class="token punctuation">,</span>
    <span class="token string">"learning"</span><span class="token punctuation">:</span> <span class="token string">"Always check both expense ratio AND tracking error. Low expense ratio with high tracking error can cost more than a slightly more expensive fund with tight tracking."</span><span class="token punctuation">,</span>
    <span class="token string">"confidence"</span><span class="token punctuation">:</span> <span class="token string">"high"</span><span class="token punctuation">,</span>
    <span class="token string">"type"</span><span class="token punctuation">:</span> <span class="token string">"heuristic"</span><span class="token punctuation">,</span>
    <span class="token string">"created_at"</span><span class="token punctuation">:</span> <span class="token string">"2025-12-17T10:30:00Z"</span>
<span class="token punctuation">}</span>
</code></pre>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>Most tasks will not produce a learning. That's expected.</p></div></div></div></blockquote>
<h2>7. How to run your own Self-Learning Agent</h2>
<p>I'm providing cookbooks for running your own self-learning agent, built using:</p>
<ul>
<li>FastAPI application for running the agent</li>
<li>Postgres database for storing sessions, memory, and knowledge</li>
</ul>
<p>Here's the <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/gemini-agents">link to the code</a>.</p>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>You can wrap this up in a container and deploy it to Railway. Here's a sample <a target="_blank" rel="noopener noreferrer" class="text-teal-400 underline" href="https://github.com/agno-agi/agentos-railway">repository</a> you can use.</p></div></div></div></blockquote>
<span class="text-2xl font-semibold flex justify-center"><p><strong>Steps to run your own Self-Learning Agent</strong></p></span>
<h3>1. Clone the repo</h3>
<pre class="language-bash"><code class="language-bash"><span class="token function">git</span> clone https://github.com/agno-agi/agno.git
<span class="token builtin class-name">cd</span> agno
</code></pre>
<h3>2. Create and activate a virtual environment</h3>
<pre class="language-bash"><code class="language-bash">uv venv .gemini-agents --python <span class="token number">3.12</span>
<span class="token builtin class-name">source</span> .gemini-agents/bin/activate
</code></pre>
<h3>3. Install dependencies</h3>
<pre class="language-bash"><code class="language-bash">uv pip <span class="token function">install</span> -r cookbook/02_examples/04_gemini/requirements.txt
</code></pre>
<h3>4. Set environment variables</h3>
<pre class="language-bash"><code class="language-bash"><span class="token comment"># Required for Gemini models</span>
<span class="token builtin class-name">export</span> <span class="token assign-left variable">GOOGLE_API_KEY</span><span class="token operator">=</span>your-google-api-key

<span class="token comment"># Required for agents using parallel search</span>
<span class="token builtin class-name">export</span> <span class="token assign-left variable">PARALLEL_API_KEY</span><span class="token operator">=</span>your-parallel-api-key
</code></pre>
<h3>5. Run Postgres with PgVector</h3>
<p>Postgres stores agent sessions, memory, knowledge, and state. Install <a target="_blank" rel="noopener noreferrer" class="" href="https://www.docker.com/products/docker-desktop">Docker Desktop</a> and run:</p>
<pre class="language-bash"><code class="language-bash">./cookbook/scripts/run_pgvector.sh
</code></pre>
<p>Or run directly:</p>
<pre class="language-bash"><code class="language-bash"><span class="token function">docker</span> run -d <span class="token punctuation">\</span>
  -e <span class="token assign-left variable">POSTGRES_DB</span><span class="token operator">=</span>ai <span class="token punctuation">\</span>
  -e <span class="token assign-left variable">POSTGRES_USER</span><span class="token operator">=</span>ai <span class="token punctuation">\</span>
  -e <span class="token assign-left variable">POSTGRES_PASSWORD</span><span class="token operator">=</span>ai <span class="token punctuation">\</span>
  -e <span class="token assign-left variable">PGDATA</span><span class="token operator">=</span>/var/lib/postgresql <span class="token punctuation">\</span>
  -v pgvolume:/var/lib/postgresql <span class="token punctuation">\</span>
  -p <span class="token number">5532</span>:5432 <span class="token punctuation">\</span>
  --name pgvector <span class="token punctuation">\</span>
  agnohq/pgvector:18
</code></pre>
<h3>6. Run the Agent OS</h3>
<p>Agno provides a web interface for interacting with agents. Start the server:</p>
<pre class="language-bash"><code class="language-bash">python cookbook/02_examples/04_gemini/run.py
</code></pre>
<p>Then visit <a href="https://os.agno.com/?utm_source=github&amp;utm_medium=cookbook&amp;utm_campaign=gemini&amp;utm_content=cookbook-gemini-flash&amp;utm_term=gemini-flash">os.agno.com</a> and add <code>http://localhost:7777</code> as an endpoint.</p>
<h2>8. Why this pattern works</h2>
<p>This approach works because it separates concerns that are usually conflated:</p>
<table><thead><tr><th>Concern</th><th>Traditional</th><th>GPU Poor</th></tr></thead><tbody><tr><td><strong>Reasoning</strong></td><td>Model</td><td>Model (unchanged)</td></tr><tr><td><strong>Learning</strong></td><td>Model weights</td><td>Knowledge base</td></tr><tr><td><strong>Memory</strong></td><td>Context window</td><td>Persistent storage</td></tr></tbody></table>
<p>Benefits:</p>
<ul>
<li><strong>Auditable</strong> — You can see exactly what the agent "learned"</li>
<li><strong>Reversible</strong> — Delete a bad learning, system improves</li>
<li><strong>Fast feedback</strong> — No training cycles, immediate improvement</li>
<li><strong>No forgetting</strong> — New learnings don't overwrite capabilities</li>
</ul>
<p>The pattern generalizes beyond research. Use it for:</p>
<ul>
<li>Market analysis</li>
<li>Competitive intelligence</li>
<li>Technical support</li>
<li>Decision logging</li>
<li>Policy tracking</li>
</ul>
<p>Anywhere beliefs evolve, <strong>learnings beat stateless answers</strong>.</p>
<hr>
<p>Thank you for reading! Feel free to reach out on <a target="_blank" rel="noopener noreferrer" class="" href="https://x.com/ashpreetbedi">X</a> if you have questions or feedback.</p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Self Learning Research Agent That Tracks Consensus Over Time]]></title>
            <link>https://ashpreetbedi.com/self-learning-researcher</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/self-learning-researcher</guid>
            <pubDate>Tue, 16 Dec 2025 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>In this post, we’ll build a <strong>self-learning research agent</strong> that does something more useful than one-off web searches. It captures the <em>current consensus</em>, compares it to past runs, explains what changed and why, and stores a clean snapshot so future runs get better.</p>
<p>No fine-tuning. No retraining. Just good system design.</p>
<h2>Table of Contents</h2>
<ol>
<li>Why research agents break down in practice</li>
<li>Research is about consensus, not answers</li>
<li>What is "self-learning"</li>
<li>Snapshot-based learning architecture</li>
<li>What we store in the knowledge base (and what we don’t)</li>
<li>End-to-end agent flow</li>
<li>Production Codebase (deployable anywhere)</li>
<li>Steps to run your own Self Learning Research Agent</li>
<li>Why this pattern works</li>
</ol>
<h2>1. Why research agents break down in practice</h2>
<p>Most research agents are <strong>stateless</strong>.</p>
<p>You ask a question today and get a well-written answer. You ask the same question tomorrow and get another well-written answer, but totally disconnected from the first one.</p>
<p>What's missing:</p>
<ul>
<li>No memory of prior conclusions</li>
<li>No notion of what changed</li>
<li>No way to tell if the answer is stabilizing or shifting</li>
</ul>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>Research without memory is just search with formatting.</p></div></div></div></blockquote>
<p>Humans don't work this way. We remember what we believed before and pay attention when new information contradicts it.</p>
<p>That's the missing layer.</p>
<h2>2. Research is about consensus, not answers</h2>
<p>A single answer is rarely the goal of research.</p>
<p>What we actually care about is:</p>
<ul>
<li>what most credible sources agree on</li>
<li>where there is disagreement</li>
<li>how confident we should be</li>
</ul>
<p>That's why our agent doesn't store prose. It stores <strong>structured consensus</strong>. Consensus is represented as a set of claims that are:</p>
<ul>
<li>short and explicit</li>
<li>backed by sources</li>
<li>labeled with confidence</li>
<li>stable enough to diff over time</li>
</ul>
<p>This structure is what makes comparison possible.</p>
<p>It also lays the foundation for reasoning about sources over time, including which sources tend to be reliable or volatile.</p>
<h2>3. What is "self-learning"</h2>
<p>Self-learning means the agent improves based on its own experience.</p>
<p>In this case, improvement comes from capturing <strong>snapshots of consensus over time</strong> and using those snapshots as context in future runs.</p>
<p>The agent does <strong>not</strong>:</p>
<ul>
<li>retrain models</li>
<li>update weights</li>
<li>fine-tune embeddings</li>
</ul>
<p>Instead, it learns by <strong>capturing experience as data</strong> and reusing it deliberately. This is what I refer to as <em>poor-man’s continuous learning</em>.</p>
<p>The model stays fixed. The system improves by accumulating validated snapshots of understanding.</p>
<h2>4. Snapshot-based learning architecture</h2>
<p>The system is built around a simple idea: <strong>append-only snapshots</strong>.</p>
<p>Each snapshot represents:</p>
<ul>
<li>the question that was asked</li>
<li>the internet's consensus at that moment</li>
<li>the claims that define that consensus</li>
<li>the sources used to support it</li>
<li>a short report summary for semantic retrieval</li>
</ul>
<p>Snapshots are never mutated. We only add new ones and compare.</p>
<p>Each stored snapshot includes:</p>
<ul>
<li><code>question</code></li>
<li><code>created_at</code></li>
<li><code>report_summary</code> (short, human-readable)</li>
<li><code>consensus_summary</code> (1–2 sentences)</li>
<li><code>claims</code> (structured and diffable)</li>
<li><code>sources</code></li>
<li>optional <code>notes</code></li>
</ul>
<p>This keeps the knowledge base compact, searchable, and stable over time.</p>
<h2>5. What we store in the knowledge base (and what we don’t)</h2>
<p>The biggest mistake we can make is storing too much.</p>
<p>We deliberately <strong>do not store</strong>:</p>
<ul>
<li>full markdown reports</li>
<li>raw scraped content</li>
<li>long explanations</li>
</ul>
<p>We <strong>do store</strong>:</p>
<ul>
<li>concise summaries</li>
<li>structured claims</li>
<li>deduplicated source lists</li>
</ul>
<p>Each claim looks like:</p>
<ul>
<li><code>claim_id</code> (stable slug)</li>
<li><code>claim</code> (short statement)</li>
<li><code>confidence</code> (Low | Medium | High)</li>
<li><code>source_urls</code></li>
</ul>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>If you can't diff it, you shouldn't store it.</p></div></div></div></blockquote>
<p>This keeps retrieval high-signal and comparisons reliable.</p>
<h2>6. End-to-end agent flow</h2>
<p>Here's what happens on every run:</p>
<ol>
<li>
<p><strong>Parallel research</strong>
The agent uses parallel search tools to gather information across multiple source types.</p>
</li>
<li>
<p><strong>Consensus extraction</strong>
Findings are synthesized into 4–10 structured claims with confidence and citations.</p>
</li>
<li>
<p><strong>Snapshot retrieval</strong>
The agent searches the knowledge base for the most recent snapshot of a similar question.</p>
</li>
<li>
<p><strong>Diff</strong>
Current claims are compared to the previous snapshot:</p>
<ul>
<li>new or strengthened claims</li>
<li>weakened or disputed claims</li>
<li>removed claims</li>
</ul>
<p>Each change includes a brief explanation and supporting sources.</p>
</li>
<li>
<p><strong>Human-in-the-loop save</strong>
The agent asks whether to save the new snapshot. Only explicit approval persists it.</p>
</li>
</ol>
<p>This keeps learning controlled, auditable, and intentional.</p>
<h2>7. Production Codebase (deployable anywhere)</h2>
<p>I'm providing a production codebase for running our self-learning research agent, built using:</p>
<ul>
<li>A FastAPI application for running our agents.</li>
<li>A Postgres database for storing sessions, memory and knowledge.</li>
</ul>
<p>Here's the link to the <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agentos-railway">repository</a> containing the production codebase.</p>
<p>Here's the structure of the repository:</p>
<pre class="language-bash"><code class="language-bash"><span class="token builtin class-name">.</span>
├── agents
│&nbsp;&nbsp; ├── self_learning_research_agent.py
│&nbsp;&nbsp; └── <span class="token punctuation">..</span>. <span class="token function">more</span> agents
├── app
│&nbsp;&nbsp; └── main.py
├── compose.yaml
├── db
├── Dockerfile
├── pyproject.toml
├── railway.json
├── README.md
├── teams
│&nbsp;&nbsp; └── finance_team.py
└── workflows
    └── research_workflow.py
</code></pre>
<h2>8. Steps to run your own Self Learning Research Agent</h2>
<h3>Clone the repo</h3>
<pre class="language-shell"><code class="language-shell"><span class="token function">git</span> clone https://github.com/agno-agi/agentos-railway.git
<span class="token builtin class-name">cd</span> agentos-railway
</code></pre>
<h3>Configure API keys</h3>
<p>We'll use OpenAI for the agent and Parallel Search for search tools. Please export the following environment variables:</p>
<pre class="language-shell"><code class="language-shell"><span class="token builtin class-name">export</span> <span class="token assign-left variable">OPENAI_API_KEY</span><span class="token operator">=</span><span class="token string">"YOUR_API_KEY_HERE"</span>
<span class="token builtin class-name">export</span> <span class="token assign-left variable">PARALLEL_API_KEY</span><span class="token operator">=</span><span class="token string">"YOUR_API_KEY_HERE"</span>
</code></pre>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-sky-500 dark:bg-sky-400"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>You can copy the <code>example.env</code> file and rename it to <code>.env</code> to get started.</p></div></div></div></blockquote>
<h3>Install Docker</h3>
<p>We'll use docker to run the application locally and deploy it to Railway. Please install <a target="_blank" rel="noopener noreferrer" class="" href="https://www.docker.com/products/docker-desktop">Docker Desktop</a> if needed.</p>
<h3>Run the application locally</h3>
<p>Run the application using docker compose:</p>
<pre class="language-shell"><code class="language-shell"><span class="token function">docker</span> compose up --build -d
</code></pre>
<p>This command builds the Docker image and starts the application:</p>
<ul>
<li>The <strong>FastAPI application</strong>, running on <a target="_blank" rel="noopener noreferrer" class="" href="http://localhost:8000">localhost:8000</a>.</li>
<li>The <strong>PostgreSQL database</strong> for storing agent sessions, knowledge, and memories, accessible on <code>localhost:5432</code>.</li>
</ul>
<p>Once started, you can:</p>
<ul>
<li>View the FastAPI application at <a target="_blank" rel="noopener noreferrer" class="" href="http://localhost:8000/docs">localhost:8000/docs</a>.</li>
</ul>
<h3>Connect the AgentOS UI to the FastAPI application</h3>
<ul>
<li>Open the <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com/">AgentOS UI</a></li>
<li>Login and add <code>http://localhost:8000</code> as a new AgentOS. You can call it <code>Local AgentOS</code> (or any name you prefer).</li>
</ul>
<h3>Demo</h3>
<p>Here's a demo of the Self Learning Research Agent in action.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/self-learning-research-agent.mp4">Your browser does not support the video tag.</video>
<h3>Stop the application</h3>
<p>When you're done, stop the application using:</p>
<pre class="language-shell"><code class="language-shell"><span class="token function">docker</span> compose down
</code></pre>
<h3>Deploy the application to Railway</h3>
<p>To deploy the application to Railway, run the following commands:</p>
<ol>
<li>Install Railway CLI:</li>
</ol>
<pre class="language-shell"><code class="language-shell">brew <span class="token function">install</span> railway
</code></pre>
<ol start="2">
<li>Login to Railway:</li>
</ol>
<pre class="language-shell"><code class="language-shell">railway login
</code></pre>
<ol start="3">
<li>Deploy the application:</li>
</ol>
<pre class="language-shell"><code class="language-shell">./scripts/railway_up.sh
</code></pre>
<p>This command will:</p>
<ul>
<li>Create a new Railway project.</li>
<li>Deploy a PgVector database service to your Railway project.</li>
<li>Build and deploy the docker image to your Railway project.</li>
<li>Set environment variables in your AgentOS service.</li>
<li>Create a new domain for your AgentOS service.</li>
</ul>
<h2>9. Why this pattern works</h2>
<p>This approach generalizes far beyond traditional research, you can use it for:</p>
<ul>
<li>market analysis</li>
<li>policy tracking</li>
<li>competitive intelligence</li>
<li>technical standards</li>
<li>internal decision logs</li>
</ul>
<p>Anywhere beliefs evolve, <strong>snapshots beat stateless answers</strong>. By separating:</p>
<ul>
<li>online reasoning</li>
<li>from offline learning</li>
<li>and storing only what matters</li>
</ul>
<p>we get agents that feel more trustworthy, more explainable, and more useful over time.</p>
<hr>
<p>Thank you for reading! I hope you found this useful. Feel free to reach out to me on <a target="_blank" rel="noopener noreferrer" class="" href="https://x.com/ashpreetbedi">X</a> if you have any questions or feedback</p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Self Improving Text2Sql Agent with Dynamic Context and Continuous Learning]]></title>
            <link>https://ashpreetbedi.com/sql-agent</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/sql-agent</guid>
            <pubDate>Mon, 15 Dec 2025 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>This post shows how to build a self-improving Text-to-SQL agent using dynamic context and "poor-man's continuous learning". We'll break the problem into two parts:</p>
<ul>
<li><strong>Text-to-SQL Agent (Online Path):</strong> answers questions by retrieving schema + query patterns from a knowledge base (dynamic context).</li>
<li><strong>Continuous Learning (Offline Path):</strong> learns from successful runs and adds new entries to the knowledge base.</li>
</ul>
<p>When the Agent finds a successful result, it stores it in its knowledge base for future use. This gives the text-to-sql agent a self-improving feedback loop, but keeps the online path stable.</p>
<h2>Table of Contents</h2>
<ol>
<li>Why Text-to-SQL fails in practice</li>
<li>What is "dynamic context"</li>
<li>What is "poor man's continuous learning" (and why it works)</li>
<li>Unified Agent Architecture</li>
<li>Knowledge Base Design (keep it structured)</li>
<li>Production Harness (deployable anywhere)</li>
<li>Steps to run your own Text-to-SQL Agent</li>
</ol>
<h2>1. Why Text-to-SQL fails in practice</h2>
<p>Most Test-to-SQL agents fail in practice because they start from scratch every time, describing tables, columns, finding join keys. Repeating every mistake, every time.</p>
<p>Now compare this with how senior analysts or data engineers operate: do they start from scratch every time? No, they use tribal knowledge and experience and dig through past queries to find the right one. Once they find a useful query, they capture it in their knowledge base for future reference. Our text-to-sql agent works the same way.</p>
<p>I've found that most Text-to-SQL failures are not "model is dumb", they're "model is missing context and tribal knowledge" issues. Let's break down the common mistakes:</p>
<ul>
<li>The model starts from scratch every time, describing tables, columns, finding join keys. Repeating every mistake, every time.</li>
<li>The model guesses column names, usage patterns, or doesn't know the right join keys.</li>
<li>The model misses domain definitions (active user, churn, ARR, etc.) or doesn't know the right business rules (eg: "status lives in orders.state, not orders.status").</li>
<li>The model is missing common gotchas (date in the wrong format, nulls in the wrong place, etc.).</li>
<li>The model re-invents queries that already exist in your organization's knowledge base.</li>
</ul>
<p><strong>The biggest improvement you can make to your text-to-sql agent is to provide it with the same tribal knowledge that human engineers have. This enables them to re-use queries that we know work and let the model search established usage patterns at runtime.</strong> Call it RAG, Agentic RAG, or Dynamic Context, it's the same thing: the model, at runtime, has access to the right context to generate the right SQL.</p>
<p>Our goal is straightforward:</p>
<ol>
<li>Give our agent the tools to retrieve the <em>right</em> context at runtime (schemas, joins, past queries, metric definitions, gotchas).</li>
<li>Generate SQL grounded in well established usage patterns (no guessing and no re-inventing the wheel).</li>
<li>Validate the SQL (query is parseable, schema checks, etc.).</li>
<li>Run the SQL and "analyze" the results. Don't just give me the data, give me the insights.</li>
<li>Capture learnings so the next run is better (new join path, corrected column mapping, query template, metric definition).</li>
<li>Repeat.</li>
</ol>
<h2>2. What is "dynamic context"</h2>
<p>Dynamic context is simply: <strong>the agent retrieves the relevant knowledge at query time, which enables it to generate SQL grounded in well established usage patterns</strong>. The context is dynamic because it changes based on the query, the data, and the user's intent.</p>
<p>Examples of what the agent can retrieve:</p>
<ul>
<li>Table schemas and relationships</li>
<li>Common join keys and relationships</li>
<li>Known queries for common use cases</li>
<li>Metric definitions and business rules</li>
<li>Known gotchas ("status lives in orders.state, not orders.status")</li>
</ul>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>If your KB contains a query for "weekly active users", your agent should retrieve it, not re-invent it.</p></div></div></div></blockquote>
<h2>3. What is "poor man's continuous learning" (and why it works)</h2>
<p>By "poor man's continuous learning", I mean:</p>
<ul>
<li>We do <strong>not</strong> update model weights.</li>
<li>We do <strong>update retrieval knowledge</strong> when we find a successful result.</li>
<li>The system improves by capturing experience as reusable artifacts.</li>
</ul>
<blockquote>
<p>Every good query becomes future context.
Every mistake becomes a rule.
Every clarification becomes shared knowledge.</p>
</blockquote>
<p>Poor man's continuous learning works because it provides a pragmatic learning loop: stable online behavior, controlled improvements. The best part is that you can always explore the knowledge base manually and fix issues or mistakes, imaging updating model weights by hand.</p>
<h2>4. Unified Agent Architecture</h2>
<p>The systems is broken into 2 parts:</p>
<ol>
<li><strong>Text-to-SQL Agent:</strong> answers questions by retrieving schema + query patterns from a knowledge base (dynamic context).</li>
<li><strong>Continuous Learning:</strong> learns from successful runs and adds new entries to the knowledge base.</li>
</ol>
<h3>Query Flow</h3>
<ol>
<li><strong>User asks a question</strong></li>
<li>Agent <strong>retrieves context</strong> from KB (hybrid search) using:<!-- -->
<ul>
<li>question text</li>
<li>detected entities (tables, columns, metrics)</li>
<li>optional database introspection results</li>
</ul>
</li>
<li>This knowledge <strong>augments the input</strong> with dynamic context:<!-- -->
<ul>
<li>retrieved knowledge snippets</li>
<li>rules and constraints (read-only, limit, etc.)</li>
</ul>
</li>
<li>This knowledge <strong>guides the generation of SQL</strong>.</li>
<li>Agent <strong>executes the query</strong> in a safe environment.</li>
<li>Agent analyzes the results and <strong>returns the answer</strong>.</li>
<li>If the result is successful, the agent asks the user if they want to save the query to the knowledge base.</li>
<li>If the user agrees, the agent stores the query in the knowledge base.</li>
<li>If the user disagrees, the agent revists the query, update it and try again.</li>
</ol>
<p>There are 2 improvments you can make to the learning path:</p>
<ol>
<li>Run the continuous learning separately after every run of the text-to-sql agent. This way, the continuous learning is always up to date with the latest queries and results.</li>
<li>Add a regression harness to the continuous learning. This way, you can test the knowledge base before and after updates to ensure it's still working.</li>
</ol>
<h2>5. Knowledge Base Design (keep it structured)</h2>
<p>We want our knowledge base to store 3 kinds of information:</p>
<ol>
<li>Table information: this includes the table schema, column metadata, query rules , common gotchas (eg: date column contains a rule: "Use the <code>TO_DATE</code> function when filtering by date").</li>
<li>Sample queries: this include common query patterns and best practices. Along with how to retrieve common metrics and KPIs. There's no need to re-invent the wheel.</li>
<li>Business semantics and relationships: the layer that maps how your organization talks about data to how the database is structured.</li>
</ol>
<p>The sample codebase I'm providing contains the following files (table information and common queries):</p>
<pre class="language-shell"><code class="language-shell">agents/sql/knowledge/
├── constructors_championship.json
├── drivers_championship.json
├── fastest_laps.json
├── race_results.json
├── race_wins.json
└── common_queries.sql
</code></pre>
<h2>6. Production Harness</h2>
<p>I'm providing a production-ready harness for our system, built using:</p>
<ul>
<li>A FastAPI application for running our agents.</li>
<li>A Postgres database for storing sessions, memory and knowledge.</li>
</ul>
<p>Here's the link to the <a target="_blank" rel="noopener noreferrer" class="" href="https://github.com/agno-agi/agentos-railway">repository</a> containing the production codebase.</p>
<p>Here's the structure of the repository:</p>
<pre class="language-bash"><code class="language-bash"><span class="token builtin class-name">.</span>
├── agents
│&nbsp;&nbsp; ├── __init__.py
│&nbsp;&nbsp; ├── sql
│&nbsp;&nbsp; │&nbsp;&nbsp; ├── __init__.py
│&nbsp;&nbsp; │&nbsp;&nbsp; ├── knowledge
│&nbsp;&nbsp; │&nbsp;&nbsp; ├── load_f1_data.py
│&nbsp;&nbsp; │&nbsp;&nbsp; ├── load_sql_knowledge.py
│&nbsp;&nbsp; │&nbsp;&nbsp; ├── sql_agent.py
│&nbsp;&nbsp; │&nbsp;&nbsp; └── test_questions.txt
│&nbsp;&nbsp; └── <span class="token punctuation">..</span>. <span class="token function">more</span> agents
├── app
│&nbsp;&nbsp; ├── __init__.py
│&nbsp;&nbsp; └── main.py
├── compose.yaml
├── db
│&nbsp;&nbsp; └── <span class="token punctuation">..</span>. database configuration
├── Dockerfile
├── pyproject.toml
├── railway.json
├── README.md
├── requirements.txt
├── scripts
│&nbsp;&nbsp; ├── dev_setup.sh
│&nbsp;&nbsp; ├── entrypoint.sh
│&nbsp;&nbsp; ├── railway_up.sh
│&nbsp;&nbsp; ├── format.sh
│&nbsp;&nbsp; └── validate.sh
├── teams
│&nbsp;&nbsp; └── finance_team.py
└── workflows
    └── research_workflow.py
</code></pre>
<h2>7. Steps to run your own Text-to-SQL Agent</h2>
<h3>Clone the repo</h3>
<pre class="language-shell"><code class="language-shell"><span class="token function">git</span> clone https://github.com/agno-agi/agentos-railway.git
<span class="token builtin class-name">cd</span> agentos-railway
</code></pre>
<h3>Configure API keys</h3>
<p>We'll use OpenAI for the text-to-sql agent, (we also use Anthropic and Parallel Search for other agents in the service). Please export the following environment variables:</p>
<pre class="language-shell"><code class="language-shell"><span class="token comment"># Required</span>
<span class="token builtin class-name">export</span> <span class="token assign-left variable">OPENAI_API_KEY</span><span class="token operator">=</span><span class="token string">"YOUR_API_KEY_HERE"</span>

<span class="token comment"># Optional</span>
<span class="token builtin class-name">export</span> <span class="token assign-left variable">ANTHROPIC_API_KEY</span><span class="token operator">=</span><span class="token string">"YOUR_API_KEY_HERE"</span>
<span class="token builtin class-name">export</span> <span class="token assign-left variable">PARALLEL_API_KEY</span><span class="token operator">=</span><span class="token string">"YOUR_API_KEY_HERE"</span>
</code></pre>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-sky-500 dark:bg-sky-400"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p>You can copy the <code>example.env</code> file and rename it to <code>.env</code> to get started.</p></div></div></div></blockquote>
<h3>Install Docker</h3>
<p>We'll use docker to run the application locally and deploy it to Railway. Please install <a target="_blank" rel="noopener noreferrer" class="" href="https://www.docker.com/products/docker-desktop">Docker Desktop</a> if needed.</p>
<h3>Run the application locally</h3>
<p>Run the application using docker compose:</p>
<pre class="language-shell"><code class="language-shell"><span class="token function">docker</span> compose up --build -d
</code></pre>
<p>This command builds the Docker image and starts the application:</p>
<ul>
<li>The <strong>FastAPI application</strong>, running on <a target="_blank" rel="noopener noreferrer" class="" href="http://localhost:8000">localhost:8000</a>.</li>
<li>The <strong>PostgreSQL database</strong> for storing agent sessions, knowledge, and memories, accessible on <code>localhost:5432</code>.</li>
</ul>
<p>Once started, you can:</p>
<ul>
<li>View the FastAPI application at <a target="_blank" rel="noopener noreferrer" class="" href="http://localhost:8000/docs">localhost:8000/docs</a>.</li>
</ul>
<h3>Load data for the SQL Agent</h3>
<p>To load the data for the SQL Agent, run:</p>
<pre class="language-shell"><code class="language-shell"><span class="token function">docker</span> <span class="token builtin class-name">exec</span> -it agentos-railway-agent-os-1 python -m agents.sql.load_f1_data
</code></pre>
<p>To populate the knowledge base, run:</p>
<pre class="language-shell"><code class="language-shell"><span class="token function">docker</span> <span class="token builtin class-name">exec</span> -it agentos-railway-agent-os-1 python -m agents.sql.load_sql_knowledge
</code></pre>
<h3>Connect the AgentOS UI to the FastAPI application</h3>
<ul>
<li>Open the <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com/">AgentOS UI</a></li>
<li>Login and add <code>http://localhost:8000</code> as a new AgentOS. You can call it <code>Local AgentOS</code> (or any name you prefer).</li>
</ul>
<h3>Demo</h3>
<p>Here's a demo of the Text-to-SQL Agent in action. Notice how I add a query to the knowledge base and the agent uses it to generate the SQL when i ask the same question again.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/sql-agent-demo.mp4">Your browser does not support the video tag.</video>
<h3>Stop the application</h3>
<p>When you're done, stop the application using:</p>
<pre class="language-shell"><code class="language-shell"><span class="token function">docker</span> compose down
</code></pre>
<h3>Deploy the application to Railway</h3>
<p>To deploy the application to Railway, run the following commands:</p>
<ol>
<li>Install Railway CLI:</li>
</ol>
<pre class="language-shell"><code class="language-shell">brew <span class="token function">install</span> railway
</code></pre>
<ol start="2">
<li>Login to Railway:</li>
</ol>
<pre class="language-shell"><code class="language-shell">railway login
</code></pre>
<ol start="3">
<li>Deploy the application:</li>
</ol>
<pre class="language-shell"><code class="language-shell">./scripts/railway_up.sh
</code></pre>
<p>This command will:</p>
<ul>
<li>Create a new Railway project.</li>
<li>Deploy a PgVector database service to your Railway project.</li>
<li>Build and deploy the docker image to your Railway project.</li>
<li>Set environment variables in your AgentOS service.</li>
<li>Create a new domain for your AgentOS service.</li>
</ul>
<hr>
<p>Thank you for reading! I hope you found this useful. Feel free to reach out to me on <a target="_blank" rel="noopener noreferrer" class="" href="https://x.com/ashpreetbedi">X</a> if you have any questions or feedback</p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Agent Security 101]]></title>
            <link>https://ashpreetbedi.com/agent-security</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/agent-security</guid>
            <pubDate>Tue, 28 Oct 2025 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>PSA: If you're serious about Agent Security, stop sending your transactional data to telemetry services. <strong>Here's how to do it right:</strong></p>
<ol>
<li>Give your agents a database.</li>
<li>Store all transactions in that database.</li>
<li>Keep your data <strong>inside</strong> your system.</li>
<li>Avoid duplication across multiple systems.</li>
<li>Stop paying for egress and retention.</li>
</ol>
<h2>Transactional data ≠ Telemetry</h2>
<p>Somewhere along the way, people started treating conversational traces as logs (they're not), and started pushing <em>everything</em> (agent inputs, outputs, reasoning, memory) to telemetry vendors. It's not just bad security hygiene, it's inefficient, redundant, and expensive.</p>
<p><strong>Transactional data</strong> is what's happening in your system: inputs, outputs, tool calls, memory updates, and internal reasoning. It's the source of truth for your system and should never leave it.</p>
<p><strong>Telemetry data</strong> is system metrics and operational metadata (latency, token usage, error rates, throughput, uptime). That's the stuff you aggregate and throw in cold storage after 180 days.</p>
<p>In an agentic system, conversational traces are <strong>transactional data</strong>. They belong inside your infrastructure:</p>
<ol>
<li>They often contain <strong>PII, proprietary logic, and sensitive data</strong> and should never be sent externally.</li>
<li>They need to be <strong>re-used by your application</strong> (by future runs, for debugging and optimization), so you'll store them internally anyway.</li>
</ol>
<hr>
<span class="text-2xl font-semibold"><p>So how do you do it properly?</p></span>
<h2>1. Give your agents a database.</h2>
<p>Agents need structured storage. Sessions, runs, memory, knowledge — all of it should persist in your database. Just like any other application.</p>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed">I personally use <strong>Postgres + PgVector</strong> in production, and <strong>Sqlite</strong> for demos.</div></div></div></blockquote>
<p>Here's a minimal example:</p>
<pre class="language-python"><code class="language-python"><span class="token comment"># /// script</span>
<span class="token comment"># dependencies = [</span>
<span class="token comment">#   "agno",</span>
<span class="token comment">#   "anthropic",</span>
<span class="token comment">#   "yfinance",</span>
<span class="token comment">#   "sqlalchemy",</span>
<span class="token comment"># ]</span>
<span class="token comment"># ///</span>

<span class="token keyword">from</span> agno<span class="token punctuation">.</span>agent <span class="token keyword">import</span> Agent
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>db<span class="token punctuation">.</span>sqlite <span class="token keyword">import</span> SqliteDb
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>models<span class="token punctuation">.</span>anthropic <span class="token keyword">import</span> Claude
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>tools<span class="token punctuation">.</span>yfinance <span class="token keyword">import</span> YFinanceTools

<span class="token comment"># ************* Create Agent *************</span>
agno_agent <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    name<span class="token operator">=</span><span class="token string">"Finance Agent"</span><span class="token punctuation">,</span>
    model<span class="token operator">=</span>Claude<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"claude-sonnet-4-5"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    db<span class="token operator">=</span>SqliteDb<span class="token punctuation">(</span>db_file<span class="token operator">=</span><span class="token string">"tmp/finance_agent.db"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    tools<span class="token operator">=</span><span class="token punctuation">[</span>YFinanceTools<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
    instructions<span class="token operator">=</span><span class="token string">"Use tables to display data."</span><span class="token punctuation">,</span>
    add_history_to_context<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
    add_datetime_to_context<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
    num_history_runs<span class="token operator">=</span><span class="token number">3</span><span class="token punctuation">,</span>
    markdown<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>

<span class="token comment"># ************* Run Agent *************</span>
agno_agent<span class="token punctuation">.</span>print_response<span class="token punctuation">(</span><span class="token builtin">input</span><span class="token operator">=</span><span class="token string">"What is the stock price of Apple?"</span><span class="token punctuation">,</span> stream<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span> stream_intermediate_steps<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span>
<span class="token comment"># Run #2 that continues the conversation</span>
agno_agent<span class="token punctuation">.</span>print_response<span class="token punctuation">(</span><span class="token builtin">input</span><span class="token operator">=</span><span class="token string">"Can you write a report on it? Just give me the report, no other text."</span><span class="token punctuation">,</span> stream<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span> stream_intermediate_steps<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span>
</code></pre>
<p>Save this to a file and run it with <code>uv run finance_agent.py</code>. You can see conversation history work flawlessly because it's stored in a local sqlite database.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/finance-agent.mp4">Your browser does not support the video tag.</video>
<h2>2. Store all transactions in that database.</h2>
<p>When you run your agents, store all transactions in that database. Including: inputs, outputs, context, messages, tool calls, memory updates, knowledge updates, culture updates. Basically everything that happens in your agentic system should be stored in your database.</p>
<p>For enterprise workloads, this isn't just best practice, it's a requirement. You need to persist traces for <strong>compliance, auditing, debugging, and continuity</strong>.</p>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed">Agno does this automatically for you.</div></div></div></blockquote>
<p>External telemetry tools were never designed for this. They're built for metrics and logs, not for sensitive, replayable transactional data. You can make the case for running the data plane inside your VPC, you still have to deal with duplicated data (and pay enterprise data license costs).</p>
<h2>3. Keep data within your system (and avoid duplication).</h2>
<p>Every time you send LLM traces to an external service, you create redundant copies of sensitive data. This violates least-privilege principles and adds unnecessary complexity, you'll have to create "linking-ids" to connect your application usage to actual traces (solving problems that shouldn't exist in the first place).</p>
<p>Anyone who's built data pipelines knows: joining transactional data from app DBs with telemetry metrics is a nightmare. Skip the headache. Keep everything in one system.</p>
<h2>4. Want a UI? No problem.</h2>
<p>Once your data lives inside your infrastructure, it's easy to visualize. You could spin up a quick Streamlit dashboard or just use the <a target="_blank" rel="noopener noreferrer" class="" href="http://os.agno.com">AgentOS UI</a>, which gives you a ready-to-use view of all your agent sessions, runs, memory, knowledge, etc.</p>
<p>Here's how:</p>
<pre class="language-python"><code class="language-python"><span class="token comment"># /// script</span>
<span class="token comment"># dependencies = [</span>
<span class="token comment">#   "agno",</span>
<span class="token comment">#   "anthropic",</span>
<span class="token comment">#   "yfinance",</span>
<span class="token comment">#   "sqlalchemy",</span>
<span class="token comment">#   "fastapi[standard]",</span>
<span class="token comment">#   "mcp",</span>
<span class="token comment"># ]</span>
<span class="token comment"># ///</span>

<span class="token keyword">from</span> agno<span class="token punctuation">.</span>agent <span class="token keyword">import</span> Agent
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>db<span class="token punctuation">.</span>sqlite <span class="token keyword">import</span> SqliteDb
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>models<span class="token punctuation">.</span>anthropic <span class="token keyword">import</span> Claude
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>os <span class="token keyword">import</span> AgentOS
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>tools<span class="token punctuation">.</span>mcp <span class="token keyword">import</span> MCPTools

<span class="token comment"># ************* Create Agent *************</span>
agno_agent <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    name<span class="token operator">=</span><span class="token string">"Agno Agent"</span><span class="token punctuation">,</span>
    model<span class="token operator">=</span>Claude<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"claude-sonnet-4-5"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    db<span class="token operator">=</span>SqliteDb<span class="token punctuation">(</span>db_file<span class="token operator">=</span><span class="token string">"tmp/agno.db"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    tools<span class="token operator">=</span><span class="token punctuation">[</span>MCPTools<span class="token punctuation">(</span>transport<span class="token operator">=</span><span class="token string">"streamable-http"</span><span class="token punctuation">,</span> url<span class="token operator">=</span><span class="token string">"https://docs.agno.com/mcp"</span><span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
    add_history_to_context<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
    add_datetime_to_context<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
    num_history_runs<span class="token operator">=</span><span class="token number">3</span><span class="token punctuation">,</span>
    markdown<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>

<span class="token comment"># ************* Create AgentOS *************</span>
agent_os <span class="token operator">=</span> AgentOS<span class="token punctuation">(</span>agents<span class="token operator">=</span><span class="token punctuation">[</span>agno_agent<span class="token punctuation">]</span><span class="token punctuation">)</span>
app <span class="token operator">=</span> agent_os<span class="token punctuation">.</span>get_app<span class="token punctuation">(</span><span class="token punctuation">)</span>

<span class="token comment"># ************* Run AgentOS *************</span>
<span class="token keyword">if</span> __name__ <span class="token operator">==</span> <span class="token string">"__main__"</span><span class="token punctuation">:</span>
    agent_os<span class="token punctuation">.</span>serve<span class="token punctuation">(</span>app<span class="token operator">=</span><span class="token string">"basic_demo:app"</span><span class="token punctuation">,</span> <span class="token builtin">reload</span><span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span>
</code></pre>
<p>Run this file using <code>uv run basic_agentos.py</code> and connect to it on the <a target="_blank" rel="noopener noreferrer" class="" href="http://os.agno.com">AgentOS UI</a>.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/basic-agentos.mp4">Your browser does not support the video tag.</video>
<h2>5. Finally, stop paying for egress and retention.</h2>
<p>Shipping full traces to third parties is expensive. Text is ok but when it comes to images, audio, video, files, etc., you're looking at a lot of bandwidth that is leaving your VPC. Egress fees, retention costs, and redundant storage add up — fast. Keeping data in your own infrastructure saves both <strong>money</strong> and <strong>risk</strong>.</p>
<span class="text-2xl font-semibold flex justify-center"><p><strong>Own your data, control your costs.</strong></p></span>
<h2>Why Agno?</h2>
<p>Agno was designed from the ground up for building private, secure, high-performance, agentic systems.</p>
<ol>
<li>Every Agent comes with its own database.</li>
<li>All data stays within your system.</li>
<li>Private. Secure. Open Source.</li>
</ol>
<p><strong>Agno documentation:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/docs">agno.link/docs</a></p>
<p><strong>Signup for the AgentOS:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a></p>
<p><strong>Star Agno on Github:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/gh">agno.link/gh</a></p>
<hr>
<p>I know mentioning Agno here seems like a plug, it's not. The architecture is simple: you should own your data. You don't have to use Agno for that. You can build it yourself. The difference is that with most telemetry providers, your data stays locked with them forever. With Agno, it stays with you.</p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[WTF are Agents?]]></title>
            <link>https://ashpreetbedi.com/wtf-is-an-agent</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/wtf-is-an-agent</guid>
            <pubDate>Fri, 24 Oct 2025 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<h2>Most people overcomplicate Agents.</h2>
<p>Are they workflows? are they graphs? are they LLMs in a loop or just expensive while-loops? Are they deterministic, autonomous, or confused? Some say if you whisper "agent" three times, a VC appears with a term sheet.</p>
<p>How about we cut through the noise and understand what an Agent is by mapping out how they work. Let's demystify it — without the hype.</p>
<h2>What is an Agent?</h2>
<p>Regular programs execute a fixed set of instructions, written as code, in a predetermined order. If you write a program to add two numbers, that's exactly what it will do, every time. It won't add three, or four, or decide to do something else. The outcome is always the same because the logic is hardcoded.</p>
<p>Agents, on the other hand, are <span class="text-teal-400">AI programs where a language model decides the flow of execution</span>. You give it instructions, a set of tools, and the model decides what to do. If you give an Agent tools to add numbers, it can add two, three, or ten. If you also give it tools to subtract, multiply, and divide, it can perform any combination of operations — without you writing that logic explicitly.</p>
<p>If that explanation sounded abstract, that's because it is. Let's make sense of it by walking through what happens when you run an Agent:</p>
<ol>
<li>The Agent first builds the <strong>context</strong> for the model: system messages, user messages, adds chat history, memory, knowledge, state.</li>
<li>It sends that context to the model (the <strong>execution loop begins</strong>).</li>
<li>The model replies with a message, a <strong>tool call</strong>, or both.</li>
<li>If a tool is called, the Agent executes it and returns the results to the model. This is what I think makes a program "agentic".</li>
<li>The loop continues until the model produces a final message.</li>
<li>The Agent returns that response to the caller.</li>
</ol>
<p>That's it, this is an Agent. What'll be different is the context, the tools, and the model's reasoning, but the core remains the same.</p>
<blockquote>
<p>We're moving from deterministic execution to reasoning-based execution — from code that follows instructions to software that decides what to do. Will it do it well? We'll find out.</p>
</blockquote>
<h2>Minimal Example</h2>
<p>Let's build a simple agent to demo how it works, we'll add a few capabilities to make it more interesting:</p>
<ul>
<li>A database to store and maintain conversation history</li>
<li>Tools via MCP that it can call to answer questions</li>
<li>Respond in markdown so it looks pretty</li>
</ul>
<p>We'll also turn it into a FastAPI app so we can deploy it as a service. You can read the full instructions <a target="_blank" rel="noopener noreferrer" class="" href="https://docs.agno.com/introduction/quickstart">here</a>.</p>
<pre class="language-javascript"><code class="language-javascript"><span class="token keyword module">from</span> agno<span class="token punctuation">.</span><span class="token property-access">agent</span> <span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">Agent</span></span>
<span class="token keyword module">from</span> agno<span class="token punctuation">.</span><span class="token property-access">db</span><span class="token punctuation">.</span><span class="token property-access">sqlite</span> <span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">SqliteDb</span></span>
<span class="token keyword module">from</span> agno<span class="token punctuation">.</span><span class="token property-access">models</span><span class="token punctuation">.</span><span class="token property-access">anthropic</span> <span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">Claude</span></span>
<span class="token keyword module">from</span> agno<span class="token punctuation">.</span><span class="token property-access">os</span> <span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">AgentOS</span></span>
<span class="token keyword module">from</span> agno<span class="token punctuation">.</span><span class="token property-access">tools</span><span class="token punctuation">.</span><span class="token property-access">mcp</span> <span class="token keyword module">import</span> <span class="token maybe-class-name">MCPTools</span>

# <span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">*</span> <span class="token maybe-class-name">Create</span> <span class="token maybe-class-name">Agent</span> <span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">*</span>
agno_agent <span class="token operator">=</span> <span class="token function"><span class="token maybe-class-name">Agent</span></span><span class="token punctuation">(</span>
    name<span class="token operator">=</span><span class="token string">"Agno Agent"</span><span class="token punctuation">,</span>
    model<span class="token operator">=</span><span class="token function"><span class="token maybe-class-name">Claude</span></span><span class="token punctuation">(</span>id<span class="token operator">=</span><span class="token string">"claude-sonnet-4-5"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    db<span class="token operator">=</span><span class="token function"><span class="token maybe-class-name">SqliteDb</span></span><span class="token punctuation">(</span>db_file<span class="token operator">=</span><span class="token string">"agno.db"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    tools<span class="token operator">=</span><span class="token punctuation">[</span><span class="token function"><span class="token maybe-class-name">MCPTools</span></span><span class="token punctuation">(</span>url<span class="token operator">=</span><span class="token string">"https://docs.agno.com/mcp"</span><span class="token punctuation">,</span> transport<span class="token operator">=</span><span class="token string">"streamable-http"</span><span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
    add_history_to_context<span class="token operator">=</span><span class="token maybe-class-name">True</span><span class="token punctuation">,</span>
    markdown<span class="token operator">=</span><span class="token maybe-class-name">True</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>

# <span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">*</span> <span class="token maybe-class-name">Create</span> <span class="token maybe-class-name">AgentOS</span> <span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">*</span>
agent_os <span class="token operator">=</span> <span class="token function"><span class="token maybe-class-name">AgentOS</span></span><span class="token punctuation">(</span>agents<span class="token operator">=</span><span class="token punctuation">[</span>agno_agent<span class="token punctuation">]</span><span class="token punctuation">)</span>
app <span class="token operator">=</span> agent_os<span class="token punctuation">.</span><span class="token method function property-access">get_app</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
</code></pre>
<p>You can run this Agent using <code>fastapi dev agno_agent.py</code> and chat with it on the <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">AgentOS UI</a>. Here's how it looks:</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/agentos-chat.mp4">Your browser does not support the video tag.</video>
<p>Deploy your FastAPI app to your cloud of choice, and you're live!</p>
<h2>Are we done?</h2>
<p>Not even close. The hard part isn't building the Agent, its building the system that runs these Agents in production, and building a product around it with a great UX (or rather, <span class="text-teal-400">AX — Agent Experience</span>).</p>
<p>Ensuring reliability, durability, and a smooth experience across thousands of concurrent sessions is where the real engineering happens. These are long-running processes that demand isolated state management, persistent storage, and strong fault tolerance.</p>
<p>Here's what you'll need to consider when building Agents:</p>
<ol>
<li><strong>Runtime architecture</strong>: how agents are orchestrated, manage state, and handle execution loops.</li>
<li><strong>Memory systems</strong>: how agents retain and manage context, session history, memory, knowledge and culture.</li>
<li><strong>Tooling integration</strong>: how agents connect to APIs, databases, or internal functions (MCPs are popular here).</li>
<li><strong>Safety &amp; Security</strong>: how to ensure data, application and user-level security.</li>
<li><strong>Evaluation &amp; performance</strong>: measuring usefulness, latency, cost, and reliability of the agentic system.</li>
</ol>
<p>Each of these is a discipline of its own, with entire startups (sometimes dozens) dedicated to solving. But stitching it all together into a single, cohesive system is still a massive pain.</p>
<p>That's where Agno comes in.</p>
<h2>What is Agno?</h2>
<p><strong>Agno is a multi-agent framework, runtime, and control plane.</strong> It solves the 5 problems mentioned above via 3 tightly coupled components:</p>
<ol>
<li><strong>Framework for building Agents, Multi-Agent Teams and Workflows.</strong> It comes with an incredibly rich set of features like persistent storage, memory management, knowledge retrieval, 100+ toolkits, guardrails, dependency injection, dynamic context management, human in the loop, and much, much more.</li>
<li><strong>Pre-built FastAPI Runtime for deploying multi-agent systems.</strong> This runtime, called AgentOS, exposes pre-built endpoints you can build your product on top of. It handles concurrency, state management, and error recovery out of the box — plus extras like initializing MCP connections via lifecycle hooks and securing every request with a security-key.</li>
<li><strong>Control Plane for testing, monitoring, debugging and evaluating multi-agent systems.</strong> This is a web interface that allows you to manage your multi-agent systems in real-time. It's a powerful tool that helps you understand what your agents are doing, and why.</li>
</ol>
<p>If you're building Agents, give Agno a try:</p>
<ul>
<li><strong>GitHub:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/gh">agno.link/gh</a></li>
<li><strong>Documentation:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/docs">agno.link/docs</a></li>
<li><strong>Website:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://www.agno.com">agno.com</a></li>
</ul>
<hr>
<p>Agents aren't magic. They're just a new kind of software. Once you understand that, everything else falls into place.</p>
<span class="text-teal-400">Agent Engineering is just Software Engineering</span>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Agent Engineering 101]]></title>
            <link>https://ashpreetbedi.com/agent-engineering</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/agent-engineering</guid>
            <pubDate>Thu, 23 Oct 2025 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<span class="text-2xl font-semibold"><p>✨ The intersection of software, systems and security engineering.</p></span>
<p>For a moment, stop debating what an Agent should be — deterministic or autonomous, a workflow or graph. Just pause for a sec and step back.</p>
<p>Our goal is to make use of this technology, which in my opinion, lends itself to 3 major use-cases:</p>
<ol>
<li><strong>Tools</strong> that improve productivity (chatgpt, claude, cursor).</li>
<li><strong>Workflows</strong> that saves time (marketing research, report generation).</li>
<li><strong>AI products</strong> that solve user problems (eg: Notion AI).</li>
</ol>
<p>You can buy AI tools, and tools for building workflows, but building AI products is where the real engineering happens. Let's dive in.</p>
<h2>What is Agent Engineering?</h2>
<p>Agent Engineering is the practice of <strong>building</strong>, <strong>running</strong> and <strong>managing</strong> agentic systems. It sits at the intersection of software engineering, system design and security engineering.</p>
<p>In practice, if you're building an AI product, you'll need an AI backend — a system that connects to your frontend via an API. This backend is responsible for running agents (concurrently), managing memory, knowledge, state, and ensuring the security and privacy of your environment. This is Agent Engineering, which focuses on:</p>
<ul>
<li><strong>Runtime architecture</strong>: how agents are orchestrated, manage state, and handle execution loops.</li>
<li><strong>Memory systems</strong>: how agents retain and manage context, session history, memory, knowledge and culture.</li>
<li><strong>Tooling integration</strong>: how agents connect to APIs, databases, or internal functions (MCPs are popular here).</li>
<li><strong>Safety &amp; Security</strong>: how to ensure data, application and user-level security.</li>
<li><strong>Evaluation &amp; performance</strong>: measuring usefulness, latency, cost, and reliability of the agentic system.</li>
</ul>
<p>Agent Engineers are responsible for answering questions like:</p>
<ul>
<li>How do we serve our agents as an API that our frontend can call?</li>
<li>When should we use REST versus Websockets?</li>
<li>How do we handle request/response timeouts (29 seconds for aws api gateway)?</li>
<li>If tools are exposed via MCP, how should our AI backend establish and maintain a connection to the MCP server? Should it be initialized once using FastAPI lifecycle hooks, or re-established every time an agent runs (probably not)?</li>
<li>How should authentication and authorization be handled — once (probably not), per request, or through persistent sessions?</li>
<li>How do we manage concurrency and state when multiple users call the same agent? Are sessions properly isolated?</li>
<li>What is the security boundary of each request? Are agents only accessing data permitted by RBAC?</li>
<li>How do we log and monitor the agentic system? Tracing is popular, but it’s not enough. How do we capture events like “this request was made,” “this agent, via this request, accessed this data,” and the complete lifecycle of what happened during execution?</li>
</ul>
<p>Agent Engineering is not just about building agents, it's about building the system that runs them (securely). Its 40% agent development, 40% system design and 20% security engineering.</p>
<h2>How Agno helps with Agent Engineering?</h2>
<p><strong>Agno is a multi-agent framework, runtime, and control plane.</strong> It delivers a complete solution for building, deploying and managing multi-agent systems via 3 tightly coupled components:</p>
<ol>
<li><strong>Framework</strong>: for building Agents, Multi-Agent Teams and Workflows.</li>
<li><strong>Pre-built FastAPI Runtime</strong>: for deploying multi-agent systems.</li>
<li><strong>Control Plane</strong>: web interface for managing multi-agent systems.</li>
</ol>
<blockquote>
<p>One frustration I have with most frameworks is that they give you a way to build an agent, but almost no guidance on how to run it in production. Like, how do I serve this as an SSE compatible API that my frontend can call? How do I build a product out of this? This to me, is incomplete, because the real engineering happens after the agent is built. And no, logging (telemetry) and evals is not what makes a system production-grade. Since when did cloudwatch and unit-tests make a product? They're parts of it, sure, but stop selling them as the whole story.</p>
</blockquote>
<p>While Agno gives you an incredibly feature-rich agent framework — it's the pre-built FastAPI application that really sets it apart. We call this the AgentOS. This is the real advantage of Agno, the advantage of working with people who've built these types of systems before.</p>
<span class="text-teal-400">A very simple example: along with the pre-build endpoints, the AgentOS initializes MCP connections in FastAPI lifecycle hooks, and gives you a security-key for authenticating every request.</span>
<p>Next, the control plane — our web interface for managing AgentOS — connects directly to your runtime via your browser, letting you test the real performance of your system. <strong>This architecture honestly only makes sense once you test it.</strong> So give it a try.</p>
<p>It's a novel architecture that makes your setup inherently secure, since your browser connects directly to the runtime, no data is sent to agno, or any external telemetry services or stored outside your cloud, you avoid unnecessary egress and retention costs.</p>
<blockquote>
<p>Sending our AI app data to telemetry services is fundamentally broken. We don't send your app data, user data, or business data to a third-party logger — so why send our AI data? <strong>Why not just connect to the database directly to view it?</strong></p>
</blockquote>
<h2>Minimal Example</h2>
<p>Okay, let's demonstrate the power of Agno with a simple example. Here's a fully working Agent, with conversation history, access to tools via MCP, deployed as a FastAPI app - in 20 lines of code.</p>
<pre class="language-javascript"><code class="language-javascript"><span class="token keyword module">from</span> agno<span class="token punctuation">.</span><span class="token property-access">agent</span> <span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">Agent</span></span>
<span class="token keyword module">from</span> agno<span class="token punctuation">.</span><span class="token property-access">db</span><span class="token punctuation">.</span><span class="token property-access">sqlite</span> <span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">SqliteDb</span></span>
<span class="token keyword module">from</span> agno<span class="token punctuation">.</span><span class="token property-access">models</span><span class="token punctuation">.</span><span class="token property-access">anthropic</span> <span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">Claude</span></span>
<span class="token keyword module">from</span> agno<span class="token punctuation">.</span><span class="token property-access">os</span> <span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">AgentOS</span></span>
<span class="token keyword module">from</span> agno<span class="token punctuation">.</span><span class="token property-access">tools</span><span class="token punctuation">.</span><span class="token property-access">mcp</span> <span class="token keyword module">import</span> <span class="token maybe-class-name">MCPTools</span>

# <span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">*</span> <span class="token maybe-class-name">Create</span> <span class="token maybe-class-name">Agent</span> <span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">*</span>
agno_agent <span class="token operator">=</span> <span class="token function"><span class="token maybe-class-name">Agent</span></span><span class="token punctuation">(</span>
    name<span class="token operator">=</span><span class="token string">"Agno Agent"</span><span class="token punctuation">,</span>
    model<span class="token operator">=</span><span class="token function"><span class="token maybe-class-name">Claude</span></span><span class="token punctuation">(</span>id<span class="token operator">=</span><span class="token string">"claude-sonnet-4-5"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    db<span class="token operator">=</span><span class="token function"><span class="token maybe-class-name">SqliteDb</span></span><span class="token punctuation">(</span>db_file<span class="token operator">=</span><span class="token string">"agno.db"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    tools<span class="token operator">=</span><span class="token punctuation">[</span><span class="token function"><span class="token maybe-class-name">MCPTools</span></span><span class="token punctuation">(</span>url<span class="token operator">=</span><span class="token string">"https://docs.agno.com/mcp"</span><span class="token punctuation">,</span> transport<span class="token operator">=</span><span class="token string">"streamable-http"</span><span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
    add_history_to_context<span class="token operator">=</span><span class="token maybe-class-name">True</span><span class="token punctuation">,</span>
    markdown<span class="token operator">=</span><span class="token maybe-class-name">True</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>

# <span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">*</span> <span class="token maybe-class-name">Create</span> <span class="token maybe-class-name">AgentOS</span> <span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">*</span>
agent_os <span class="token operator">=</span> <span class="token function"><span class="token maybe-class-name">AgentOS</span></span><span class="token punctuation">(</span>agents<span class="token operator">=</span><span class="token punctuation">[</span>agno_agent<span class="token punctuation">]</span><span class="token punctuation">)</span>
app <span class="token operator">=</span> agent_os<span class="token punctuation">.</span><span class="token method function property-access">get_app</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
</code></pre>
<p>Run your AgentOS using <code>fastapi dev agno_agent.py</code> and chat with it on the <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">AgentOS UI</a>.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/agentos-chat.mp4">Your browser does not support the video tag.</video>
<p>Deploy your FastAPI app to your cloud of choice, and voilà, you're live in production. <strong>It's impossible to move this quickly without Agno.</strong></p>
<h2>Summary: The Layers of Agent Engineering</h2>
<p>Agent Engineering has three fundamental layers:</p>
<ol>
<li>The <strong>Framework</strong> (Build)</li>
</ol>
<p>This is where you define your Agents, Teams and Workflows — the schemas, memory, knowledge, and guardrails, the reasoning loop.</p>
<ol start="2">
<li>The <strong>Runtime</strong> (Run)</li>
</ol>
<p>The runtime serves (via API), scales, and orchestrates Agents in production. It handles concurrency, async execution, error recovery, and communication between agents and tools.</p>
<ol start="3">
<li>The <strong>Control Plane</strong> (Manage)</li>
</ol>
<p>The control plane provides visibility: dashboards, monitoring, debugging, and human-in-the-loop control. It's how you understand what your agents are doing — and why.</p>
<p>Agno combines all three. It's not just a framework. It's a <strong>complete runtime and control plane</strong> for multi-agent systems.</p>
<h2>Designed for Agent Engineering</h2>
<p>I'll end this article with a list of features of Agno:</p>
<table><thead><tr><th><strong>Category</strong></th><th><strong>Feature</strong></th><th><strong>Description</strong></th></tr></thead><tbody><tr><td><strong>Core Intelligence</strong></td><td><strong>Model Agnostic</strong></td><td>Works with any model provider so you can use your favorite LLMs.</td></tr><tr><td></td><td><strong>Type Safe</strong></td><td>Enforce structured I/O through <code>input_schema</code> and <code>output_schema</code> for predictable, composable behavior.</td></tr><tr><td></td><td><strong>Dynamic Context Engineering</strong></td><td>Inject variables, state, and retrieved data on the fly into context. Perfect for dependency-driven agents.</td></tr><tr><td><strong>Memory, Knowledge, and Persistence</strong></td><td><strong>Persistent Storage</strong></td><td>Give your Agents, Teams, and Workflows a database to persist session history, state, and messages.</td></tr><tr><td></td><td><strong>User Memory</strong></td><td>Built-in memory system that allows Agents to recall user-specific context across sessions.</td></tr><tr><td></td><td><strong>Agentic RAG</strong></td><td>Connect to 20+ vector stores (called <strong>Knowledge</strong> in Agno) with hybrid search + reranking out of the box.</td></tr><tr><td></td><td><strong>Culture (Collective Memory)</strong></td><td>Shared knowledge that compounds across agents and time.</td></tr><tr><td><strong>Execution &amp; Control</strong></td><td><strong>Human-in-the-Loop</strong></td><td>Native support for confirmations, manual overrides, and external tool execution.</td></tr><tr><td></td><td><strong>Guardrails</strong></td><td>Built-in safeguards for validation, security, and prompt protection.</td></tr><tr><td></td><td><strong>Agent Lifecycle Hooks</strong></td><td>Pre- and post-hooks to validate or transform inputs and outputs.</td></tr><tr><td></td><td><strong>MCP Integration</strong></td><td>First-class support for the Model Context Protocol (MCP) to connect Agents with external systems.</td></tr><tr><td></td><td><strong>Toolkits</strong></td><td>100+ built-in toolkits with thousands of tools, ready for use across data, code, web, and enterprise APIs.</td></tr><tr><td><strong>Runtime &amp; Evaluation</strong></td><td><strong>Runtime</strong></td><td>Pre-built FastAPI based runtime with SSE compatible endpoints, ready for production on day 1.</td></tr><tr><td></td><td><strong>Control Plane (UI)</strong></td><td>Integrated interface to visualize, monitor, and debug agent activity in real time.</td></tr><tr><td></td><td><strong>Natively Multimodal</strong></td><td>Agents can process and generate text, images, audio, video, and files.</td></tr><tr><td></td><td><strong>Evals</strong></td><td>Measure your Agents' Accuracy, Performance, and Reliability.</td></tr><tr><td><strong>Security &amp; Privacy</strong></td><td><strong>Private by Design</strong></td><td>Runs entirely in your cloud. The UI connects directly to your AgentOS from your browser, no data is ever sent externally.</td></tr><tr><td></td><td><strong>Data Governance</strong></td><td>Your data lives securely in your Agent database, no external data sharing or vendor lock-in.</td></tr><tr><td></td><td><strong>Access Control</strong></td><td>Role-based access (RBAC) and per-agent permissions to protect sensitive contexts and tools.</td></tr></tbody></table>
<hr>
<h2>Want to build with Agno?</h2>
<ul>
<li>
<p><strong>Agno documentation:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/docs">agno.link/docs</a></p>
</li>
<li>
<p><strong>Signup for the AgentOS:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a></p>
</li>
<li>
<p><strong>Star Agno on Github:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/gh">agno.link/gh</a></p>
</li>
</ul>
<hr>
<p>Read more on <a target="_blank" rel="noopener noreferrer" class="" href="https://www.agno.com">agno.com</a></p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Agentic Culture]]></title>
            <link>https://ashpreetbedi.com/agentic-culture</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/agentic-culture</guid>
            <pubDate>Tue, 21 Oct 2025 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>Andrej Karpathy shared on the <a target="_blank" rel="noopener noreferrer" class="" href="https://youtu.be/lXUZvyajciY?si=i1Q-eOCBUEWmMGh7&amp;t=6034">Dwarkesh Podcast</a> that LLMs don't have the equivalent of "culture".</p>
<p>So we built the scaffolding for them to develop one.</p>
<h2>Why Culture?</h2>
<p>Every Agent learns from its own interactions — the tasks it runs, the conversations it has, the errors it fixes. But that knowledge is siloed. It disappears when the session ends or the user changes.</p>
<p>Humans solved this problem a long time ago. We call it <strong>culture</strong> — the consolidation of shared knowledge that compounds over time.</p>
<p>With Agno, you can now give your Agents the same ability to <strong>learn collectively</strong>.</p>
<hr>
<h2>Introducing Agentic Culture</h2>
<p><strong>Agentic Culture</strong> is an open-source experiment in <strong>collective memory</strong> and <strong>in-context cultural</strong> for multi-agent systems.</p>
<p>It provides a shared cultural database where Agents can store and retrieve knowledge that persists beyond individual sessions, users, or memories. Culture becomes a living, evolving layer of context that shapes Agent reasoning and behavior over time.</p>
<p>Agents can now create, read, explore, and learn from their collective experience. See the <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/agentic-culture">Agentic Culture</a> cookbook for example code.</p>
<blockquote>
<p>“Culture is how intelligence compounds”</p>
</blockquote>
<hr>
<h2>How It Works</h2>
<p>Culture acts as a shared database where Agents can save reusable knowledge that benefits all interactions.</p>
<p>While <strong>Memory</strong> captures user-specific details (e.g. "Sarah prefers email"), <strong>Culture</strong> captures universal principles that benefit all interactions (e.g. "Always provide actionable next steps").</p>
<p>You can use Agno’s <code>CultureManager</code> to create and manage cultural knowledge entries. These are stored in your chosen database and automatically retrieved by your Agents for contextual grounding.</p>
<pre class="language-python"><code class="language-python"><span class="token triple-quoted-string string">"""Demonstrates how to create and persist shared cultural knowledge with Agno's `CultureManager`."""</span>

<span class="token keyword">from</span> agno<span class="token punctuation">.</span>culture<span class="token punctuation">.</span>manager <span class="token keyword">import</span> CultureManager
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>db<span class="token punctuation">.</span>sqlite <span class="token keyword">import</span> SqliteDb
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>models<span class="token punctuation">.</span>anthropic <span class="token keyword">import</span> Claude
<span class="token keyword">from</span> rich<span class="token punctuation">.</span>pretty <span class="token keyword">import</span> pprint

<span class="token comment"># Step 1. Initialize the database</span>
db <span class="token operator">=</span> SqliteDb<span class="token punctuation">(</span>db_file<span class="token operator">=</span><span class="token string">"tmp/demo.db"</span><span class="token punctuation">)</span>

<span class="token comment"># Step 2. Create the Culture Manager</span>
culture_manager <span class="token operator">=</span> CultureManager<span class="token punctuation">(</span>
    db<span class="token operator">=</span>db<span class="token punctuation">,</span>
    model<span class="token operator">=</span>Claude<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"claude-sonnet-4-5"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>

<span class="token comment"># Step 3. Create cultural knowledge from a message</span>
message <span class="token operator">=</span> <span class="token punctuation">(</span>
    <span class="token string">"All technical guidance should follow the 'Operational Thinking' principle:\n"</span>
    <span class="token string">"1. **State the Objective** — What outcome are we trying to achieve and why.\n"</span>
    <span class="token string">"2. **Show the Procedure** — List clear, reproducible steps (commands/configs).\n"</span>
    <span class="token string">"3. **Surface Pitfalls** — What usually fails and how to detect it early.\n"</span>
    <span class="token string">"4. **Define Validation** — How to confirm it’s working (logs, tests, metrics).\n"</span>
    <span class="token string">"5. **Close the Loop** — Suggest next iterations or improvements."</span>
<span class="token punctuation">)</span>

culture_manager<span class="token punctuation">.</span>create_cultural_knowledge<span class="token punctuation">(</span>message<span class="token operator">=</span>message<span class="token punctuation">)</span>

<span class="token comment"># Step 4. Retrieve and inspect stored knowledge</span>
pprint<span class="token punctuation">(</span>culture_manager<span class="token punctuation">.</span>get_all_knowledge<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
</code></pre>
<p>Now give your agents access to the shared culture by setting <code>add_culture_to_context=True</code>. That's it. Your Agents now learn from shared cultural knowledge.</p>
<pre class="language-python"><code class="language-python"><span class="token triple-quoted-string string">"""Use cultural knowledge with your Agents."""</span>

<span class="token keyword">from</span> agno<span class="token punctuation">.</span>agent <span class="token keyword">import</span> Agent
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>db<span class="token punctuation">.</span>sqlite <span class="token keyword">import</span> SqliteDb
<span class="token keyword">from</span> agno<span class="token punctuation">.</span>models<span class="token punctuation">.</span>anthropic <span class="token keyword">import</span> Claude

db <span class="token operator">=</span> SqliteDb<span class="token punctuation">(</span>db_file<span class="token operator">=</span><span class="token string">"tmp/demo.db"</span><span class="token punctuation">)</span>

agent <span class="token operator">=</span> Agent<span class="token punctuation">(</span>
    model<span class="token operator">=</span>Claude<span class="token punctuation">(</span><span class="token builtin">id</span><span class="token operator">=</span><span class="token string">"claude-sonnet-4-5"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    db<span class="token operator">=</span>db<span class="token punctuation">,</span>
    add_culture_to_context<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
    <span class="token comment"># optional: run culture manager after each run</span>
    <span class="token comment"># update_cultural_knowledge=True,</span>
<span class="token punctuation">)</span>

agent<span class="token punctuation">.</span>print_response<span class="token punctuation">(</span>
    <span class="token string">"How do I set up a FastAPI service using Docker?"</span><span class="token punctuation">,</span>
    stream<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
    markdown<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>
</code></pre>
<hr>
<h2>What You Can Do With It</h2>
<p>The current <strong>v0.1</strong> release focuses on helping Agents stay consistent in tone, reasoning, and behavior. Over time, the goal is to transform isolated Agents into a living, evolving system of intelligence.</p>
<p>With Culture, you can:</p>
<ul>
<li>Accumulate learnings and behavioral patterns from successful runs</li>
<li>Use that collective context to guide future decisions</li>
<li>Observe how "culture" evolves across teams, orgs, and domains</li>
</ul>
<hr>
<h2>Examples</h2>
<p>The <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/agentic-culture">Agentic Culture</a> cookbook includes several runnable recipes:</p>
<table><thead><tr><th>File</th><th>Description</th></tr></thead><tbody><tr><td>01_create_cultural_knowledge.py</td><td>Create cultural knowledge using a model.</td></tr><tr><td>02_use_cultural_knowledge_in_agent.py</td><td>Use cultural knowledge inside Agents.</td></tr><tr><td>03_automatic_cultural_management.py</td><td>Let Agents autonomously update culture over time.</td></tr><tr><td>04_manually_add_culture.py</td><td>Manually seed culture for tone guides or org-wide principles.</td></tr><tr><td>05_test_agent_with_cultural_knowledge.py</td><td>Freestyle testing — see culture in action.</td></tr></tbody></table>
<p>Each builds on the previous one, so you can run them in sequence.</p>
<p>Agno is open-source, so you can contribute to the cookbook or build your own recipes. Here's the github repository: <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/gh">agno.link/gh</a></p>
<hr>
<h2>Future Work</h2>
<p>This is early, but promising. We're exploring how to:</p>
<ul>
<li>Integrate culture across multi-agent teams.</li>
<li>Sync or version cultural knowledge programmatically</li>
<li>Store culture in Postgres, Redis, or your own backend</li>
<li>Let Agents evolve shared norms collectively, like emergent civilizations</li>
</ul>
<p>Karpathy describes a future where LLMs have a "giant scratchpad" — a shared space to think, write, and build on each other's ideas.</p>
<p>Agno is providing the scaffolding for developing that culture.</p>
<hr>
<h2>Explore &amp; Build</h2>
<ul>
<li><strong>Explore Agentic Culture:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/agentic-culture">agno.link/agentic-culture</a></li>
<li><strong>Agno on GitHub:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/gh">agno.link/gh</a></li>
<li><strong>Documentation:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/docs">agno.link/docs</a></li>
<li><strong>Agno Website:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://www.agno.com">agno.com</a></li>
</ul>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Becoming AI-first]]></title>
            <link>https://ashpreetbedi.com/becoming-ai-first</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/becoming-ai-first</guid>
            <pubDate>Sun, 19 Oct 2025 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<span class="text-2xl font-semibold"><p>✨ Lessons from 100s of conversations on AI products and how teams are adopting AI.</p></span>
<p>Every tuesday and thursday, I take 3–5 calls with builders, CTOs, and CEOs of companies. One question on every CEO's mind is:</p>
<blockquote class="not-prose relative isolate pl-6 text-ink py-3 text-lg"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed">"How do I make my company AI-first?"</div></div></div></blockquote>
<p>Common variations include: how can we use AI better? should we be building Agents? how do we add AI to our products?</p>
<p>Over time, I've identified patterns in how leading companies are approaching this question, and what separates the ones making real progress from those still in exploration mode.</p>
<h2>What "AI-first" really means</h2>
<p>Being AI-first doesn't mean using AI everywhere, or re-architecting you entire company or product around ChatGPT.</p>
<p>It means understanding where intelligence creates leverage for your team, your operations, and your product. If you can identify where AI genuinely moves the needle, you're already halfway there.</p>
<p>Broadly, I've found three high-leverage entry points:</p>
<ol>
<li><strong>Internal tools</strong> that improve productivity and decision-making.</li>
<li><strong>Workflow automation</strong> that saves time and reduces operational load.</li>
<li><strong>User-facing products</strong> that create revenue and differentiation.</li>
</ol>
<p>Each represents a layer in your company's AI maturity. Let's dig in.</p>
<span class="text-xl font-semibold"><p><strong>1. Internal Tools</strong></p></span>
<p>These tools help your team save time, become more productive, and build intuition around AI. <strong>General-purpose agents</strong> (ChatGPT, Claude), <strong>coding assistants</strong> (Cursor, Claude Code), or <strong>vertical agents</strong> (legal, sales, marketing) all fit here. I'm yet to meet a team that's not all-in here.</p>
<p>These don't require a polished UX or commercial rollout — just curiosity and experimentation. The payoff is your team becoming AI-native faster than your competitors.</p>
<blockquote>
<p><strong>Most teams I speak to give everyone access to a multitude of AI tools. The cost is trivial compared to the learning dividend.</strong></p>
</blockquote>
<p>If you're not doing this already, get your team a ChatGPT subscription and cursor/CC for coding. Connect these tools to your company knowledge, databases, and documents. Let your team explore, learn, and build intuition.</p>
<span class="text-xl font-semibold"><p><strong>2. Workflow Automation</strong></p></span>
<p>Once your team sees what's possible, you'll start spotting repeatable patterns ripe for automation. This is where AI turns mundane tasks into automated processes that can run in the background.</p>
<p>Examples: invoice classification, market research, sales prep, support summarization, or daily reporting.</p>
<p>That said, the highest-ROI workflows are almost always specific to your team. They take effort to design — and while "no-code" tools like N8N or Zapier can help, most serious setups eventually involve code. Frameworks like Agno can help here if you have engineering resources.</p>
<blockquote>
<p><strong>Treat automation as part of your system design, not a side project. Its ok to invest in it, if only to learn and build intuition.</strong></p>
</blockquote>
<span class="text-xl font-semibold"><p><strong>3. User-Facing AI Products</strong></p></span>
<p>This is where AI creates compounding value — by improving the product your users already love. You can:</p>
<ol>
<li>Buy off-the-shelf products that add AI-powered features to existing products (e.g., a support agent). I highly recommend this as a starting point, its easy to get started and you start seeing immediate value.</li>
<li>Build new AI features specific to your product. The goal here is to make your product smarter, faster, and more delightful.</li>
</ol>
<p>Your goal here isn't to "add AI" — it's to make the experience better. The best AI features often don't look like AI at all.</p>
<blockquote>
<p><strong>Our most successful case studies are ones where users don't even realize AI is at work, they just notice things getting smarter, forms getting filled automatically, and buttons that automate what was previously a 10-step manual process.</strong></p>
</blockquote>
<p>So general recommendation is to start with off-the-shelf products that add AI-powered features to your product. But once you need to build AI-features that are specific to your product, here's how to do it.</p>
<ol>
<li>
<p>Add small, reliable AI features - ideally as "magic buttons" or "magic interactions". Reliability is the keyword here.</p>
</li>
<li>
<p>Automate targeted, well-defined problems - solve one painful step at a time. Serve the AI application as a RestAPI, which your product can call when the user clicks the "magic button".</p>
</li>
<li>
<p>Avoid generic chatbots - they shift the cognitive load to the user and expose an incredibly vast surface area, which is bound to disappoint. Instead, build clear, purposeful interfaces that do the work for them. This will also force you to think about the user experience and how to make it more intuitive and delightful.</p>
</li>
</ol>
<p>Each of these "magic moments" compounds. Over time, your product becomes AI-first not by branding, but by behavior.</p>
<blockquote class="not-prose relative isolate pl-6 text-ink py-2 text-base"><span aria-hidden="true" class="absolute inset-y-1 left-0 w-0.5 rounded-full bg-accent"></span><div class="flex gap-3"><div class="space-y-1"><div class="leading-relaxed"><p><strong>Start simple, focus on clarity and reliability over complexity</strong>.</p></div></div></div></blockquote>
<h2>From exploration to execution</h2>
<p>If you want to accelerate this journey, <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.com">Agno</a> is a starting point.</p>
<p>It will give you the right primitives for building AI features and a FastAPI application that you can deploy in your cloud (for privacy and security). Your product can easily integrate with this API and before you know it, you'll be serving AI features to your users.</p>
<hr>
<h2>Want to build with Agno?</h2>
<ul>
<li>
<p><strong>Agno documentation:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/docs">agno.link/docs</a></p>
</li>
<li>
<p><strong>Signup for the AgentOS:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a></p>
</li>
<li>
<p><strong>Star Agno on Github:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/gh">agno.link/gh</a></p>
</li>
</ul>
<hr>
<p>Read more on <a target="_blank" rel="noopener noreferrer" class="" href="https://www.agno.com">agno.com</a></p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
        <item>
            <title><![CDATA[Introducing Agno]]></title>
            <link>https://ashpreetbedi.com/introducing-agno</link>
            <guid isPermaLink="false">https://ashpreetbedi.com/introducing-agno</guid>
            <pubDate>Wed, 15 Oct 2025 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<span class="text-2xl font-semibold"><p>✨ The Multi-Agent Framework, Runtime, and UI.</p></span>
<img alt="Agno AgentOS" loading="lazy" width="700" height="700" decoding="async" data-nimg="1" class="rounded-2xl" style="color:transparent" srcset="/_next/image?url=%2Fimages%2Fintro_agno_agentos.png&amp;w=750&amp;q=75 1x, /_next/image?url=%2Fimages%2Fintro_agno_agentos.png&amp;w=1920&amp;q=75 2x" src="/_next/image?url=%2Fimages%2Fintro_agno_agentos.png&amp;w=1920&amp;q=75">
<p>Over the past 3 years, I've been obsessed with building the perfect harness for multi-agent systems. A mission to deliver the best system for building, deploying and scaling agentic software.</p>
<p>Today, Agno is used by thousands of builders at the largest companies in the world, including 3 of the fortune 5. Let's dive in.</p>
<h2>What is Agno?</h2>
<p><strong>Agno is a multi-agent framework, runtime, and UI.</strong> It takes a systems engineering approach to agent development by delivering 3 tightly coupled components:</p>
<ol>
<li><strong>Framework</strong>: for building multi-agent systems.</li>
<li><strong>Runtime</strong>: for deploying multi-agent systems.</li>
<li><strong>UI</strong>: for managing multi-agent systems.</li>
</ol>
<p>These 3 components form the harness for the perfect agentic system.</p>
<p>Can you build these yourself? Absolutely. But <strong>Agno gives you speed, speed gives you momentum, and momentum is everything.</strong></p>
<blockquote>
<p>Enough talk, let's see some code.</p>
</blockquote>
<p>Here's a fully working Agent, with conversation history, access to tools via MCP, deployed as a FastAPI app - in 20 lines of code.</p>
<pre class="language-javascript"><code class="language-javascript"><span class="token keyword module">from</span> agno<span class="token punctuation">.</span><span class="token property-access">agent</span> <span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">Agent</span></span>
<span class="token keyword module">from</span> agno<span class="token punctuation">.</span><span class="token property-access">db</span><span class="token punctuation">.</span><span class="token property-access">sqlite</span> <span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">SqliteDb</span></span>
<span class="token keyword module">from</span> agno<span class="token punctuation">.</span><span class="token property-access">models</span><span class="token punctuation">.</span><span class="token property-access">anthropic</span> <span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">Claude</span></span>
<span class="token keyword module">from</span> agno<span class="token punctuation">.</span><span class="token property-access">os</span> <span class="token keyword module">import</span> <span class="token imports"><span class="token maybe-class-name">AgentOS</span></span>
<span class="token keyword module">from</span> agno<span class="token punctuation">.</span><span class="token property-access">tools</span><span class="token punctuation">.</span><span class="token property-access">mcp</span> <span class="token keyword module">import</span> <span class="token maybe-class-name">MCPTools</span>

# <span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">*</span> <span class="token maybe-class-name">Create</span> <span class="token maybe-class-name">Agent</span> <span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">*</span>
agno_agent <span class="token operator">=</span> <span class="token function"><span class="token maybe-class-name">Agent</span></span><span class="token punctuation">(</span>
    name<span class="token operator">=</span><span class="token string">"Agno Agent"</span><span class="token punctuation">,</span>
    model<span class="token operator">=</span><span class="token function"><span class="token maybe-class-name">Claude</span></span><span class="token punctuation">(</span>id<span class="token operator">=</span><span class="token string">"claude-sonnet-4-5"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    db<span class="token operator">=</span><span class="token function"><span class="token maybe-class-name">SqliteDb</span></span><span class="token punctuation">(</span>db_file<span class="token operator">=</span><span class="token string">"agno.db"</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    tools<span class="token operator">=</span><span class="token punctuation">[</span><span class="token function"><span class="token maybe-class-name">MCPTools</span></span><span class="token punctuation">(</span>url<span class="token operator">=</span><span class="token string">"https://docs.agno.com/mcp"</span><span class="token punctuation">,</span> transport<span class="token operator">=</span><span class="token string">"streamable-http"</span><span class="token punctuation">)</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
    add_history_to_context<span class="token operator">=</span><span class="token maybe-class-name">True</span><span class="token punctuation">,</span>
    markdown<span class="token operator">=</span><span class="token maybe-class-name">True</span><span class="token punctuation">,</span>
<span class="token punctuation">)</span>

# <span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">*</span> <span class="token maybe-class-name">Create</span> <span class="token maybe-class-name">AgentOS</span> <span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">**</span><span class="token operator">*</span>
agent_os <span class="token operator">=</span> <span class="token function"><span class="token maybe-class-name">AgentOS</span></span><span class="token punctuation">(</span>agents<span class="token operator">=</span><span class="token punctuation">[</span>agno_agent<span class="token punctuation">]</span><span class="token punctuation">)</span>
app <span class="token operator">=</span> agent_os<span class="token punctuation">.</span><span class="token method function property-access">get_app</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
</code></pre>
<p>Run your AgentOS using <code>fastapi dev agno_agent.py</code> and chat with it on the <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">AgentOS UI</a>.</p>
<video width="700" height="700" class="rounded-2xl" loop="" autoplay="" muted="" playsinline="" controls=""><source src="/videos/agentos-chat.mp4">Your browser does not support the video tag.</video>
<p>Deploy your FastAPI app to your cloud of choice, and voilà, you're live in production. <strong>It's impossible to move this quickly without Agno.</strong></p>
<h2>✨ Part I: The Framework</h2>
<blockquote>
<p>Agent Engineering is an exercise in iteration. You can't iterate if you don't have a v0.1. A batteries included setup gets your agent in the hands of your internal team. Then you can edit in a loop.</p>
</blockquote>
<div class="flex w-full justify-end text-xs"><a target="_blank" rel="noopener noreferrer" class="" href="https://www.vtrivedy.com/posts/claude-code-sdk-haas-harness-as-a-service/"><p>[stolen from vtridvedy]</p></a></div>
<p>Agno delivers a full-featured, performance-optimized agent framework with every primitive you can think of. <strong>Session storage</strong>, <strong>memory</strong>, <strong>knowledge (RAG)</strong>, <strong>context management</strong>, <strong>tools</strong> (pre-built and MCP), <strong>guardrails</strong>, <strong>dependency injection</strong>, <strong>human in the loop</strong>, and more. Every part of agent execution is customizable via pre-hooks, post-hooks, and state management, so you're never boxed into default behavior.</p>
<p>Agents are completely type-safe, you can use them as chatbots (string input, string output) or with structured inputs and outputs. Not only that, Agents can use separate parser-models to generate structured outputs, so reasoning is not compromised (only available on Agno).</p>
<span class="text-xl font-semibold justify-center flex"><p>✨ The Multi-Agent Paradox</p></span>
<p>The big debate in multi-agent systems is whether agents should execute other sub-agents (handoff-approach), or the developer should programmatically define the flow of execution (workflow-approach).</p>
<blockquote>
<p>The answer: why not both?</p>
</blockquote>
<p>With Agno, <strong>Agents can be executed by themselves, as part of a multi-agent Team (autonomous execution) or a step-based Workflow (controlled execution)</strong>. Your use-case determines your approach.</p>
<p>Agent Teams have a shared state, agentic context management (i.e. the team leader manages the context for the team), shared memory and knowledge. Teams can also execute other teams, or workflows.</p>
<p>Workflows are deterministic, where each step can be an agent, team, workflow, or a plain old python function. Steps can be parallelized, branched, run via conditional logic or loops.</p>
<p>There's so much more I can cover here, but i'll save that for the <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/docs">docs</a>. The gist is, when building Agents, my goal is to get to v0.1 within a few hours and iterate from there. Agno gives me that.</p>
<blockquote>
<p>New agent engineers think that building the solution is the hard part - NO. Finding the right use-case is the hard part. To do that, you need to tackle 3, 5, or 10 different problems. Agno gets you to use-case #10, which is where the magic happens.</p>
</blockquote>
<h2>✨ Part II: The Runtime</h2>
<p>Seasoned builders know that to build successful agentic products, you need to iterate on multiple variations before you hit gold. Also:</p>
<ol>
<li>You're not going to build by yourself, you need to get it in the hands of your team quickly (especially the non-technical folks).</li>
<li>You need some sort of system to test, serve and integrate with your product as quickly as possible (to get user feedback).</li>
</ol>
<p>This means you need to build an API to serve your agents, your product will integrate with this API via REST or WebSockets. You also need a UI to test, monitor, debug and manage your system.</p>
<span class="text-xl font-semibold justify-center flex"><p>✨ You need an AI backend.</p></span>
<p>This is where the AgentOS comes in. In the simplest terms, it's a FastAPI application with pre-built endpoints for serving your agents, teams and workflows. You can also manage knowledge bases, user memories, agent sessions, and evaluate your system in real-time.</p>
<p><strong>The AgentOS is a high-performance runtime for multi-agent systems. It gives you a ready-to-use FastAPI app for deploying your agents, and an integrated UI for testing, monitoring and managing them.</strong></p>
<p>Deploy your AgentOS to your cloud of choice. Session data, knowledge, memories, all live in your database. No data ever leaves your system.</p>
<blockquote>
<p>In my experience, once you have a semblence of an Agent you like, you need to get it in the hands of your team and early users quickly. The pre-built api endpoints give you such an incredible headstart that its almost a no-brainer to use.</p>
</blockquote>
<p>Here are the pre-built api endpoints, ready to use:</p>
<img alt="Agno AgentOS API" loading="lazy" width="700" height="700" decoding="async" data-nimg="1" class="rounded-2xl" style="color:transparent" srcset="/_next/image?url=%2Fimages%2Fintro_agno_agentos_api.png&amp;w=750&amp;q=75 1x, /_next/image?url=%2Fimages%2Fintro_agno_agentos_api.png&amp;w=1920&amp;q=75 2x" src="/_next/image?url=%2Fimages%2Fintro_agno_agentos_api.png&amp;w=1920&amp;q=75">
<h2>✨ Part III: The Control Plane</h2>
<p>Wait, there's more?</p>
<p>The AgentOS comes with a web interface that connects directly to the AgentOS runtime (using the pre-built api endpoints). It's an novel architecture, where the web app (running in your browser) connects directly to the AgentOS runtime. You can test (chat and run) your agents, teams and workflows, manage knowledge bases, user memories, and evaluate your system in real-time. Here's how it looks:</p>
<img alt="Agno AgentOS UI" loading="lazy" width="700" height="700" decoding="async" data-nimg="1" class="rounded-2xl" style="color:transparent" srcset="/_next/image?url=%2Fimages%2Fintro_agno_agentos_ui.png&amp;w=750&amp;q=75 1x, /_next/image?url=%2Fimages%2Fintro_agno_agentos_ui.png&amp;w=1920&amp;q=75 2x" src="/_next/image?url=%2Fimages%2Fintro_agno_agentos_ui.png&amp;w=1920&amp;q=75">
<p>If you're using a tracing service, this will change how you look at things. You're not sending any data out, you're not paying for retention costs, and
you're not worrying about data privacy. The app pulls in sessions directly from the Agent's database and show's them:</p>
<img alt="Agno AgentOS UI Sessions" loading="lazy" width="700" height="700" decoding="async" data-nimg="1" class="rounded-2xl" style="color:transparent" srcset="/_next/image?url=%2Fimages%2Fintro_agno_agentos_sessions.png&amp;w=750&amp;q=75 1x, /_next/image?url=%2Fimages%2Fintro_agno_agentos_sessions.png&amp;w=1920&amp;q=75 2x" src="/_next/image?url=%2Fimages%2Fintro_agno_agentos_sessions.png&amp;w=1920&amp;q=75">
<p>The traces and runtime data is stored in your database, and the AgentOS UI connects from your browser to the AgentOS runtime.</p>
<p>Its a novel architecture designed to give you complete data ownership:</p>
<ul>
<li><strong>Your Infrastructure, Your Data</strong>: Your AgentOS runs in your cloud.</li>
<li><strong>Zero Data Transmission</strong>: No conversations, logs, or metrics are sent to external services. They belong to you.</li>
<li><strong>Private by Default</strong>: All processing, storage, and analytics happen in your environment.</li>
</ul>
<p>Personally, I'm surprised we collectively agreed to hand over every user interaction to tracing companies. Just the retention issues are enough to make you think twice, let alone the data privacy concerns.</p>
<span class="text-xl font-semibold flex justify-center"><p>For companies building agents, Agno delivers the complete solution.</p></span>
<p>Unless you're an infra or devtools company, you're focused on solving user problems. Agno free's up your mental capacity so you can <span class="text-orange-500">a)</span> find the right problem to tackle, <span class="text-orange-500">b)</span> build your MVP quickly, and <span class="text-orange-500">c)</span> iterate and improve your product.</p>
<p>Thousands of builders choose Agno, thank you for letting us be a part of your journey ✨</p>
<hr>
<h2>Want to build with Agno?</h2>
<ul>
<li>
<p><strong>Agno documentation:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/docs">agno.link/docs</a></p>
</li>
<li>
<p><strong>Signup for the AgentOS:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://os.agno.com">os.agno.com</a></p>
</li>
<li>
<p><strong>Star Agno on Github:</strong> <a target="_blank" rel="noopener noreferrer" class="" href="https://agno.link/gh">agno.link/gh</a></p>
</li>
</ul>
<hr>
<p>Read more on <a target="_blank" rel="noopener noreferrer" class="" href="https://www.agno.com">agno.com</a></p>]]></content:encoded>
            <author>hi@ashpreetbedi.com (Ashpreet Bedi)</author>
        </item>
    </channel>
</rss>