GPU Poor Continuous Learning with Gemini 3

In this post, I'll share a pattern I've been using to make my agents noticeably more reliable over time. No fine-tuning. No retraining. No GPUs — just better system design.

Table of Contents

  1. The problem with disconnected sessions
  2. What is "gpu-poor continuous learning"
  3. Why Gemini 3 Flash
  4. The learning loop
  5. Demo
  6. What we store (and what we don't)
  7. How to run your own Self-Learning Agent
  8. Why this pattern works

1. The problem with disconnected sessions

Most agents run in independent sessions, disconnected from each other.

You ask a question. You get an answer. Tomorrow you ask a similar question and the agent starts from scratch. It doesn't remember what worked, what failed, or what it figured out along the way.

This is fine for simple tasks. But for anything complex—research, analysis, decision support—it means:

  • Repeating the same reasoning patterns
  • Re-discovering the same gotchas
  • Never building on past success

If your agent can't learn from its own experience, you're leaving performance on the table.

2. What is GPU Poor Continuous Learning?

Let me be precise about terminology, because "continuous learning" has a specific meaning in ML.

Traditional continuous learning:

  • Model weights update over time
  • Requires compute (GPUs, TPUs)
  • Risk of catastrophic forgetting
  • Learning happens in parameters

What I'm doing (GPU Poor Continuous Learning):

  • Model stays completely frozen
  • Zero training compute
  • Learning happens in retrieval
  • Knowledge is auditable and reversible

The model doesn't get smarter. The system gets smarter.

I call it "GPU Poor" because you get continuous improvement without any of the infrastructure traditionally required for model updates. It's a poor man's version of continuous learning, and it works surprisingly well.
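To make "learning happens in retrieval" concrete, here's a minimal, self-contained sketch. The function names and the in-memory list are illustrative; in the real setup (Section 7) the store is Postgres with PgVector and retrieval is vector search rather than keyword overlap:

from datetime import datetime, timezone

# A tiny in-memory "knowledge base"; the real setup uses Postgres + PgVector.
LEARNINGS: list[dict] = [
    {
        "title": "ETF comparison checklist",
        "learning": "Check both expense ratio and tracking error, not expense ratio alone.",
        "created_at": datetime(2025, 12, 17, tzinfo=timezone.utc).isoformat(),
    }
]

def search_learnings(query: str, top_k: int = 3) -> list[dict]:
    """Naive keyword-overlap retrieval; vector search would replace this in practice."""
    terms = set(query.lower().split())
    scored = [
        (len(terms & set((l["title"] + " " + l["learning"]).lower().split())), l)
        for l in LEARNINGS
    ]
    return [l for score, l in sorted(scored, key=lambda s: -s[0]) if score > 0][:top_k]

def build_prompt(query: str) -> str:
    """The 'learning' lives in the prompt the frozen model sees, not in its weights."""
    prior = search_learnings(query)
    notes = "\n".join(f"- {l['title']}: {l['learning']}" for l in prior) or "- (none yet)"
    return f"Prior learnings:\n{notes}\n\nUser query: {query}"

print(build_prompt("Compare two ETF index funds on expense ratio and tracking error"))

Nothing about the model changes between sessions; only the contents of LEARNINGS and therefore the prompt it sees.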

3. Why Gemini 3 Flash

I built this with Gemini 3 Flash, which launched today. Here's why:

Factor             Gemini 3 Flash
Cost               $0.50/1M input, $3/1M output
Speed              3x faster than 2.5 Pro
Context            1M tokens input
Agentic coding     78% SWE-bench (beats Gemini 3 Pro)
Context caching    90% cost reduction for repeated tokens

For a self-learning agent, you want:

  1. Low cost — You're making many calls per session
  2. Fast inference — Tight feedback loops matter
  3. Large context — Prior learnings need room alongside new data
  4. Strong tool use — The agent needs to reliably call save/retrieve functions

Gemini 3 Flash hits all four. The 1M context window is especially useful—you can include substantial prior learnings without truncating.
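To make criterion 4 concrete, here's a minimal sketch of tool calling with the google-genai Python SDK. The tool bodies and the model id are placeholders, and the cookbook goes through the Agno framework rather than the raw SDK, so treat this as an illustration of the shape, not the actual implementation:

from google import genai
from google.genai import types

# Illustrative tools; in the cookbook these are backed by Postgres/PgVector.
def search_learnings(query: str) -> str:
    """Return prior learnings relevant to the query, formatted as plain text."""
    return "(no stored learnings yet)"

def save_learning(title: str, context: str, learning: str) -> str:
    """Persist an approved learning and confirm the save."""
    return f"saved: {title}"

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-3-flash",  # placeholder id; use the model name you actually have access to
    contents="Compare two S&P 500 ETFs for a long-term holding.",
    config=types.GenerateContentConfig(
        system_instruction=(
            "Search stored learnings before researching. Propose new learnings, "
            "but only call save_learning after the user explicitly approves."
        ),
        tools=[search_learnings, save_learning],  # SDK handles automatic function calling
    ),
)
print(response.text)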

4. The learning loop

Here's the core pattern:

                         Query
                           │
                           ▼
                   Search learnings
                           │
                           ▼
                       Research
                           │
                           ▼
                      Synthesize
                           │
                           ▼
                        Reflect
                           │
              ┌────── reusable? ──────┐
              │                       │
             Yes                      No
              │                       │
              ▼                       │
        Propose to user               │
              │                       │
       ┌── approved? ──┐              │
       │               │              │
      Yes              No             │
       │               │              │
       ▼               │              │
     Save              │              │
       │               │              │
       └───────────────┴──────────────┘
                       │
                       ▼
                    Answer

Key details:

  1. Search first — The agent must explicitly search the knowledge base before doing anything else. This isn't automatic; it's enforced through instructions.

  2. Most queries won't produce a learning — This is expected. Learnings should be rare and high-signal, not routine.

  3. Human-in-the-loop gating — The agent proposes learnings, but only saves them with explicit approval. If the user declines, the agent moves on without re-proposing.
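In code, the loop is small. Here's a compact sketch of the control flow; every collaborator (research, synthesize, reflect, the approval prompt, the save function) is injected because those are placeholders for model calls and UI, not functions from the cookbook:

from typing import Callable, Optional

def run_session(query: str,
                search_learnings: Callable[[str], str],
                research: Callable[[str, str], str],
                synthesize: Callable[[str, str, str], str],
                reflect: Callable[[str, str, str], Optional[dict]],
                user_approves: Callable[[dict], bool],
                save_learning: Callable[..., None]) -> str:
    """One pass through the learning loop; all collaborators are injected placeholders."""
    prior = search_learnings(query)               # 1. search first, always
    findings = research(query, prior)             # 2. the actual work (web search, tools, ...)
    answer = synthesize(query, prior, findings)   # 3. draft the answer for the user

    proposal = reflect(query, findings, answer)   # 4. "did anything reusable come out of this?"
    if proposal and user_approves(proposal):      # 5. human-in-the-loop gate
        save_learning(**proposal)                 # 6. persist only on explicit approval
    # declined or no proposal: just answer, never re-propose

    return answer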

5. Demo

Here's a demo of the agent in action.

6. What we store (and what we don't)

The biggest mistake is storing too much.

A learning is worth saving if it is:

  • Specific: "When comparing ETFs, check expense ratio AND tracking error" not "Look at ETF metrics"
  • Actionable: Can be directly applied in future similar queries
  • Generalizable: Useful beyond this specific question

Do not save: raw facts, one-off answers, summaries, speculation, or anything unlikely to recur.

Each learning is structured:

{
    "title": "ETF comparison checklist",
    "context": "When comparing similar ETFs for investment decisions",
    "learning": "Always check both expense ratio AND tracking error. Low expense ratio with high tracking error can cost more than a slightly more expensive fund with tight tracking.",
    "confidence": "high",
    "type": "heuristic",
    "created_at": "2025-12-17T10:30:00Z"
}

Most tasks will not produce a learning. That's expected.
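When a learning does get saved, a little structural validation goes a long way. Here's a minimal sketch of the record as a Python dataclass; the allowed type and confidence values and the minimum-length check are my assumptions, not the cookbook's schema:

from dataclasses import dataclass, field
from datetime import datetime, timezone

ALLOWED_TYPES = {"heuristic", "pitfall", "procedure"}   # illustrative taxonomy
ALLOWED_CONFIDENCE = {"low", "medium", "high"}

@dataclass
class Learning:
    title: str
    context: str        # when this applies
    learning: str        # the reusable insight itself
    confidence: str = "medium"
    type: str = "heuristic"
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # Cheap structural checks before anything reaches the knowledge base.
        if self.confidence not in ALLOWED_CONFIDENCE:
            raise ValueError(f"confidence must be one of {ALLOWED_CONFIDENCE}")
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"type must be one of {ALLOWED_TYPES}")
        if len(self.learning.split()) < 8:
            raise ValueError("learning is too short to be specific or actionable")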

7. How to run your own Self-Learning Agent

I'm providing cookbooks for running your own self-learning agent, built using:

  • FastAPI application for running the agent
  • Postgres database for storing sessions, memory, and knowledge

Here's the link to the code.

You can wrap this up in a container and deploy it to Railway. Here's a sample repository you can use.

Steps to run your own Self-Learning Agent

1. Clone the repo

git clone https://github.com/agno-agi/agno.git
cd agno

2. Create and activate a virtual environment

uv venv .gemini-agents --python 3.12
source .gemini-agents/bin/activate

3. Install dependencies

uv pip install -r cookbook/02_examples/04_gemini/requirements.txt

4. Set environment variables

# Required for Gemini models
export GOOGLE_API_KEY=your-google-api-key

# Required for agents using parallel search
export PARALLEL_API_KEY=your-parallel-api-key

5. Run Postgres with PgVector

Postgres stores agent sessions, memory, knowledge, and state. Install Docker Desktop and run:

./cookbook/scripts/run_pgvector.sh

Or run directly:

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql \
  -v pgvolume:/var/lib/postgresql \
  -p 5532:5432 \
  --name pgvector \
  agnohq/pgvector:18
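To confirm the container is actually accepting connections on port 5532 before starting the agent, a quick check from Python (assuming psycopg is installed; uv pip install "psycopg[binary]" if it isn't):

# Connects with the credentials from the docker run command above.
import psycopg

with psycopg.connect("postgresql://ai:ai@localhost:5532/ai") as conn:
    print(conn.execute("select version()").fetchone()[0])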

6. Run the Agent OS

Agno provides a web interface for interacting with agents. Start the server:

python cookbook/02_examples/04_gemini/run.py

Then visit os.agno.com and add http://localhost:7777 as an endpoint.

8. Why this pattern works

This approach works because it separates concerns that are usually conflated:

Concern      Traditional         GPU Poor
Reasoning    Model               Model (unchanged)
Learning     Model weights       Knowledge base
Memory       Context window      Persistent storage

Benefits:

  • Auditable — You can see exactly what the agent "learned"
  • Reversible — Delete a bad learning, system improves
  • Fast feedback — No training cycles, immediate improvement
  • No forgetting — New learnings don't overwrite capabilities

The pattern generalizes beyond research. Use it for:

  • Market analysis
  • Competitive intelligence
  • Technical support
  • Decision logging
  • Policy tracking

Anywhere beliefs evolve, learnings beat stateless answers.


Thank you for reading! Feel free to reach out on X if you have questions or feedback.
