GPU Poor Continuous Learning with Gemini 3

In this post, I'll share a pattern I've been using to make my agents noticeably more reliable over time. No fine-tuning. No retraining. No GPUs — just better system design.

Table of Contents

  1. The problem with disconnected sessions
  2. What is "gpu-poor continuous learning"
  3. Why Gemini 3 Flash
  4. The learning loop
  5. Demo
  6. What we store (and what we don't)
  7. How to run your own Self-Learning Agent
  8. Why this pattern works

1. The problem with disconnected sessions

Most agents run in independent sessions, disconnected from each other.

You ask a question. You get an answer. Tomorrow you ask a similar question and the agent starts from scratch. It doesn't remember what worked, what failed, or what it figured out along the way.

This is fine for simple tasks. But for anything complex—research, analysis, decision support—it means:

  • Repeating the same reasoning patterns
  • Re-discovering the same gotchas
  • Never building on past success

If your agent can't learn from its own experience, you're leaving performance on the table.

2. What is GPU Poor Continuous Learning?

Let me be precise about terminology, because "continuous learning" has a specific meaning in ML.

Traditional continuous learning:

  • Model weights update over time
  • Requires compute (GPUs, TPUs)
  • Risk of catastrophic forgetting
  • Learning happens in parameters

What I'm doing (GPU Poor Continuous Learning):

  • Model stays completely frozen
  • Zero training compute
  • Learning happens in retrieval
  • Knowledge is auditable and reversible

The model doesn't get smarter. The system gets smarter.

I call it "GPU Poor" because you get continuous improvement without any of the infrastructure traditionally required for model updates. It's a poor man's version of continuous learning, and it works surprisingly well.
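To make "learning happens in retrieval" concrete, here's a minimal, self-contained sketch. The function names and the in-memory list are illustrative; in the real setup (Section 7) the store is Postgres with PgVector and retrieval is vector search rather than keyword overlap:

from datetime import datetime, timezone

# A tiny in-memory "knowledge base"; the real setup uses Postgres + PgVector.
LEARNINGS: list[dict] = [
    {
        "title": "ETF comparison checklist",
        "learning": "Check both expense ratio and tracking error, not expense ratio alone.",
        "created_at": datetime(2025, 12, 17, tzinfo=timezone.utc).isoformat(),
    }
]

def search_learnings(query: str, top_k: int = 3) -> list[dict]:
    """Naive keyword-overlap retrieval; vector search would replace this in practice."""
    terms = set(query.lower().split())
    scored = [
        (len(terms & set((l["title"] + " " + l["learning"]).lower().split())), l)
        for l in LEARNINGS
    ]
    return [l for score, l in sorted(scored, key=lambda s: -s[0]) if score > 0][:top_k]

def build_prompt(query: str) -> str:
    """The 'learning' lives in the prompt the frozen model sees, not in its weights."""
    prior = search_learnings(query)
    notes = "\n".join(f"- {l['title']}: {l['learning']}" for l in prior) or "- (none yet)"
    return f"Prior learnings:\n{notes}\n\nUser query: {query}"

print(build_prompt("Compare two ETF index funds on expense ratio and tracking error"))

Nothing about the model changes between sessions; only the contents of LEARNINGS and therefore the prompt it sees.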

3. Why Gemini 3 Flash

I built this with Gemini 3 Flash, which launched today. Here's why:

Factor             Gemini 3 Flash
Cost               $0.50/1M input, $3/1M output
Speed              3x faster than 2.5 Pro
Context            1M tokens input
Agentic coding     78% SWE-bench (beats Gemini 3 Pro)
Context caching    90% cost reduction for repeated tokens

For a self-learning agent, you want:

  1. Low cost — You're making many calls per session
  2. Fast inference — Tight feedback loops matter
  3. Large context — Prior learnings need room alongside new data
  4. Strong tool use — The agent needs to reliably call save/retrieve functions

Gemini 3 Flash hits all four. The 1M context window is especially useful—you can include substantial prior learnings without truncating.
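To make criterion 4 concrete, here's a minimal sketch of tool calling with the google-genai Python SDK. The tool bodies and the model id are placeholders, and the cookbook goes through the Agno framework rather than the raw SDK, so treat this as an illustration of the shape, not the actual implementation:

from google import genai
from google.genai import types

# Illustrative tools; in the cookbook these are backed by Postgres/PgVector.
def search_learnings(query: str) -> str:
    """Return prior learnings relevant to the query, formatted as plain text."""
    return "(no stored learnings yet)"

def save_learning(title: str, context: str, learning: str) -> str:
    """Persist an approved learning and confirm the save."""
    return f"saved: {title}"

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-3-flash",  # placeholder id; use the model name you actually have access to
    contents="Compare two S&P 500 ETFs for a long-term holding.",
    config=types.GenerateContentConfig(
        system_instruction=(
            "Search stored learnings before researching. Propose new learnings, "
            "but only call save_learning after the user explicitly approves."
        ),
        tools=[search_learnings, save_learning],  # SDK handles automatic function calling
    ),
)
print(response.text)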

4. The learning loop

Here's the core pattern:

                         Query
                           │
                           ▼
                   Search learnings
                           │
                           ▼
                       Research
                           │
                           ▼
                      Synthesize
                           │
                           ▼
                        Reflect
                           │
              ┌────── reusable? ──────┐
              │                       │
             Yes                      No
              │                       │
              ▼                       │
        Propose to user               │
              │                       │
       ┌── approved? ──┐              │
       │               │              │
      Yes              No             │
       │               │              │
       ▼               │              │
     Save              │              │
       │               │              │
       └───────────────┴──────────────┘
                       │
                       ▼
                    Answer

Key details:

  1. Search first — The agent must explicitly search the knowledge base before doing anything else. This isn't automatic; it's enforced through instructions.

  2. Most queries won't produce a learning — This is expected. Learnings should be rare and high-signal, not routine.

  3. Human-in-the-loop gating — The agent proposes learnings, but only saves them with explicit approval. If the user declines, the agent moves on without re-proposing.
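In code, the loop is small. Here's a compact sketch of the control flow; every collaborator (research, synthesize, reflect, the approval prompt, the save function) is injected because those are placeholders for model calls and UI, not functions from the cookbook:

from typing import Callable, Optional

def run_session(query: str,
                search_learnings: Callable[[str], str],
                research: Callable[[str, str], str],
                synthesize: Callable[[str, str, str], str],
                reflect: Callable[[str, str, str], Optional[dict]],
                user_approves: Callable[[dict], bool],
                save_learning: Callable[..., None]) -> str:
    """One pass through the learning loop; all collaborators are injected placeholders."""
    prior = search_learnings(query)               # 1. search first, always
    findings = research(query, prior)             # 2. the actual work (web search, tools, ...)
    answer = synthesize(query, prior, findings)   # 3. draft the answer for the user

    proposal = reflect(query, findings, answer)   # 4. "did anything reusable come out of this?"
    if proposal and user_approves(proposal):      # 5. human-in-the-loop gate
        save_learning(**proposal)                 # 6. persist only on explicit approval
    # declined or no proposal: just answer, never re-propose

    return answer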

5. Demo

Here's a demo of the agent in action.

6. What we store (and what we don't)

The biggest mistake is storing too much.

A learning is worth saving if it is:

  • Specific: "When comparing ETFs, check expense ratio AND tracking error" not "Look at ETF metrics"
  • Actionable: Can be directly applied in future similar queries
  • Generalizable: Useful beyond this specific question

Do not save: raw facts, one-off answers, summaries, speculation, or anything unlikely to recur.

Each learning is structured:

{
    "title": "ETF comparison checklist",
    "context": "When comparing similar ETFs for investment decisions",
    "learning": "Always check both expense ratio AND tracking error. Low expense ratio with high tracking error can cost more than a slightly more expensive fund with tight tracking.",
    "confidence": "high",
    "type": "heuristic",
    "created_at": "2025-12-17T10:30:00Z"
}

Most tasks will not produce a learning. That's expected.
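When a learning does get saved, a little structural validation goes a long way. Here's a minimal sketch of the record as a Python dataclass; the allowed type and confidence values and the minimum-length check are my assumptions, not the cookbook's schema:

from dataclasses import dataclass, field
from datetime import datetime, timezone

ALLOWED_TYPES = {"heuristic", "pitfall", "procedure"}   # illustrative taxonomy
ALLOWED_CONFIDENCE = {"low", "medium", "high"}

@dataclass
class Learning:
    title: str
    context: str        # when this applies
    learning: str        # the reusable insight itself
    confidence: str = "medium"
    type: str = "heuristic"
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # Cheap structural checks before anything reaches the knowledge base.
        if self.confidence not in ALLOWED_CONFIDENCE:
            raise ValueError(f"confidence must be one of {ALLOWED_CONFIDENCE}")
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"type must be one of {ALLOWED_TYPES}")
        if len(self.learning.split()) < 8:
            raise ValueError("learning is too short to be specific or actionable")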

7. How to run your own Self-Learning Agent

I'm providing cookbooks for running your own self-learning agent, built using:

  • FastAPI application for running the agent
  • Postgres database for storing sessions, memory, and knowledge

Here's the link to the code.

You can wrap this up in a container and deploy it to Railway. Here's a sample repository you can use.

Steps to run your own Self-Learning Agent

1. Clone the repo

git clone https://github.com/agno-agi/agno.git
cd agno

2. Create and activate a virtual environment

uv venv .gemini-agents --python 3.12
source .gemini-agents/bin/activate

3. Install dependencies

uv pip install -r cookbook/02_examples/04_gemini/requirements.txt

4. Set environment variables

# Required for Gemini models
export GOOGLE_API_KEY=your-google-api-key

# Required for agents using parallel search
export PARALLEL_API_KEY=your-parallel-api-key

5. Run Postgres with PgVector

Postgres stores agent sessions, memory, knowledge, and state. Install Docker Desktop and run:

./cookbook/scripts/run_pgvector.sh

Or run directly:

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql \
  -v pgvolume:/var/lib/postgresql \
  -p 5532:5432 \
  --name pgvector \
  agnohq/pgvector:18
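To confirm the container is actually accepting connections on port 5532 before starting the agent, a quick check from Python (assuming psycopg is installed; uv pip install "psycopg[binary]" if it isn't):

# Connects with the credentials from the docker run command above.
import psycopg

with psycopg.connect("postgresql://ai:ai@localhost:5532/ai") as conn:
    print(conn.execute("select version()").fetchone()[0])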

6. Run the Agent OS

Agno provides a web interface for interacting with agents. Start the server:

python cookbook/02_examples/04_gemini/run.py

Then visit os.agno.com and add http://localhost:7777 as an endpoint.

8. Why this pattern works

This approach works because it separates concerns that are usually conflated:

Concern      Traditional         GPU Poor
Reasoning    Model               Model (unchanged)
Learning     Model weights       Knowledge base
Memory       Context window      Persistent storage

Benefits:

  • Auditable — You can see exactly what the agent "learned"
  • Reversible — Delete a bad learning, system improves
  • Fast feedback — No training cycles, immediate improvement
  • No forgetting — New learnings don't overwrite capabilities

The pattern generalizes beyond research. Use it for:

  • Market analysis
  • Competitive intelligence
  • Technical support
  • Decision logging
  • Policy tracking

Anywhere beliefs evolve, learnings beat stateless answers.


Thank you for reading! Feel free to reach out on X if you have questions or feedback.
