AI Agents & Self-Improving Systems

PROMETHEUS — Complete Framework Reference

📄 64 pages
📅 Published 20 April 2026
✍️ SimuPro Data Solutions
View Guide Summary & Sample on SimuPro → 📋 Browse Complete Guide Index →

What This Guide Covers

PROMETHEUS is a production-ready framework for building AI agent populations that evolve, compete, and improve themselves — autonomously, continuously, and safely. This 64-page complete reference covers everything from the intellectual foundations of recursive self-improvement and AGI definitions, through the full engineering specification of all six PROMETHEUS subsystems, to a hands-on step-by-step Ubuntu Linux implementation guide that takes you from a clean install to a running evolutionary session.

The guide is structured in three parts. Part I establishes the theoretical context — the AGI capability taxonomy, the self-improvement research landscape from AlphaEvolve to DGM-Hyperagents, and the PROMETHEUS three-tier architecture with its six design principles. Part II delivers complete engineering specifications for every subsystem: Population Management with island topology and six mutation operators, the six-dimensional Fitness Evaluation Engine, Meta-Agent Orchestration, the four-layer Knowledge Repository, the Safety Guardian with its cryptographically protected Constitutional Constraints, and State Persistence with full session resume. Part III is a complete Ubuntu 22.04 implementation walkthrough — six prerequisite layers, full Python code for every core module, CLI reference, monitoring dashboard, and performance tuning guide for CPU-only, single-GPU, and API-backed deployments. Two appendices provide systematic alternative technology reviews and ten radically different self-improvement paradigms for researchers who want to push beyond the baseline.

64 Pages · 16 Chapters · 14 Framework Components · 6 Fitness Dimensions

The Self-Improvement Problem — And Why It Matters Now

Every AI system ever built — no matter how capable — was made better by human engineers writing new architectures, curating new datasets, and engineering new training regimes. The next leap toward Artificial General Intelligence requires systems that can drive their own improvement cycle autonomously and continuously. In 2025, Google DeepMind's AlphaEvolve improved its own training pipeline and a dozen academic groups published frameworks for recursive self-refinement; in early 2026, Meta's DGM-Hyperagents demonstrated cross-domain self-improvement transfer. The moment is exactly right for a unified, production-ready framework built on open principles.

PROMETHEUS operationalizes recursive self-improvement through an evolutionary population architecture. Rather than optimizing a single model through gradient descent, it maintains a diverse community of AI agent configurations that compete, reproduce, and evolve — guided by multi-dimensional fitness evaluation, safety-constrained self-modification, and a layered memory architecture that accumulates strategic wisdom across sessions. The framework is hardware-agnostic by design: it delivers meaningful improvement on a CPU-only laptop and scales gracefully to multi-GPU research clusters.

The Research Landscape PROMETHEUS Builds On

The guide maps the full 2024–2026 self-improving AI research landscape in depth: AlphaEvolve (DeepMind, May 2025), which discovered faster matrix multiplication algorithms and improved its own training pipeline; the Gödel Agent (ACL 2025), demonstrating continuous self-improvement on mathematical reasoning through recursive self-modification; SEAL (NeurIPS 2025), showing that a model's own introspective outputs can drive genuine weight-level improvement; SICA, demonstrating safe agent-script self-modification; and DGM-Hyperagents (Meta, March 2026), which made the meta-improvement mechanism itself editable and achieved compelling cross-domain transfer. PROMETHEUS synthesizes the best ideas from all of these into a single coherent, deployable framework.

The Three-Tier Architecture

PROMETHEUS is organized into three architectural tiers with well-defined interfaces, so individual components can be replaced or upgraded as technology advances. The Execution Layer houses base agents, tool interfaces, sandbox runtimes, evaluation probes, and LLM backends — the ground floor responsible for actually running agent code and reporting results, and the most safety-critical layer in the stack. The Orchestration Layer contains the Meta-Agent Controller, Task Scheduler, Knowledge Repository, Safety Guardian, and State Manager — the brain of PROMETHEUS that coordinates resources, screens safety, and manages persistence. The Evolution Layer operates at the level of agent populations and evolutionary dynamics — Population Manager, Fitness Evaluator, Innovation Engine, and Lineage Tracker.

Safety Architectural Principle: The Safety Guardian is isolated from the evolutionary process. Its Constitutional Constraints specification is loaded from a read-only, cryptographically signed file at startup. No component — not even the Meta-Agent Controller — can modify, bypass, or disable it at runtime. Attempted circumvention triggers an immediate full system halt and an audit log entry. Self-improvement without safety is not progress — it is a countdown.
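The startup verification step can be sketched in a few lines. This is an illustrative sketch only — the function name, file layout, and key are assumptions, and it uses a stdlib HMAC tag as a stand-in for a real asymmetric signature (a real deployment would use something like Ed25519 so no runtime component ever holds the signing key):

```python
import hashlib
import hmac
import json
import sys

# Hypothetical verification key; in practice this would be a public key
# baked into the binary, never writable by any evolved component.
VERIFY_KEY = b"prometheus-demo-key"

def load_constraints(spec_bytes: bytes, tag_hex: str) -> dict:
    """Verify the constraints blob against its tag, or halt the system."""
    expected = hmac.new(VERIFY_KEY, spec_bytes, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag_hex):
        # Tampering detected: immediate halt, per the guardian contract.
        sys.exit("SAFETY HALT: constitutional constraints failed verification")
    return json.loads(spec_bytes)

spec = json.dumps({"hard_prohibitions": ["disable_guardian"]}).encode()
tag = hmac.new(VERIFY_KEY, spec, hashlib.sha256).hexdigest()
constraints = load_constraints(spec, tag)
print(constraints["hard_prohibitions"][0])  # disable_guardian
```

The key property is that verification happens before any evolutionary component is constructed, so a failed check never reaches the population loop.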

The 14 PROMETHEUS Framework Components

Population Manager
Maintains a diverse island-based population of agent configurations, managing migration, elite retention, and the new/elite/crossover composition ratio.
Fitness Evaluator
Scores each candidate agent on six orthogonal dimensions — accuracy, reasoning quality, efficiency, robustness, alignment, and novelty — aggregated to a scalar fitness score.
Innovation Engine
Generates novel candidate agent configurations using six mutation operators and three crossover strategies, with probabilities that adapt based on historical productivity.
Lineage Tracker
Records the complete genealogy of every agent ever evaluated — parent IDs, mutation types, generation of origin, fitness trajectory — stored as Apache Parquet for efficient querying.
Meta-Agent Controller
Coordinates each generation cycle, monitors resource consumption, detects stagnation and triggers corrective actions, and self-tunes evolutionary hyperparameters.
Task Scheduler
Maintains a priority queue of evaluation tasks dispatched to available workers, with priority computed from candidate promise, evaluation cost, diversity value, and staleness.
Knowledge Repository
Four-layer memory architecture (Working / Episodic / Semantic / Procedural) enabling PROMETHEUS to become smarter the longer it runs, across sessions.
Safety Guardian
Three-stage evaluation pipeline (Constitutional Check → Safety Scoring → Sandbox Testing) screening every proposed modification before deployment — architecturally immutable.
State Manager
Writes four atomic checkpoint files every N generations (.state.json, .lineage.parquet, .knowledge.db, .metrics.csv) enabling full deterministic session resume.
Base Agents
Configurable Python agent class executing tasks via LLM + tools, with parameters covering system prompt, task template, strategy, tools, temperature, and memory strategy.
Tool Interface
Provides sandboxed access to external tools and APIs with explicit capability declarations, audit logging, and strict resource limits.
Sandbox Runtime
Docker-based isolation environment for agent code execution, with configurable memory, CPU, disk, and network restrictions preventing escape and resource exhaustion.
Evaluator Probes
Lightweight automated assessment runners providing fast intermediate-generation feedback across 500 development tasks in eight benchmark categories.
LLM Backend
Abstraction layer supporting five interchangeable backends: Ollama (local/free), vLLM (local/high-throughput), OpenAI API, Anthropic API, and Hugging Face.
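To make the Fitness Evaluator's aggregation concrete, here is a minimal sketch of scoring six orthogonal dimensions into a scalar. The dimension weights shown are assumptions for illustration — the guide specifies the actual defaults — but the invariant that weights sum to 1.0 comes from the text:

```python
# Illustrative weights; the guide's defaults may differ.
WEIGHTS = {
    "accuracy": 0.30, "reasoning": 0.20, "efficiency": 0.15,
    "robustness": 0.15, "alignment": 0.15, "novelty": 0.05,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must sum to 1.0

def fitness(scores: dict) -> float:
    """Aggregate per-dimension scores in [0, 1] into a scalar fitness."""
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)

candidate = {"accuracy": 0.8, "reasoning": 0.7, "efficiency": 0.9,
             "robustness": 0.6, "alignment": 1.0, "novelty": 0.4}
print(round(fitness(candidate), 3))  # 0.775
```

The adaptive weight learning cycle described later adjusts these weights over time, which is why they live in configuration rather than code.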


Read the Full Guide + Download Free Sample

64 pages · Instant PDF download · Available in the SimuPro Knowledge Store


Frequently Asked Questions

What hardware do I need to run PROMETHEUS?
PROMETHEUS is designed to run on any modern Ubuntu 22.04 LTS system — a laptop with 8 CPU cores, 32 GB RAM, and no GPU is sufficient for productive experimentation. For faster evolution cycles, an NVIDIA RTX 4090 with 64–128 GB RAM is recommended. The guide covers all three tiers — CPU-only, single-GPU, and multi-GPU cluster — with specific configuration recommendations for each, including exact pip install commands, CUDA setup, and config.yaml templates.
How does PROMETHEUS ensure that the evolutionary process remains safe?
PROMETHEUS uses a Safety Guardian component that is architecturally isolated from the evolutionary process. Its Constitutional Constraints specification is loaded from a read-only, cryptographically signed file at startup — no component can modify, bypass, or disable it at runtime. Every proposed modification must pass a three-stage evaluation pipeline: a fast symbolic rule check, a numeric safety scoring step, and sandboxed execution in an isolated container. Any attempted circumvention triggers an immediate full system halt and audit log entry.
Can PROMETHEUS use local LLMs, or does it require an API key?
PROMETHEUS supports five LLM backends: Ollama (fully local and free, recommended for CPU-only or GPU-equipped local deployments), vLLM (local with 4–8x higher throughput than Ollama for research deployments), OpenAI API, Anthropic API, and Hugging Face. The guide provides detailed setup instructions, performance benchmarks, and cost trade-off analysis for all five backends. For zero-cost runs on a laptop, Ollama with llama3.1:8b is the recommended starting point.
What makes PROMETHEUS different from LangGraph, AutoGen, or CrewAI?
Most agent frameworks are orchestration tools — they manage how a fixed set of agents communicate and execute tasks. PROMETHEUS is an evolutionary framework — it maintains a population of agent configurations that compete, reproduce, and improve across generations. It evolves agents simultaneously at prompt-level, strategy-level, and code-level. The guide provides an honest, systematic comparison with all major alternatives including DSPy, Reflexion/Self-Refine, MAML, and AutoML, explaining when each is a better choice than PROMETHEUS.
How does cross-session learning work — does PROMETHEUS remember previous runs?
Yes. PROMETHEUS uses a four-layer memory architecture. At the end of each run, the Knowledge Extraction Agent distils strategy patterns from the episodic log into the Semantic Memory layer. When a new run is started with --resume-from, it seeds the starting population from the previous run's Procedural Archive (all-time best agents), loads Semantic Memory patterns into the Innovation Engine to bias mutation toward historically productive regions, and re-weights the benchmark suite based on prior performance gaps. Each new run genuinely starts smarter than the last.
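The seeding step can be sketched as follows. The archive format and function name here are assumptions for illustration — the real Procedural Archive layout is defined in the guide — but the logic mirrors the description: take the all-time best agents first, then let the Innovation Engine fill the remaining slots:

```python
import json

# Hypothetical archive format: a JSON list of {"config", "fitness"} records.
def seed_population(archive_json: str, k: int, pop_size: int) -> list:
    """Seed a new run: top-k archived elites, padded with fresh candidates."""
    archive = json.loads(archive_json)
    elites = sorted(archive, key=lambda a: a["fitness"], reverse=True)[:k]
    seeds = [a["config"] for a in elites]
    # Remaining slots would be generated by the Innovation Engine,
    # biased by Semantic Memory patterns; placeholders stand in here.
    seeds += ["<fresh-random>"] * (pop_size - len(seeds))
    return seeds

archive = json.dumps([
    {"config": "agent-A", "fitness": 0.91},
    {"config": "agent-B", "fitness": 0.78},
    {"config": "agent-C", "fitness": 0.85},
])
print(seed_population(archive, k=2, pop_size=4))
# ['agent-A', 'agent-C', '<fresh-random>', '<fresh-random>']
```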
How long does a typical PROMETHEUS run take, and what results can I expect?
On a CPU-only 8-core system with Ollama/8B, expect 1–2 generations per hour — a 6-hour exploratory run is practical for initial experiments. On a single RTX 4090 with Ollama/70B, 4–6 generations per hour is achievable. The guide recommends a two-phase approach: start with a 6-hour Aggressive Exploration run (high mutation rate, maximum diversity) to discover promising configuration regions, then launch a 48–72 hour Balanced Exploitation run resuming from the best exploratory results. This consistently outperforms a single long run with fixed parameters.

Brief Summary

PROMETHEUS is a complete technical blueprint for building, running, and evolving AI agent populations toward continuously improving performance — an evolutionary population architecture that combines LLM-powered mutation and crossover with multi-dimensional fitness evaluation, cryptographically protected safety constraints, and a four-layer memory system that accumulates strategic wisdom across sessions. The framework runs productively on a CPU-only Ubuntu laptop, scales to multi-GPU research clusters, and supports five interchangeable LLM backends from fully local Ollama to OpenAI and Anthropic APIs.

The guide delivers three connected layers of value: the intellectual foundation (AGI definitions, the self-improvement research landscape, the four engineering requirements pillars), the complete framework engineering specification (all six subsystems with full design rationale and configuration parameters), and a hands-on implementation guide that takes a skilled practitioner from a clean Ubuntu 22.04 install to a running PROMETHEUS production session with full monitoring, checkpointing, and result analysis.

Two extensive appendices extend the core material: Appendix A provides systematic alternative technology reviews for all ten critical PROMETHEUS components — including honest trade-off assessments of CMA-ES, Bayesian Optimization, QD Algorithms, DSPy, RLHF/DPO, LangGraph, Neo4j, Firecracker MicroVMs, and WebAssembly sandboxing. Appendix B explores ten radically alternative self-improvement paradigms for researchers seeking to push beyond conventional evolutionary optimization, from Open-Ended Co-Evolution and Morphogenetic Encoding to Curiosity-Driven Intrinsic Motivation and Surprise-Based Selection.

Extended Summary

What if you could build an AI agent system that becomes demonstrably smarter every time it runs — not because a human engineer improved it, but because it evolved itself? PROMETHEUS is a production-ready framework for exactly that: a self-evolving AI agent population architecture that combines evolutionary computation with the expressive power of large language models, wrapped in a rigorous safety and evaluation architecture that ensures every improvement is traceable, reproducible, and aligned.

Part I builds the intellectual foundation that every serious practitioner needs before writing a single line of PROMETHEUS code. It traces the history of AGI research from Turing's 1950 imitation game through the deep learning revolution to the recursive self-improvement breakthroughs of 2024–2026. It establishes a rigorous five-level AGI capability taxonomy and a working definition of AGI built on four pillars — capability breadth, autonomous agency, continuous self-improvement, and aligned safety — that guide every design decision in the framework. It maps the four requirement dimensions for AGI (software architecture, hardware and compute, data and knowledge, alignment and safety) and surveys the current state of the art in self-improving AI, from AlphaEvolve's discovery of faster matrix multiplication algorithms to DGM-Hyperagents' compelling cross-domain transfer and the Gödel Agent's recursive self-modification at ACL 2025.

Part II provides the complete engineering specification for all six PROMETHEUS subsystems. The Population Manager chapter covers the four-island architecture (Exploration, Exploitation, Novelty, and Elite Hall of Fame islands), the three preset composition ratios (Aggressive Exploration 80/10/10, Balanced Default 45/35/20, Strong Convergence 15/55/30), and all six mutation operator categories with their adaptive probability schedules. The Fitness Evaluation Engine chapter specifies six orthogonal scoring dimensions with default weights summing to 1.0, a 500-task eight-category benchmark suite, and an adaptive weight learning cycle that prevents the system from over-indexing on easy-to-game metrics. The Meta-Agent Orchestration chapter documents the generation state machine, real-time resource allocation, stagnation detection with Diversification Events, and the structured JSON progress event format that powers the monitoring dashboard. The Knowledge Repository chapter details all four memory layers, the Knowledge Extraction Agent's seven analysis operations, and the full session resume protocol including how Semantic Memory patterns are used to seed mutation proposals. The Safety Guardian chapter covers the four-tier Constitutional Constraints specification (Hard Prohibitions, Soft Prohibitions, Mandatory Practices, Value Alignment Properties), the three-stage evaluation pipeline, and the adversarial robustness mechanisms including jailbreak detection and capability spike quarantine. The State Persistence chapter describes the four atomic checkpoint files, the Solution Trajectory system, and the deterministic seven-step resume sequence.
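As a flavour of the structured JSON progress events mentioned above, here is a hedged sketch. The field names are assumptions — the guide defines the exact event schema — but the shape (one machine-readable record per generation, consumed by the dashboard) follows the description:

```python
import json
import time

# Hypothetical field names; the guide specifies the real event schema.
def progress_event(generation: int, best: float, mean: float,
                   diversity: float) -> str:
    """Emit one generation-complete event as a JSON line."""
    return json.dumps({
        "event": "generation_complete",
        "generation": generation,
        "best_fitness": best,
        "mean_fitness": mean,
        "population_diversity": diversity,
        "timestamp": int(time.time()),
    })

evt = json.loads(progress_event(12, best=0.81, mean=0.64, diversity=0.37))
print(evt["event"], evt["generation"])  # generation_complete 12
```

Emitting one JSON object per line keeps the event stream trivially parseable by the monitoring dashboard or by standard tools like `jq`.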

Part III delivers a complete hands-on Ubuntu Linux implementation guide, structured as a six-layer pre-production checklist that must be completed in order before a production PROMETHEUS run can be safely initiated. Layer 1 covers system prerequisites and Python environment setup with Miniforge and Conda. Layer 2 covers Docker sandbox configuration with resource limits. Layer 3 covers optional GPU/CUDA 12.x installation and vLLM setup. Layer 4 covers all five LLM backend configurations with example YAML entries. Layer 5 presents the full Python implementation of every core module — config.py with Pydantic models, base_agent.py with the AgentConfig dataclass, the six-phase evolutionary loop, the LLM backend abstraction class hierarchy, and the safety guardian. Layer 6 covers YAML configuration, benchmark task suite selection, budget and checkpoint configuration, and the validate + dry-run sequence before launch.
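To illustrate the kind of configuration object Layer 5 builds, here is a minimal sketch of an `AgentConfig` dataclass. The field set is taken from the Base Agents component description; the field names, defaults, and example values are assumptions, not the guide's exact code:

```python
from dataclasses import dataclass, field

# Illustrative field set from the Base Agents description; names and
# defaults are assumptions, not the guide's actual implementation.
@dataclass
class AgentConfig:
    system_prompt: str
    task_template: str
    strategy: str = "direct"          # e.g. direct / chain-of-thought / debate
    tools: list = field(default_factory=list)
    temperature: float = 0.7
    memory_strategy: str = "none"     # e.g. none / scratchpad / episodic

cfg = AgentConfig(system_prompt="You are a careful analyst.",
                  task_template="Task: {task}\nAnswer:")
print(cfg.strategy, cfg.temperature)  # direct 0.7
```

Because every evolvable trait lives in one flat, serialisable object, mutation operators can edit fields directly and the Lineage Tracker can diff parent and child configurations cheaply.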

The appendices ensure that PROMETHEUS is not a black box but a transparent starting point for serious research. Appendix A reviews 2–4 alternative implementations for each of ten critical components, with honest advantage/disadvantage tables and explicit links showing how each alternative would interact with other PROMETHEUS components — making it a practical decision guide for teams customising the framework for their specific hardware, budget, or research objective. Appendix B provides ten radically alternative self-improvement paradigms — Open-Ended Evolution, Morphogenetic and Developmental Encoding, Artificial Life and Digital Evolution, Intrinsic Motivation, Symbiotic Co-Evolution, Quantum-Inspired Evolutionary Algorithms, Hyperdimensional Computing, Neural Cellular Automata, Chaos Theory, and Surprise-Based Selection — each with current research findings, serendipity potential ratings, and explicit links back to PROMETHEUS components they could replace or extend. The appendix closes with a synthesis describing the most powerful possible PROMETHEUS deployment: all ten paradigms running simultaneously across a large island population, contributing agents to a shared migration pool in a framework that is itself subject to evolutionary improvement.

SimuPro Data Solutions
Cloud Data Engineering & AI Consultancy  ·  AWS  ·  Azure  ·  GCP  ·  Databricks  ·  Ysselsteyn, Netherlands  ·  simupro.nl
SimuPro is your end-to-end cloud data solutions partner — from in-depth consultancy (research, architecture design, platform selection, optimization, management, team support) to tailor-made development (proof-of-concept, build, test, deploy to production, scale, automate, extend). We engineer robust data platforms on AWS, Azure, Databricks & GCP — covering data migration, big data engineering, BI & analytics, and ML models, AI agents & intelligent automation — secure, scalable, and tailored to your exact business goals.
Data-Driven · AI-Powered · Validated Results · Confident Decisions · Smart Outcomes
