AI Research

MiroFish — Swarm Intelligence Complete Guide

📄 46 pages
📅 Published 20 April 2026
🐟 All Three Parts

Swarm Intelligence Simulation — From 10-Day Build to Million-Agent Platform

What if you could simulate exactly how public opinion will respond to your announcement — before you publish a single word? MiroFish makes that possible by running up to one million LLM-backed agents through realistic social interactions, producing probability-weighted forecasts grounded in the actual content of your documents. Built in 10 days by a 20-year-old Beijing undergraduate, it hit #1 on GitHub above repositories from OpenAI, Google, and Microsoft, attracted $4.1 million in seed investment within 24 hours, and introduced a new generation of practitioners to swarm intelligence simulation.

This complete three-part guide is the definitive technical reference for MiroFish — from the origin story and core concepts through hands-on installation, ten end-to-end use case walkthroughs, and production-grade enterprise deployment on AWS, Azure, and GCP. Whether you are a data engineer evaluating the platform, a researcher designing simulations, or an enterprise architect planning a multi-tenant deployment, this guide covers every layer of the stack with precision.

46 Pages · 13 Chapters · 10 Use Cases · 23 Social Actions

What Is MiroFish? Core Concepts and the Swarm Intelligence Paradigm

Traditional forecasting models treat the world as a mathematical equation — correlating past numeric patterns to predict future values. MiroFish treats the world as a society and simulates it. The key insight is that the most important questions in business, finance, policy, and communications are not primarily numeric: they are fundamentally social. How will public opinion shift after a press release? How will the market react when earnings drop below guidance? What narratives will dominate in the first 72 hours after a product launch?

MiroFish answers these questions through agent-based modelling powered by large language models. Each agent receives a unique biography, a knowledge-graph-derived stance on the seed document's subject matter, long-term memory, and a full social network. Agents interact across dual platforms — a Twitter-like environment for rapid sentiment cascades and a Reddit-like environment for deliberative threaded discussion — simultaneously, with the same population. This dual-platform design is a natural A/B experiment built into every simulation: convergent results indicate high-confidence signal; divergent results reveal platform-specific artifacts rather than genuine emergent dynamics.

MiroFish represents the third generation of agent-based modelling: beyond classical rule-based ABM (1940s–2010s) and early LLM-backed agents (2020–2024), MiroFish adds GraphRAG-grounded personas, dual-platform parallel simulation, and million-agent scale through the OASIS framework developed by the CAMEL-AI research community.

The OASIS Framework — The Scientific Engine

OASIS — Open Agent Social Interaction Simulations — is a peer-reviewed, open-source simulation framework developed by CAMEL-AI. Its provenance is critical: the simulation mechanics have been validated against documented social phenomena in the academic literature, including information cascades, false-news propagation, and polarisation effects. MiroFish is not a cool demo built on top of a hobby project — it assembles validated scientific infrastructure into an accessible product.

OASIS supports 23 distinct social actions spanning content creation, network management, discovery, and moderation — giving agents a rich behavioural surface that produces genuine social dynamics rather than scripted exchanges. The architecture supports up to one million agents through three key mechanisms: a Scalable Inferencer that batches LLM calls across GPUs; an Asynchronous Time Engine that activates agents on staggered schedules, creating realistic temporal dynamics; and a Recommendation System that mimics algorithmic feed dynamics, including filter bubbles and echo chambers.
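The asynchronous time engine's staggered activation can be illustrated with a toy scheduler. The uniform activation probability below is a simplifying assumption — the real engine draws on richer per-agent activity profiles:

```python
import random

def staggered_schedule(num_agents: int, num_rounds: int,
                       activation_prob: float = 0.3, seed: int = 0):
    """Activate each agent independently per round, mimicking an
    asynchronous time engine. A single uniform probability stands in
    for OASIS's per-agent activity profiles (simplifying assumption)."""
    rng = random.Random(seed)
    return [
        [a for a in range(num_agents) if rng.random() < activation_prob]
        for _ in range(num_rounds)
    ]

schedule = staggered_schedule(num_agents=500, num_rounds=3)
# Only a fraction of the population acts in any round, so cascades
# unfold over simulated time instead of in lockstep.
```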

The Six-Stage Pipeline — From Document to Forecast

Stage 1 — Document Ingestion & GraphRAG
Processes any seed document through Named Entity Recognition, relationship extraction, semantic clustering, and graph construction into a structured knowledge base.
Stage 2 — Agent Generation
Automatically generates agent personas with biographies, stances, long-term memory, social connections, and behavioural parameters derived from the knowledge graph.
Stage 3 — Environment Configuration
Sets simulation parameters: duration, platform mix, information seeding, algorithmic feed bias, hallucination guard level, and interaction topology.
Stage 4 — Dual-Platform Simulation
Runs the same agent population simultaneously on a Twitter-like (rapid cascade) and Reddit-like (deliberative discussion) environment via OASIS.
Stage 5 — God's Eye Injection
Injects new variables into the live simulation at any point — CEO resignations, competitor announcements, breaking news — causing the entire population to recalibrate in real time.
Stage 6 — ReportAgent & Interactive World
Produces structured probability-weighted forecasts with faction maps, sentiment timelines, and hallucination risk flags — plus the ability to query individual agents directly.

Cloud-First vs Privacy-First — Choosing Your Deployment Path

MiroFish offers two distinct deployment paths with fundamentally different data residency profiles. The original MiroFish uses Zep Cloud for knowledge graph storage and any OpenAI-compatible API for LLM inference — fast to set up, lower hardware requirements, but unsuitable for any confidential, regulated, or pre-publication material.

MiroFish-Offline eliminates all cloud dependencies: the knowledge graph runs on local Neo4j, LLM inference runs through local Ollama serving Qwen2.5 models, and the entire English-translated interface runs locally. Your seed documents never leave your network. This is the only acceptable path for internal strategy documents, confidential financial data, pre-publication research, or any material subject to GDPR or HIPAA requirements. The offline fork was itself built in a single Claude Code session — a direct demonstration of the super-individual development model that created MiroFish itself.
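The two paths reduce to two sets of endpoints. A minimal sketch, assuming the standard Neo4j Bolt and Ollama ports — the config keys themselves are illustrative, not MiroFish's actual settings file:

```python
import os

def backend_config(offline: bool) -> dict:
    """Service endpoints for the two deployment paths. Ports are the
    Neo4j Bolt and Ollama defaults; key names are illustrative
    assumptions, not MiroFish's documented configuration."""
    if offline:
        return {
            "graph_store": "bolt://localhost:7687",       # local Neo4j
            "llm_endpoint": "http://localhost:11434/v1",  # local Ollama, OpenAI-compatible
            "model": "qwen2.5:32b",
            "data_leaves_network": False,
        }
    return {
        "graph_store": "zep-cloud",                       # Zep Cloud knowledge graph
        "llm_endpoint": os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        "model": "gpt-4o",
        "data_leaves_network": True,
    }
```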

Ten End-to-End Use Case Walkthroughs

UC-01 PR Crisis Simulation
Pre-test any press release across 500 simulated journalists, employees, and public — see which narratives dominate before publishing.
UC-02 Financial Market Sentiment
Simulate retail investor, analyst, and short-seller reactions to earnings reports — with probability-weighted outcome distributions.
UC-03 Polymarket Trading Bot
Architecture for connecting MiroFish simulations to Polymarket CLOB API — including the 8% edge threshold and confidence filter used in a reported $4,266 profit run.
UC-04 Policy Draft Forecasting
Simulate 4-week public consultation dynamics across stakeholder archetypes — SMEs, civil society, legal professionals, general public.
UC-05 Product Launch A/B Test
A/B test messaging before ad spend — compare two positioning variants and inject negative reviews or competitor responses mid-simulation.
UC-06 Literary Completion
Feed any novel or text with an unresolved narrative; character-agents play out the ending through emergent social dynamics, not AI-generated prose.
UC-07 Election Sentiment Dynamics
Simulate how investigative reports, candidate denials, and document leaks cascade through voter segments across a compressed election timeline.
UC-08 M&A Rumour Propagation
Model how deal rumours propagate through investor archetypes — and whether CEO denials amplify or reduce speculation.
UC-09 Public Health Crisis Communication
Test communication timing, messaging style, and misinformation counter-strategies across demographically diverse agent populations.
UC-10 Strategy Document Stress Test
Simulate the leak scenario for your 5-year strategic plan — offline only, zero cloud dependencies, revealing investor skepticism and competitor intelligence gaps.
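The UC-03 trading filter can be sketched as a pure function. The 8% edge threshold comes from the guide; the 0.7 confidence cut-off and the function name are illustrative assumptions:

```python
def should_trade(sim_prob: float, market_prob: float, sim_confidence: float,
                 edge_threshold: float = 0.08, min_confidence: float = 0.7):
    """Decide whether a simulated probability justifies a trade against
    the market price. The 8% edge threshold mirrors UC-03; the 0.7
    confidence filter is an illustrative assumption."""
    edge = sim_prob - market_prob
    if abs(edge) < edge_threshold or sim_confidence < min_confidence:
        return None  # no trade: edge too thin or simulation too uncertain
    return "buy_yes" if edge > 0 else "buy_no"

decision = should_trade(0.62, 0.50, 0.85)  # 12-point edge, high confidence
```

Anything with an edge under the threshold or a low-confidence simulation is skipped, which is what keeps a strategy like this from trading on noise.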

Enterprise Deployment — AWS, Azure, and GCP

The guide covers production-grade MiroFish deployment across all three major cloud providers. The AWS reference architecture (eu-central-1) separates compute, storage, and graph layers: ECS Fargate for the Flask backend, EC2 g4dn.xlarge GPU instances for Ollama inference, Neo4j CE on EC2 r6i for the knowledge graph, CloudFront and S3 for the Vue 3 frontend, and AWS Secrets Manager for credential management — all within a VPC with private subnets. Terraform resource snippets are provided for each component.

Multi-tenant deployments isolate each tenant with dedicated Neo4j database namespaces, per-tenant S3 prefixes with IAM resource-based policies, and tenant-tagged simulation results with row-level security at the API layer. Custom CloudWatch metrics track active agents, hallucination risk scores, and token consumption per simulation, with alerting thresholds for cost gates and cascade detection.

Cost reality check: A standard 500-agent, 50-round simulation costs approximately $30 on GPT-4o via cloud API, or near-zero locally with Ollama and Qwen2.5-32B. A hybrid rule-based architecture — routing 80% of agent decisions to lightweight scoring functions and only 20% to full LLM calls — reduces cloud costs by roughly 80% with minimal quality impact. The guide includes a Python cost estimator function and GPU optimisation settings for local deployments.
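A cost estimator in the spirit of the one the guide ships might look like this. The per-call token count, blended token price, and 30% per-round activation rate are illustrative assumptions chosen so a 500-agent, 50-round run lands near the quoted $30; `llm_fraction` models the hybrid routing:

```python
def estimate_cost(agents: int, rounds: int, activation: float = 0.3,
                  tokens_per_call: int = 1600, usd_per_mtok: float = 2.5,
                  llm_fraction: float = 1.0) -> float:
    """Rough USD cost of one simulation: one LLM call per *active*
    agent per round. Activation rate, token count, and blended price
    are illustrative assumptions, not MiroFish's published numbers.
    llm_fraction < 1.0 models the hybrid rule-based/LLM architecture."""
    calls = agents * rounds * activation * llm_fraction
    return calls * tokens_per_call * usd_per_mtok / 1_000_000

full_llm = estimate_cost(500, 50)                   # every decision hits the LLM
hybrid = estimate_cost(500, 50, llm_fraction=0.2)   # 80% routed to rule logic
```

With these assumptions the hybrid run costs one fifth of the full-LLM run, matching the roughly 80% reduction the guide describes.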

Read the Full Guide + Download Free Sample

46 pages · Instant PDF download · Available in the SimuPro Knowledge Store

View Guide Summary & Sample on SimuPro → 📋 Browse Complete Guide Index →

Frequently Asked Questions

What is MiroFish and how does it work?
MiroFish is an open-source swarm intelligence platform that simulates up to one million LLM-backed agents interacting on social media-style environments. It processes any seed document through a six-stage pipeline: GraphRAG ingestion builds a knowledge graph, agents are generated with biographies and stances drawn from that graph, and then the entire population runs simultaneously on Twitter-like and Reddit-like platforms via the OASIS framework. The result is a probability-weighted forecast of how real social dynamics would unfold around the document's subject matter.
What is the difference between MiroFish original and MiroFish-Offline?
The original MiroFish uses Zep Cloud for knowledge graph storage and any OpenAI-compatible cloud API for LLM inference, making setup faster but unsuitable for confidential data. MiroFish-Offline replaces Zep Cloud with a local Neo4j database and uses Ollama to serve models locally — zero cloud dependencies after initial model download. The offline fork also translates the entire interface from Chinese to English. For any document classified as confidential, internal, or regulated, only MiroFish-Offline should be used.
What is the Woozle Effect and why does it matter for swarm simulations?
The Woozle Effect describes how a claim gains credibility simply through repetition, even when all instances trace back to a single unverified source. In MiroFish simulations this manifests as a hallucination cascade: one agent hallucinates a fact, posts it, and other agents absorb and repeat it until the entire population is reasoning from information that never existed in the seed document. Research indicates that 10–20% of agents can be misled per discussion round, and the error compounds round over round. The guide covers three practical mitigations: memory provenance tagging, sentinel agents, and dual-environment divergence analysis.
What use cases is MiroFish best suited for?
MiroFish excels at scenario stress-testing before real-world decisions: PR crisis simulation before publishing an announcement, financial market sentiment analysis around earnings events, policy draft public reaction forecasting, product launch A/B positioning tests, and confidential strategy document leak simulations. The guide walks through ten complete end-to-end use cases with step-by-step configuration instructions. It is not a deterministic forecasting tool — outputs are probability distributions over narrative trajectories, not price targets or vote counts.
How much does it cost to run a MiroFish simulation?
With cloud APIs, a standard run of 500 agents over 50 rounds using GPT-4o costs approximately $30. A hybrid architecture that routes roughly 80% of agent decisions to lightweight rule logic and only 20% to full LLM calls cuts that cloud cost by about 80%. Local deployments with Ollama and Qwen2.5-32B reduce ongoing cost to near zero — hardware and power only. The guide includes a Python cost estimation function and GPU optimisation settings including flash attention, parallel inference, and batch scheduling to maximise the signal-to-cost ratio.

Brief Summary

MiroFish is the AI project that shocked the world — built in 10 days by a 20-year-old Beijing undergraduate, it hit #1 on GitHub above OpenAI and Google, attracted $4.1 million in 24 hours, and introduced swarm intelligence simulation to a global audience. This complete three-part guide covers everything from the origin story to million-agent enterprise deployments.

The guide dives deep into MiroFish's six-stage pipeline — GraphRAG document ingestion, LLM-backed agent generation, dual-platform OASIS simulation (Twitter-like and Reddit-like simultaneously), God's Eye variable injection, and the ReportAgent forecast system. Ten end-to-end use case walkthroughs cover PR crisis testing, financial market sentiment, Polymarket trading bot integration, policy drafting, and internal strategy stress-testing.

You will leave with a complete mental model of swarm intelligence simulation, working installation instructions for both cloud and fully-local privacy-first deployments, practical cost optimisation strategies, AWS/Azure/GCP enterprise deployment architectures, and an honest assessment of MiroFish's real strengths and current limitations.

Extended Summary

What happens when a 20-year-old developer builds in 10 days what traditionally required a funded team of engineers for months — and the world notices within 24 hours? MiroFish answers that question and then raises the stakes: it is a fully open-source swarm intelligence platform that simulates up to one million LLM-backed agents on real-world social dynamics, and this guide is the complete technical reference for understanding, installing, and deploying it.

Part 1 traces MiroFish from its origins — Guo Hangjiang's BettaFish predecessor, Chen Tianqiao's super-individual theory, the vibe-coding development process — through the core concepts of swarm intelligence and agent-based modelling, the OASIS framework with its 23 social actions and million-agent scale architecture, the six-stage processing pipeline from document ingestion to interactive ReportAgent, and the complete technology stack for both the cloud-first original and the privacy-first MiroFish-Offline fork running on local Neo4j and Ollama.

Part 2 is fully hands-on. It covers system requirements for cloud and offline variants, Docker Compose one-command setup, manual installation of Neo4j and Ollama, verification procedures, and connecting any OpenAI-compatible LLM provider. The centrepiece is ten complete end-to-end use case walkthroughs — PR crisis simulation, financial market sentiment analysis, Polymarket trading bot integration, policy draft reaction forecasting, product launch A/B testing, literary completion, election dynamics, M&A rumour propagation, public health crisis communication, and confidential strategy document stress-testing.

Part 3 addresses the most critical engineering challenges in multi-agent simulation: the Woozle Effect hallucination cascade and three practical mitigations (memory provenance tracking, sentinel agents, and dual-environment divergence analysis). It includes a cost estimation calculator, hybrid rule-based/LLM architecture for 80% cost reduction, GPU optimisation settings, and complete AWS production deployment with Terraform snippets. Azure and GCP equivalents, multi-tenant architecture, and custom CloudWatch observability metrics round out the enterprise deployment chapter.

The guide concludes with an honest, benchmark-grounded assessment of where MiroFish genuinely excels — scenario stress-testing at zero real-world risk, counterfactual exploration through God's Eye injection, and knowledge-grounded agent generation via GraphRAG — and where it currently falls short, including the absence of published real-world validation studies and the unresolved hallucination cascade problem at scale.

SimuPro Data Solutions
Cloud Data Engineering & AI Consultancy  ·  AWS  ·  Azure  ·  GCP  ·  Databricks  ·  Ysselsteyn, Netherlands  ·  simupro.nl
SimuPro is your end-to-end cloud data solutions partner — from in-depth consultancy (research, architecture design, platform selection, optimisation, management, team support) to tailor-made development (proof-of-concept, build, test, deploy to production, scale, automate, extend). We engineer robust data platforms on AWS, Azure, Databricks & GCP — covering data migration, big data engineering, BI & analytics, and ML models, AI agents & intelligent automation — secure, scalable, and tailored to your exact business goals.
Data-Driven · AI-Powered · Validated Results · Confident Decisions · Smart Outcomes

SimuPro Data Solutions — Cloud Data Engineering & AI Consultancy

Expert PDF guides · End-to-end consultancy · AWS · Azure · Databricks · GCP

Visit simupro.nl →
📋 Browse All Guides — Complete Index →