Swarm Intelligence Simulation — From 10-Day Build to Million-Agent Platform
What if you could simulate how public opinion is likely to respond to your announcement — before you publish a single word? MiroFish makes that possible by running up to one million LLM-backed agents through realistic social interactions, producing probability-weighted forecasts grounded in the actual content of your documents. Built in 10 days by a 20-year-old Beijing undergraduate, it hit #1 on GitHub above repositories from OpenAI, Google, and Microsoft, attracted $4.1 million in seed investment within 24 hours, and introduced a new generation of practitioners to swarm intelligence simulation.
This complete three-part guide is the definitive technical reference for MiroFish — from the origin story and core concepts through hands-on installation, ten end-to-end use case walkthroughs, and production-grade enterprise deployment on AWS, Azure, and GCP. Whether you are a data engineer evaluating the platform, a researcher designing simulations, or an enterprise architect planning a multi-tenant deployment, this guide covers every layer of the stack with precision.
What Is MiroFish? Core Concepts and the Swarm Intelligence Paradigm
Traditional forecasting models treat the world as a mathematical equation — correlating past numeric patterns to predict future values. MiroFish treats the world as a society and simulates it. The key insight is that the most important questions in business, finance, policy, and communications are not primarily numeric: they are fundamentally social. How will public opinion shift after a press release? How will the market react when earnings drop below guidance? What narratives will dominate in the first 72 hours after a product launch?
MiroFish answers these questions through agent-based modelling powered by large language models. Each agent receives a unique biography, a knowledge-graph-derived stance on the seed document's subject matter, long-term memory, and a full social network. Agents interact simultaneously across two platforms with the same population — a Twitter-like environment for rapid sentiment cascades and a Reddit-like environment for deliberative threaded discussion. This dual-platform design builds a natural A/B experiment into every simulation: convergent results indicate high-confidence signal; divergent results reveal platform-specific artifacts rather than genuine emergent dynamics.
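To make the agent model concrete, here is a minimal sketch of what a persona record and the dual-platform convergence check could look like. The field names, the numeric stance encoding, and the sentiment threshold are our illustrative assumptions, not MiroFish's actual schema or API.

```python
from dataclasses import dataclass, field

# Illustrative sketch only — field names and types are assumptions,
# not MiroFish's actual agent schema.
@dataclass
class AgentPersona:
    name: str
    biography: str                                  # unique background per agent
    stance: float                                   # knowledge-graph-derived position, -1.0 .. 1.0
    memory: list = field(default_factory=list)      # long-term memory entries
    followers: list = field(default_factory=list)   # social-network edges

def divergence(twitter_sentiment: float, reddit_sentiment: float,
               threshold: float = 0.2) -> str:
    """Compare aggregate sentiment from the two parallel environments.

    Convergent results suggest a robust signal; divergent results point
    at platform-specific artifacts rather than genuine dynamics.
    """
    if abs(twitter_sentiment - reddit_sentiment) <= threshold:
        return "convergent: high-confidence signal"
    return "divergent: inspect platform-specific artifacts"
```

In practice the convergence threshold would be calibrated per scenario; the point is that the same population is read through two independent feed algorithms, and agreement between them is itself a signal.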
MiroFish represents the third generation of agent-based modelling: beyond classical rule-based ABM (1940s–2010s) and early LLM-backed agents (2020–2024), MiroFish adds GraphRAG-grounded personas, dual-platform parallel simulation, and million-agent scale through the OASIS framework developed by the CAMEL-AI research community.
The OASIS Framework — The Scientific Engine
OASIS — Open Agent Social Interaction Simulations — is a peer-reviewed, open-source simulation framework developed by CAMEL-AI. Its provenance is critical: the simulation mechanics have been validated against documented social phenomena in the academic literature, including information cascades, false news propagation, and polarisation effects. MiroFish is not a cool demo built on top of a hobby project — it assembles validated scientific infrastructure into an accessible product.
OASIS supports 23 distinct social actions spanning content creation, network management, discovery, and moderation — giving agents a rich behavioural surface that produces genuine social dynamics rather than scripted exchanges. The architecture supports up to one million agents through three key mechanisms: a Scalable Inferencer that batches LLM calls across GPUs; an Asynchronous Time Engine that activates agents on staggered schedules, creating realistic temporal dynamics; and a Recommendation System that mimics algorithmic feed dynamics, including filter bubbles and echo chambers.
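Two of these scale mechanisms can be sketched in a few lines. The function and variable names below are ours, not OASIS's actual API: the first sketch shows staggered activation (only a random fraction of agents acts at any simulated tick, avoiding the unrealistic lockstep of classical ABM), and the second shows how pending agent prompts could be grouped so one inference call serves many agents.

```python
import random

def build_schedule(num_agents: int, num_ticks: int,
                   activity_rate: float = 0.1, seed: int = 7) -> dict:
    """Assign each agent a random subset of ticks at which it wakes up.

    Staggered schedules produce realistic temporal dynamics: activity
    is spread across the run instead of firing all agents at once.
    """
    rng = random.Random(seed)
    return {
        agent_id: [t for t in range(num_ticks) if rng.random() < activity_rate]
        for agent_id in range(num_agents)
    }

def batch_prompts(prompts: list, batch_size: int = 32):
    """Group pending agent prompts so one GPU call serves many agents."""
    for i in range(0, len(prompts), batch_size):
        yield prompts[i:i + batch_size]

schedule = build_schedule(num_agents=1000, num_ticks=24)
active_at_tick_0 = [a for a, ticks in schedule.items() if 0 in ticks]
batches = list(batch_prompts([f"agent-{a}" for a in active_at_tick_0]))
```

At million-agent scale the same two ideas hold; the real Scalable Inferencer additionally shards batches across GPUs and overlaps inference with environment updates.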
The Six-Stage Pipeline — From Document to Forecast
Cloud-First vs Privacy-First — Choosing Your Deployment Path
MiroFish offers two distinct deployment paths with fundamentally different data residency profiles. The original MiroFish uses Zep Cloud for knowledge graph storage and any OpenAI-compatible API for LLM inference — fast to set up, lower hardware requirements, but unsuitable for any confidential, regulated, or pre-publication material.
MiroFish-Offline eliminates all cloud dependencies: the knowledge graph runs on local Neo4j, LLM inference runs through local Ollama serving Qwen2.5 models, and the entire English-translated interface runs locally. Your seed documents never leave your network. This is the only acceptable path for internal strategy documents, confidential financial data, pre-publication research, or any material subject to GDPR or HIPAA requirements. The offline fork was itself built in a single Claude Code session — a direct demonstration of the super-individual development model that created MiroFish itself.
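The two stacks can be contrasted with a small configuration sketch. The endpoint values, environment-variable name, and dictionary keys below are illustrative defaults, not MiroFish's actual configuration; the local endpoints are the standard Neo4j Bolt port and Ollama's OpenAI-compatible API.

```python
import os

# Hypothetical stack selector — keys and defaults are assumptions,
# not MiroFish's real configuration schema.
CLOUD = {
    "graph_store": "zep-cloud",
    "llm_endpoint": os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
    "data_leaves_network": True,
}

OFFLINE = {
    "graph_store": "bolt://localhost:7687",       # local Neo4j
    "llm_endpoint": "http://localhost:11434/v1",  # local Ollama, OpenAI-compatible API
    "llm_model": "qwen2.5",
    "data_leaves_network": False,
}

def pick_stack(document_is_confidential: bool) -> dict:
    """Confidential, regulated, or pre-publication material must stay local."""
    return OFFLINE if document_is_confidential else CLOUD
```

The deciding flag is data residency: if the seed document may not leave your network, the offline stack is the only acceptable choice, whatever the hardware cost.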
Ten End-to-End Use Case Walkthroughs
Enterprise Deployment — AWS, Azure, and GCP
The guide covers production-grade MiroFish deployment across all three major cloud providers. The AWS reference architecture (eu-central-1) separates compute, storage, and graph layers: ECS Fargate for the Flask backend, EC2 g4dn.xlarge GPU instances for Ollama inference, Neo4j CE on EC2 r6i for the knowledge graph, CloudFront and S3 for the Vue 3 frontend, and AWS Secrets Manager for credential management — all within a VPC with private subnets. Terraform resource snippets are provided for each component.
Multi-tenant deployments isolate each tenant with dedicated Neo4j database namespaces, per-tenant S3 prefixes with IAM resource-based policies, and tenant-tagged simulation results with row-level security at the API layer. Custom CloudWatch metrics track active agents, hallucination risk scores, and token consumption per simulation, with alerting thresholds for cost gates and cascade detection.
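As a sketch of the observability layer, the helper below builds per-tenant metric payloads in the shape CloudWatch's PutMetricData API expects. The metric names mirror the three described above; the namespace and dimension name are our assumptions.

```python
import datetime

def simulation_metrics(tenant_id: str, active_agents: int,
                       hallucination_risk: float, tokens_used: int) -> dict:
    """Build a CloudWatch PutMetricData payload tagged with the tenant.

    Namespace and dimension names are illustrative, not a documented
    MiroFish convention.
    """
    now = datetime.datetime.now(datetime.timezone.utc)
    dims = [{"Name": "TenantId", "Value": tenant_id}]
    return {
        "Namespace": "MiroFish/Simulations",
        "MetricData": [
            {"MetricName": "ActiveAgents", "Dimensions": dims,
             "Timestamp": now, "Value": active_agents, "Unit": "Count"},
            {"MetricName": "HallucinationRiskScore", "Dimensions": dims,
             "Timestamp": now, "Value": hallucination_risk, "Unit": "None"},
            {"MetricName": "TokensConsumed", "Dimensions": dims,
             "Timestamp": now, "Value": tokens_used, "Unit": "Count"},
        ],
    }

# With boto3 this payload would be sent via:
#   boto3.client("cloudwatch").put_metric_data(**simulation_metrics(...))
```

Alerting thresholds for cost gates (TokensConsumed) and cascade detection (HallucinationRiskScore) then become ordinary CloudWatch alarms on these metrics.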
Topics Covered in This Guide
- Origin Story & Super-Individual Theory — how BaiFu built MiroFish in 10 days, Chen Tianqiao's $4.1M investment thesis, and the vibe-coding methodology that made it possible
- OASIS Framework & Scale Architecture — peer-reviewed simulation engine, 23 social actions, million-agent infrastructure, asynchronous time engine, and recommender system dynamics
- Six-Stage Pipeline in Depth — GraphRAG document ingestion, LLM agent generation from knowledge graphs, dual-platform OASIS simulation, God's Eye variable injection, and the ReportAgent forecast system
- Cloud vs Offline Stack — Zep Cloud original vs MiroFish-Offline with local Neo4j, Ollama, Docker Compose setup, manual installation, verification, and connecting any remote LLM provider
- 10 End-to-End Use Cases — complete step-by-step walkthroughs for PR crisis, financial sentiment, Polymarket trading bot, policy drafting, product launch, literary completion, election dynamics, M&A rumour, public health, and strategy stress-testing
- Hallucination Mitigation — the Woozle Effect cascade mechanics, five-step propagation path, memory provenance tagging, sentinel agents, and dual-environment divergence analysis as a built-in filter
- Enterprise Deployment & Governance — AWS/Azure/GCP architectures, Terraform snippets, multi-tenancy, data classification, audit logging, retention policies, and CloudWatch observability
- Honest Assessment & 18-Month Roadmap — genuine strengths, current limitations including the absence of validation benchmarks, and the near-term development trajectory for MiroFish
Frequently Asked Questions
Brief Summary
MiroFish is the AI project that shocked the world — built in 10 days by a 20-year-old Beijing undergraduate, it hit #1 on GitHub above repositories from OpenAI and Google, attracted $4.1 million in 24 hours, and introduced swarm intelligence simulation to a global audience. This complete three-part guide covers everything from the origin story to million-agent enterprise deployments.
The guide dives deep into MiroFish's six-stage pipeline — GraphRAG document ingestion, LLM-backed agent generation, dual-platform OASIS simulation (Twitter-like and Reddit-like simultaneously), God's Eye variable injection, and the ReportAgent forecast system. Ten end-to-end use case walkthroughs cover PR crisis testing, financial market sentiment, Polymarket trading bot integration, policy drafting, and internal strategy stress-testing.
You will leave with a complete mental model of swarm intelligence simulation, working installation instructions for both cloud and fully-local privacy-first deployments, practical cost optimisation strategies, AWS/Azure/GCP enterprise deployment architectures, and an honest assessment of MiroFish's real strengths and current limitations.
Extended Summary
What happens when a 20-year-old developer builds in 10 days what traditionally required a funded team of engineers for months — and the world notices within 24 hours? MiroFish answers that question and then raises the stakes: it is a fully open-source swarm intelligence platform that simulates up to one million LLM-backed agents on real-world social dynamics, and this guide is the complete technical reference for understanding, installing, and deploying it.
Part 1 traces MiroFish from its origins — Guo Hangjiang's BettaFish predecessor, Chen Tianqiao's super-individual theory, the vibe-coding development process — through the core concepts of swarm intelligence and agent-based modelling, the OASIS framework with its 23 social actions and million-agent scale architecture, the six-stage processing pipeline from document ingestion to interactive ReportAgent, and the complete technology stack for both the cloud-first original and the privacy-first MiroFish-Offline fork running on local Neo4j and Ollama.
Part 2 is fully hands-on. It covers system requirements for cloud and offline variants, Docker Compose one-command setup, manual installation of Neo4j and Ollama, verification procedures, and connecting any OpenAI-compatible LLM provider. The centrepiece is ten complete end-to-end use case walkthroughs — PR crisis simulation, financial market sentiment analysis, Polymarket trading bot integration, policy draft reaction forecasting, product launch A/B testing, literary completion, election dynamics, M&A rumour propagation, public health crisis communication, and confidential strategy document stress-testing.
Part 3 addresses the most critical engineering challenges in multi-agent simulation: the Woozle Effect hallucination cascade and three practical mitigations (memory provenance tracking, sentinel agents, and dual-environment divergence analysis). It includes a cost estimation calculator, hybrid rule-based/LLM architecture for 80% cost reduction, GPU optimisation settings, and complete AWS production deployment with Terraform snippets. Azure and GCP equivalents, multi-tenant architecture, and custom CloudWatch observability metrics round out the enterprise deployment chapter.
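The logic behind the cost calculator and the hybrid architecture can be captured in a back-of-the-envelope model. All prices, token counts, and activity rates below are illustrative assumptions, not measured MiroFish figures; the point is how routing routine actions to cheap rules shrinks the LLM bill.

```python
def simulation_cost(agents: int, ticks: int, activity_rate: float,
                    tokens_per_action: int, usd_per_million_tokens: float,
                    llm_fraction: float = 1.0) -> float:
    """Estimate LLM spend for one simulation run (all inputs hypothetical).

    llm_fraction < 1.0 models the hybrid architecture: routine actions
    (likes, follows, reposts) are handled by rules, and only the
    remaining fraction of actions invokes the LLM.
    """
    actions = agents * ticks * activity_rate
    tokens = actions * tokens_per_action * llm_fraction
    return tokens / 1_000_000 * usd_per_million_tokens

# Hypothetical run: 100k agents, 48 ticks, 10% active per tick,
# 800 tokens per LLM action, $0.50 per million tokens.
full = simulation_cost(100_000, 48, 0.1, 800, 0.50)
hybrid = simulation_cost(100_000, 48, 0.1, 800, 0.50, llm_fraction=0.2)
# llm_fraction=0.2 is what the ~80% cost reduction cited above corresponds to.
```

Under these assumptions the full-LLM run costs $192 and the hybrid run $38.40 — the headline 80% reduction falls out directly from the fraction of actions still routed to the LLM.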
The guide concludes with an honest, clear-eyed assessment of where MiroFish genuinely excels — scenario stress-testing at zero real-world risk, counterfactual exploration through God's Eye injection, and knowledge-grounded agent generation via GraphRAG — and where it currently falls short, including the absence of published real-world validation studies and the unresolved hallucination cascade problem at scale.