What This Guide Covers
From a 200-line Python language model — mathematically identical to GPT-4 and Claude Opus 4.6 — to a fully operational, regulation-compliant enterprise banking agent orchestrating five specialist sub-agents, eight deterministic tool calls, and two database writes in under two seconds: this guide delivers the complete technical journey in one definitive reference. It begins at the source code level — tokenisation, embeddings, self-attention, backpropagation, and the Adam optimiser — then climbs through the training pipeline (SFT, RLHF, Constitutional AI) to the frontier models powering today's production systems, before building an enterprise-grade agentic architecture on Claude Opus 4.6 and the Model Context Protocol.
Engineers, architects, and technical decision-makers will find everything needed to understand how modern LLMs actually work, how to build production AI agents with rigorous security and compliance, how to choose between frontier models for enterprise deployment, and how to manage costs through smart model routing and context isolation. The guide covers the full spectrum from foundational mathematics to 12-phase production rollout, with complete Python code, architecture diagrams, comparison tables, and a 40-term glossary.
From microGPT to Frontier Models — Understanding LLMs at Source-Code Level
The guide opens with Andrej Karpathy's microGPT: a fully functional GPT in under 200 lines of pure Python with zero dependencies, where every operation is exposed. You will see exactly how character-level tokenisation converts text to integers, how the embedding layer transforms those integers into dense semantic vectors, and how positional embeddings give the transformer its awareness of sequence order — the difference between "dog bites man" and "man bites dog" exists here, in the position vectors that flow into all subsequent computation.
Self-attention is dissected in full: how Query, Key, and Value projections work, why the √d_k scaling factor prevents softmax saturation, how multi-head attention runs parallel subspaces to capture different relationship types simultaneously, and how residual connections enable gradient flow in deep networks. The MLP feed-forward block, KV Cache, autoregressive generation with temperature sampling, and the complete forward pass architecture are all covered with both the mathematics and the Python implementation.
The Training Pipeline: SFT, RLHF, and Constitutional AI
From the base model trained on web-scale corpora to an aligned assistant requires three further stages: Supervised Fine-Tuning on (instruction, response) pairs, RLHF with reward model training and PPO optimisation, and — for Claude — Constitutional AI. CAI uses a written constitution to generate AI feedback on the model's own outputs (RLAIF), making alignment scalable beyond what human labelling alone can achieve and baking safety into the model's weights rather than enforcing it only at runtime. A detailed comparison table covers Claude Opus 4.6 versus GPT-4o across context window, tool protocol, deployment options, data residency, compliance certifications, and extended thinking capability.
Claude as an Agent AI — ReAct, MCP, and Multi-Agent Orchestration
The agent chapters establish the complete technical architecture for moving from a conversational model to an autonomous system. The ReAct (Reason + Act) loop — the foundational pattern for all LLM agents — is implemented via Claude's tool_use stop reason: the model returns a structured tool call, the application executes it deterministically, appends the result as a tool_result block, and the loop continues until stop_reason is end_turn. This is the same mechanism used in every production Claude agent, from simple workflows to the multi-agent banking system built later in the guide.
MCP (Model Context Protocol) is covered in depth as the solution to the tool-management problem that emerges at enterprise scale. When dozens of services need to be accessible, manually managing JSON schema definitions becomes unsustainable. An MCP server exposes tools, resources, and prompts through a standardised protocol; when tool descriptions would consume more than 10% of the context window, Claude activates MCP tool search automatically — dynamically loading only relevant tools on demand. The guide includes complete Python code for connecting to an MCP banking server with OAuth token authentication and filtered tool access.
Core Technical Components Covered
Enterprise Banking Agent — Complete Blueprint and End-to-End Example
Banking is among the most demanding domains for AI deployment: high-stakes decisions, strict regulatory requirements (GDPR, PSD2, MiFID II, Basel III, AML/KYC), real-time processing, and zero tolerance for errors. The guide provides a complete, opinionated architecture built on six core design principles: least privilege, deterministic policies first, full auditability, human-in-the-loop escalation, defence in depth, and graceful degradation. Every agent and tool operates with the minimum access required; business rules and compliance checks run as deterministic code that the LLM cannot bypass.
The complete end-to-end example traces a EUR 500 intra-account transfer through 11 stages: JWT authentication, intent classification, account resolution, explicit customer confirmation, policy engine evaluation, fraud scoring, AML screening, core banking execution, immutable audit record creation, and notification delivery — all in under two seconds. The full Python implementation is provided, including the MCP banking server, orchestrator agent, dispatch loop with policy gate and fraud check, and the structured compliance audit record satisfying data retention requirements. A policy edge-case table covers eight scenarios including velocity breaches, sanctioned counterparties, social engineering detection, and large-value step-up authentication.
Topics Covered in This Guide
- LLM Fundamentals & microGPT — tokenisation, embeddings, self-attention, backpropagation, Adam optimiser, and the complete training loop built from scratch in pure Python
- Frontier Model Architecture & Alignment — scaling from microGPT to Claude Opus 4.6 and GPT-4o via SFT, RLHF, Constitutional AI, and detailed architecture comparison tables
- Agent AI & the ReAct Loop — how Claude transitions from chatbot to autonomous agent with tool use, MCP protocol integration, and multi-agent orchestration via the Claude Agent SDK
- Enterprise Banking Agent Framework — production-grade multi-agent blueprint with least-privilege design, deterministic policy gates, AML/KYC compliance, and full auditability
- Secure Fund Transfer — End-to-End — complete 11-step execution trace with full Python code, policy enforcement, fraud scoring, edge-case handling, and structured compliance audit record
- Kimi K2 / K2.5 Deep Dive — MoE architecture, MuonClip optimizer, Agent Swarm paradigm (100 sub-agents, 1,500 tool calls), and benchmark comparison vs Claude Opus 4.6
- RAG, Embeddings & Agent Memory — vector store pipeline, semantic search, domain-specific embedding models for banking, and persistent cross-session memory architecture
- Production Ops & Model Governance — observability stack, prompt engineering best practices, cost-optimised model routing (70–80% cost reduction), and model lifecycle governance
Frequently Asked Questions
Brief Summary
From a 200-line Python language model — mathematically identical to GPT-4 and Claude Opus 4.6 — to a fully operational, regulation-compliant enterprise banking agent orchestrating five specialist sub-agents, eight deterministic tool calls, and two database writes in under two seconds: this guide delivers the complete technical journey in one definitive reference. It begins at the source-code level with tokenisation, embeddings, self-attention, backpropagation, and the Adam optimiser, then traces the full training pipeline — SFT, RLHF, Constitutional AI — to the frontier models driving production systems. The agent architecture follows: ReAct loops, Claude API tool use, MCP server integration, multi-agent orchestration, and a precise mapping of where the LLM is — and is not — used inside a real system.
You will understand exactly how transformers work at the source level, follow the training pipeline from next-token prediction through Constitutional AI, then build an enterprise agent with the Model Context Protocol and Claude Agent SDK. The banking framework applies it all: a six-principle architecture covering least privilege, deterministic policy gates, full auditability, human-in-the-loop escalation, defence in depth, and graceful degradation — with full Python code for the MCP tool layer, orchestrator agent, and a complete EUR 500 fund transfer with 11-step execution trace and structured compliance audit record.
Appendices deliver a deep technical profile of Kimi K2 and K2.5 (MoE architecture, MuonClip optimiser, Agent Swarm, benchmarks vs Claude Opus 4.6), the Claude Agent SDK and Agent Skills framework with cost analysis, RAG pipeline design with domain-specific banking embeddings, production observability metrics, prompt engineering best practices, model governance lifecycle, and a cost-optimised routing strategy that cuts LLM costs by 70–80%. A 40-term glossary and a unique appendix that maps every prompt in the document's own creation to Claude's six internal processing steps round out the reference.
Extended Summary
What if you could trace every computation inside GPT-4 or Claude Opus 4.6 from the first matrix multiply to the final token, and then immediately apply that understanding to architect a regulation-compliant enterprise banking agent that orchestrates five specialist sub-agents, eight tool calls, and two database writes in under two seconds? This guide makes that journey possible — in a single 46-page reference that starts with a 200-line Python transformer and ends with a 12-phase production rollout roadmap, with complete Python code, architecture diagrams, and comparison tables throughout. It is simultaneously an engineering deep-dive and a production playbook, designed for developers, architects, and technical leaders who want both the foundational understanding and the practical patterns to build and deploy AI agents at enterprise scale.
The foundation is microGPT: Andrej Karpathy's fully functional GPT in pure Python with zero dependencies, which this guide disassembles step by step. You will see exactly how tokenisation converts text to integers, how embedding layers add semantic meaning and positional awareness, how multi-head self-attention computes relevance-weighted information across every token pair in parallel, how the feed-forward block stores factual associations as a key-value memory, and how the Adam optimiser applies backpropagation gradients to produce a model that learns. This is the same mathematics inside GPT-4 and Claude — only the scale differs. Comparison tables cover every major frontier model architecture from GPT-2 through GPT-4o, DeepSeek-V3, and Kimi K2.
From microGPT, the guide traces the full training pipeline that produces frontier models: web-scale pre-training on trillions of tokens with the next-token prediction objective, Supervised Fine-Tuning on curated (instruction, response) pairs, RLHF with reward model training and PPO optimisation, and Anthropic's Constitutional AI — where a written constitution and AI feedback (RLAIF) generate alignment at a scale that human labelling alone cannot match. A detailed comparison of Claude Opus 4.6 and GPT-4o covers architecture, context window, safety approach, tool protocol, deployment options (Bedrock vs Azure), data residency, compliance certifications, and the practical differences in extended thinking and RAG integration.
The agent chapters deliver the complete technical architecture: the ReAct Reason→Act→Observe loop implemented through Claude's tool_use and tool_result API blocks; MCP server configuration with OAuth authentication and filtered tool access for enterprise environments; multi-agent orchestration patterns using an Orchestrator with specialist Sub-Agents for fraud detection, compliance, notifications, and audit logging; and a precise table mapping which agent pipeline stages use the LLM and which run as deterministic code. The banking framework applies all of this — six architectural principles, a complete Python MCP server with ownership verification and policy enforcement, an Orchestrator system prompt with mandatory rules and escalation conditions, and a full end-to-end EUR 500 fund transfer traced through 11 stages with the complete audit record structure satisfying GDPR, PSD2, MiFID II, and Basel III requirements.
The appendices extend the guide into the broader production landscape. A deep technical profile of Kimi K2 and K2.5 covers the confirmed 1.04-trillion-parameter MoE architecture, the MuonClip optimizer that achieved zero loss spikes training on 15.5 trillion tokens, the Agent Swarm paradigm enabling 100 parallel sub-agents and 1,500 tool calls per task, and benchmark comparison tables against Claude Opus 4.6 and GPT-4o on SWE-bench, AIME 2025, GPQA-Diamond, and Tau2-Bench. The Claude Agent SDK appendix covers context compaction, subagent context isolation, checkpoint persistence, and the initialiser-agent + incremental-progress-agent patterns from Anthropic's own engineering blog. RAG pipeline design, domain-specific embedding models for banking, and persistent cross-session memory round out the practical production guidance. A final appendix maps every prompt used during this document's own creation to the six internal steps Claude executes for every request — making the guide itself a working demonstration of the agent system it teaches.