This is a personal note from the founder. If you're reading this, you're witnessing a moment we've been building toward for months: AI Crucible is officially open to everyone. No invitation code. No waiting list. Just sign up and start.
When we launched AI Crucible in November 2025, we made a deliberate choice: keep the doors closed. Every new user needed an invitation code. This wasn't about exclusivity — it was about responsibility. We were building something different from any AI chat platform that existed, and we needed real users to pressure-test the idea before opening it up.
The idea was simple but ambitious: what if you never had to trust a single AI model with your important decisions? What if, instead, you could orchestrate multiple models — from different providers, with different training data, strengths, and blind spots — and let them compete, collaborate, debate, and verify each other's work?
That idea needed validation from real people with real problems.
Over four months of closed beta — from November 2025 to March 2026 — AI Crucible grew from a concept into a comprehensive platform. Here's what we shipped.
We developed and refined seven distinct strategies, each designed for a specific type of problem, from Competitive Refinement to Collaborative Synthesis.
Each strategy is documented in depth with walkthroughs, cost breakdowns, and real-world examples.
We integrated models from OpenAI (GPT-5.2, GPT-5.1, GPT-5 Mini), Anthropic (Claude Opus 4.6, Claude Sonnet 4.5, Claude Haiku 4.5), Google (Gemini 3.1 Pro, Gemini 3 Flash), xAI (Grok 4), Mistral (Mistral Large 3, Ministral 3B), DeepSeek (DeepSeek Chat, DeepSeek Reasoner), Moonshot (Kimi K2.5), Alibaba (Qwen 3.5 Plus, Qwen Flash), and Meta (Llama 3.3).
This isn't just checkbox integration. Each provider has its own API quirks, streaming protocols, token counting logic, and error handling patterns. We built a unified abstraction layer that makes them all work seamlessly within ensemble strategies — including handling differences in tool calling, reasoning tokens, and cache behaviors.
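To make that concrete, here's a minimal sketch of the adapter pattern involved. The names (`ChatResult`, `Provider`, `OpenAIStyleProvider`) are illustrative, not our actual internals: each adapter translates one vendor's request and response shapes into a single normalized result, so ensemble strategies never see provider-specific details.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class ChatResult:
    """Normalized response shared by every provider adapter."""
    text: str
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int = 0   # some providers bill these separately
    cached_tokens: int = 0      # prompt-cache hits, where supported

class Provider(ABC):
    """One adapter per vendor hides API quirks behind a single method."""
    @abstractmethod
    def chat(self, model: str, messages: list[dict]) -> ChatResult: ...

class OpenAIStyleProvider(Provider):
    """Adapter for an OpenAI-compatible chat completions API (illustrative)."""
    def __init__(self, client):
        self.client = client

    def chat(self, model: str, messages: list[dict]) -> ChatResult:
        resp = self.client.chat.completions.create(model=model, messages=messages)
        return ChatResult(
            text=resp.choices[0].message.content,
            input_tokens=resp.usage.prompt_tokens,
            output_tokens=resp.usage.completion_tokens,
        )
```

The normalized `ChatResult` is what makes cross-provider features possible: every strategy reads the same fields for token accounting and cost tracking, no matter which vendor produced the response.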
One of the hardest problems in multi-model AI is answering a deceptively simple question: which response was actually better?
We built an evaluations system that uses LLM-as-a-Judge panels to score responses across dimensions like accuracy, depth, reasoning quality, and practical usefulness. These evaluations feed into the Benchmarks Dashboard, which tracks model performance across categories over time — powered by real user sessions, not synthetic benchmarks.
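For readers curious how a judge panel works mechanically, here's a simplified sketch. The rubric, the prompt, and the `judges` interface are illustrative assumptions rather than our production implementation; the two key ideas are a fixed scoring rubric and averaging across several judge models to dilute any single judge's bias.

```python
import json
import statistics

DIMENSIONS = ["accuracy", "depth", "reasoning", "usefulness"]  # example rubric

JUDGE_PROMPT = """You are grading an AI response to a user prompt.
Score each dimension from 1-10 and reply with JSON only, e.g.
{{"accuracy": 7, "depth": 6, "reasoning": 8, "usefulness": 7}}.

User prompt:
{prompt}

Response to grade:
{response}"""

def panel_scores(prompt: str, response: str, judges: list) -> dict[str, float]:
    """Average per-dimension scores from a panel of judge models.

    `judges` is a list of callables that take a prompt string and return
    the model's text output -- an assumed interface for this sketch.
    """
    per_judge = []
    for judge in judges:
        raw = judge(JUDGE_PROMPT.format(prompt=prompt, response=response))
        per_judge.append(json.loads(raw))  # production code would validate this
    return {
        dim: statistics.mean(j[dim] for j in per_judge)
        for dim in DIMENSIONS
    }
```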
Beyond strategies and models, we shipped the features that make the platform usable for daily work: prompt classification, web search grounding, attachments with configurable retention, and MCP integration, among others.
We didn't just build the platform — we documented everything. From getting started to deep technical dives on parallel tool calling challenges, from geopolitical simulations to model benchmark analyses, we published 52 articles that serve as both documentation and proof-of-concept for ensemble AI.
Building in closed beta means relying on a small group of people who are willing to use something unfinished, report bugs they encounter, and give honest feedback about what works and what doesn't.
Three people in particular shaped what AI Crucible became:
Maya Siderova — Her feedback on how ensemble strategies performed in real creative workflows pushed us to refine the competitive dynamics in Competitive Refinement and the synthesis quality in Collaborative Synthesis. The platform is more practical because of her perspective.
Mehul Harry — His testing across model combinations and edge cases helped us identify provider-specific quirks that we would have missed in internal testing. Several of our reliability improvements came directly from issues he surfaced.
Cristian Ormazabal Ortega — His feedback on the evaluation framework and benchmarking approach helped us calibrate how we measure model performance. The Benchmarks Dashboard is more trustworthy because of his input.
To all three — and to every early user who registered with an invitation code, ran sessions, voted on responses, and sent us feedback — thank you. You helped us build something worth opening to the world.
When we started building AI Crucible, the idea that you'd need multiple AI models working together felt contrarian. Most people used ChatGPT or Claude and assumed one model was enough. Over the past six months, that assumption has been dismantled — not by us, but by some of the most influential people in AI.
We covered this shift extensively in our December 2025 article, *The Ensemble AI Revolution: Karpathy, Nadella, and AI Crucible*. Since then, the trend has only accelerated.
In late 2025, Andrej Karpathy, a founding member of OpenAI and former AI lead at Tesla, published his LLM Council framework. The core insight: for high-stakes decisions, you should never rely on a single model. His approach uses a three-stage process: independent response generation, anonymous peer review, and chairman synthesis. It's elegant, but limited to a single linear workflow.
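In code terms, the council pattern looks roughly like this. This is a sketch of the pattern as publicly described, not Karpathy's actual implementation; `ask(model, prompt)` is an assumed helper that returns a model's text response.

```python
def llm_council(question: str, members: list[str], chairman: str, ask) -> str:
    """Three-stage council: independent answers, anonymous peer review,
    and chairman synthesis. `ask(model, prompt)` is an assumed helper."""
    # Stage 1: each member answers independently, with no cross-talk.
    answers = {m: ask(m, question) for m in members}

    # Stage 2: anonymous peer review -- answers are relabeled so a judge
    # can't recognize and favor its own output.
    anon = {f"Response {i + 1}": a for i, a in enumerate(answers.values())}
    listing = "\n\n".join(f"{label}:\n{text}" for label, text in anon.items())
    reviews = [
        ask(m, f"Rank these anonymous answers to: {question}\n\n{listing}")
        for m in members
    ]

    # Stage 3: a chairman model synthesizes answers and reviews into one reply.
    return ask(chairman,
               f"Question: {question}\n\nAnswers:\n{listing}\n\n"
               "Peer reviews:\n" + "\n\n".join(reviews))
```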
Microsoft CEO Satya Nadella showcased a "deep research" app implementing three decision frameworks: an AI Council with iterative deliberation, a DXO Framework with specialized model roles, and an Ensemble Framework with MCP-based synthesis. His message was clear: Microsoft sees council-based AI as the future of production systems for enterprise decision-making.
In February 2026, Perplexity launched their Model Council feature — running queries across Claude Opus, GPT, and Gemini simultaneously, with a synthesizer model combining the results. Available to Max subscribers, it surfaces agreements and disagreements between models to encourage critical thinking. They followed this with Perplexity Computer, which can orchestrate up to 19 models for complex enterprise tasks.
Nadella's own prediction for 2026: "This is the pivotal year for transitioning from standalone models to comprehensive AI systems."
These developments validate the core thesis we've been building on since day one. But there's a meaningful difference in approach: each of these systems implements a single fixed workflow, while AI Crucible gives you seven strategies so you can match the orchestration pattern to the problem at hand.
The consensus is now undeniable: one model is not enough for the decisions that matter. The question is no longer whether to use multi-model systems, but how — and that's the question AI Crucible was built to answer.
AI Crucible offers two subscription tiers, both providing access to the full platform and all 20+ models:
| Plan | Price | Tokens/Month | Highlights |
|---|---|---|---|
| Starter | $19/mo | 2 million | All models, all strategies, 30-day attachment retention |
| Pro | $49/mo | 10 million | Unlimited runs, MCP integration, custom models, API access, priority support |
Both plans include access to every ensemble strategy, every model, the evaluations framework, prompt classification, and web search grounding. Token top-up packs are available if you need more capacity within a billing cycle.
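If you're wondering how far a token allowance stretches, here's a back-of-envelope estimate. The per-session numbers are illustrative assumptions (actual usage depends on your prompts, models, and strategy), but the shape of the math holds: an ensemble run multiplies per-model usage by the panel size and the number of rounds.

```python
# Rough capacity estimate for a monthly token budget (illustrative numbers).
MONTHLY_TOKENS = 2_000_000        # Starter plan allowance

PROMPT_TOKENS = 1_500             # assumed prompt + context per model
RESPONSE_TOKENS = 1_000           # assumed response length per model
MODELS_PER_SESSION = 5            # assumed ensemble size
ROUNDS = 2                        # e.g. initial answers + one refinement pass

tokens_per_session = (PROMPT_TOKENS + RESPONSE_TOKENS) * MODELS_PER_SESSION * ROUNDS
sessions_per_month = MONTHLY_TOKENS // tokens_per_session

print(f"~{tokens_per_session:,} tokens/session -> ~{sessions_per_month} sessions/month")
# ~25,000 tokens/session -> ~80 sessions/month
```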
You can view the full pricing breakdown on the Plans page.
If you've been using AI Crucible during the closed beta — thank you, again. Your early adoption made this launch possible.
Now we need your help with something different: spread the word.
The best products grow through people who genuinely find them useful telling other people. If AI Crucible has helped you make better decisions, produce better content, or think more critically about AI outputs — let others know.
We're not slowing down. The closed beta was the foundation. The public launch is the starting line. We'll continue publishing model benchmarks, refining strategies based on real usage data, and building features that make ensemble AI more accessible.
The future of AI isn't about finding the one perfect model. It's about orchestrating the right models, with the right strategy, for the right problem. That's what AI Crucible does — and now it's open to everyone.