Bring Your Own Key: Any OpenRouter Model in an Ensemble

AI Crucible hosts about twenty models natively. OpenRouter hosts hundreds. The new Connect tier bridges the two: bring your own OpenRouter key, add any model by its ID, and run it inside the same ensemble strategies as our built-in models. Your model usage is billed by OpenRouter, not by us.

To prove the point, we ran an ensemble made entirely of OpenRouter models that AI Crucible does not host natively. The metered cost on our side was $0 for every model response.

Time to read: 8-10 minutes.

Session cost: Approx. $0.003 to run (the Gemini arbiter only); both OpenRouter models cost $0.00. Adding two judge models for the evaluation brought the metered total to about $0.02.

Parameter Value
Strategy Competitive Refinement
Rounds 2
Web Search Disabled
Arbiter Gemini 3.1 Flash-Lite
Models Cohere Command R+ and NVIDIA Nemotron 49B (both OpenRouter, your key)

What is the Connect tier?

Connect is a $5 per month plan built around your own OpenRouter key. It is the cheapest tier we offer, and it changes the economics of multi-model runs.

Plan Price Model usage Token grant
Connect $5/mo Unlimited on your key, billed by OpenRouter 300k tokens for platform calls
Starter $19/mo 20+ built-in models, metered 2M unified tokens
Pro $49/mo 20+ built-in models, metered 10M unified tokens

On Connect, the 300k monthly token grant covers web search and any built-in platform models you mix in. Everything you run on an OpenRouter model is unmetered: the cost column in your results reads $0, because that spend lives in your OpenRouter account.

You still get all seven ensemble strategies, web search, and a standard rate limit of 10 runs per minute.


How do I connect an OpenRouter key?

The setup takes a minute and lives under Settings.

  1. Create an API key at openrouter.ai — it looks like sk-or-v1-....
  2. In AI Crucible, open Settings → Provider Keys and paste the key. It is encrypted before it is stored, and the client only ever sees the last four characters.
  3. Use the Test button to confirm the key works against OpenRouter.

Provider Keys settings with a connected OpenRouter key


How do I add a model by ID?

Once a key is saved, you add models by their OpenRouter slug.

  1. Enter the model in owner/model form, for example cohere/command-r-plus-08-2024 or nvidia/llama-3.3-nemotron-super-49b-v1.5.
  2. We enrich it from OpenRouter's catalog — friendly name, plus whether it supports vision and tools.
  3. The model now appears in the selector with a distinct color, labeled as your OpenRouter model.

Use the exact slug from openrouter.ai/models. Ids often carry a version suffix, so a guessed name like nvidia/llama-3.3-nemotron-super-49b is rejected as invalid. Models without tool support still run; they just skip web grounding and other tools.

You can add up to 100 models. Image attachments work on models that support vision; PDF and JSON attachments work on all of them, since PDFs route through OpenRouter's file parser.

Adding an OpenRouter model by ID


The scenario: a design problem, two models, both on your key

To show the tier in action, we ran a Competitive Refinement session: each model answers, sees the other, then refines. We picked two OpenRouter-exclusive models that AI Crucible does not host natively, so the whole ensemble ran unmetered on the OpenRouter key.

The prompt: Design the data model and API surface for a multi-tenant feature-flag service. Cover tenancy isolation, flag-evaluation latency, and a safe rollout and rollback story. Be specific enough to hand to an engineer.

This prompt rewards depth and concrete trade-offs, not retrieval, so it is a fair test of whether OpenRouter models can carry a real design task.

View the full chat here

The contenders

Model Role The Pitch
Cohere Command R+ (08-2024) The Structured Clean schema, methodical tables, $0 on the bill.
NVIDIA Nemotron Super 49B V1.5 The Architect UUIDs, default flags, sharper trade-offs, also $0.

Both are OpenRouter models running on your key, so neither consumed any AI Crucible tokens.


Round 1: First drafts

Both models produced a full data model and API surface on the first pass, but with different instincts.

Cohere Command R+

Cohere led with a clean relational schema and tenant quotas baked in from the start. It defined quota_limits as (e.g., max flags, requests per minute) and laid out Tenants, Flags, Flag Versions, and Evaluations as separate tables. The draft was thorough but slow — it took 181 seconds, the longest single response in the run.

NVIDIA Nemotron 49B

Nemotron was faster (86 seconds) and reached for trade-offs early. On latency it argued:

Edge Evaluation: Offloading to CDNs reduces latency but limits rule complexity.

It also seeded the schema with default_flags (predefined system flags, e.g., for observability) — an operational touch the prompt never asked for.

At the end of Round 1 the two designs agreed 61%.

Round 2: Refinement

In Competitive Refinement, round 2 is where each model reads the other and revises. Both models converged on a shared backbone: immutable FlagVersions with a previous_version_id to make rollback a pointer change rather than a destructive edit.

Nemotron switched its primary keys to UUIDs and kept its observability defaults. Cohere tightened its API surface and adopted the versioned-rollback model. Agreement rose to 74% — visible convergence, not collapse, since each kept its distinctive ideas.

Pair Round 1 Round 2 Trend
Cohere ↔ Nemotron 0.61 0.74 Converging

The Council of AI Judges

We then ran an evaluation with two low-cost judges — Gemini 3.5 Flash and Gemini 3.1 Flash-Lite — scoring anonymized transcripts with no model names attached. The arbiter's synthesized answer was scored too.

Evaluation scores: Nemotron 9.3, Cohere 8.7, synthesizer 7.8

Model Overall Accuracy Clarity Completeness Creativity Usefulness
NVIDIA Nemotron 49B 🏆 9.3 9.5 9.5 9.5 8.0 9.5
Cohere Command R+ 8.7 9.2 9.0 8.5 8.5 8.0
Gemini 3.1 Flash-Lite (synth) 7.8 9.0 8.0 6.0 6.0 6.0

The headline: a model AI Crucible does not host natively won the panel outright at 9.3, and even out-scored the synthesizer that combined both answers.

Per-judge agreement. The two judges barely differed on the OpenRouter models. Gemini 3.5 Flash gave Nemotron 9.2 and Cohere 8.64; Gemini 3.1 Flash-Lite gave Nemotron 9.5 and Cohere 8.8. The model-level divergence was just 1.4% for Nemotron and 1.0% for Cohere — the judges clearly agreed on the ranking.

Where they split. The synthesized answer drew the widest spread (8.0% divergence). Both judges marked it down on completeness and creativity: the merge captured the structure of both designs but lost some of the specificity that made the originals strong.


Performance and cost

This is the headline of the whole tier. Watch the cost column.

Metric Cohere Command R+ NVIDIA Nemotron 49B
Total cost (us) $0.00 $0.00
R1 Time 181s 86s
R2 Time 60s 58s
Tokens used ~4,982 ~8,660
Response length ~5,386 chars (R2) ~5,983 chars (R2)

Both models cost $0 in AI Crucible. Their real spend appears in your OpenRouter dashboard, where you control the budget. The only metered charge in the whole run was the Gemini arbiter, at about $0.003.


The Verdict

🏆 NVIDIA Nemotron Super 49B takes it. It was faster than Cohere in both rounds, raised the sharpest trade-offs, and the judges ranked it first on every criterion except creativity, where the two tied.

Cohere Command R+ is the steadier structuralist — clean tables and explicit quotas — but its 181-second first round and slightly thinner trade-offs left it second. Still an 8.7, and entirely free on our side.

Strategic takeaway: Connect turns model choice into a budgeting decision rather than a platform limit. If a model exists on OpenRouter, you can put it in a Crucible ensemble for the flat $5, pay its real cost on your own key, and — as this run shows — it can beat the field.


Final Response

# Multi-Tenant Feature-Flag Service Design

## Data Model
- **Tenants**: tenant_id (PK), name, api_key, quota_limits.
- **Flags**: flag_id (PK), tenant_id (FK), key (unique), name, current_version_id (FK).
- **FlagVersions**: version_id (PK), flag_id (FK), previous_version_id (for rollback), rollout_strategy (JSON), status (active/draft/rolled_back).
- **Variations**: variation_id (PK), flag_id (FK), name, value (JSON).
- **TargetingRules**: rule_id (PK), version_id (FK), condition (JSON), variation_id (FK).
- **AuditLogs**: log_id (PK), tenant_id, action, timestamp, details (JSON).

## API Surface
- **Evaluation**: POST /tenants/{tenant_id}/flags/{flag_id}/evaluate
- **Rollout/Rollback**: POST .../rollout (start/modify), POST .../rollback (revert to a specific version).
- **Audit**: GET /tenants/{tenant_id}/flags/{flag_id}/audit.

## Tenancy Isolation
- **Database Sharding**: separate databases per tenant for strict isolation.
- **Row-Level Security (alternative)**: a centralized database with per-row tenant policies.

Try it yourself

  1. Subscribe to Connect on the pricing page.
  2. Add your OpenRouter key under Settings → Provider Keys.
  3. Add two models by ID, choose Competitive Refinement, and set 2 rounds.
  4. Paste the prompt below and run.
Design the data model and API surface for a multi-tenant feature-flag
service. Cover tenancy isolation, flag-evaluation latency, and a safe
rollout and rollback story. Be specific enough to hand to an engineer.

Explore the run: Read the full 2-round session and the raw model outputs in the Shared Chat Session.

Further Reading