AI Crucible now supports custom model integration, letting you connect self-hosted models, cloud AI services, and specialized APIs to your ensemble workflows. This guide shows you how to integrate your own AI infrastructure while maintaining the benefits of ensemble orchestration.
What you'll learn:
Time to read: 8-10 minutes
Custom models let you connect your own AI infrastructure to AI Crucible through OpenAI-compatible APIs. You can integrate self-hosted models (running on your servers), cloud AI services (like NVIDIA AI), or API aggregators (OpenRouter) and use them alongside built-in models in ensemble workflows.
AI Crucible includes 40+ pre-configured models from major providers (OpenAI, Anthropic, Google, DeepSeek). Custom models expand this by letting you:
Custom models work seamlessly with AI Crucible's ensemble strategies. Mix a self-hosted model with Claude and GPT-5 in Competitive Refinement, or use specialized models as domain experts in Expert Panel discussions.
Custom models solve specific problems that built-in models can't address: run models on your own infrastructure for data-privacy compliance, access specialized models fine-tuned for your industry, cut costs with economical cloud services, test proprietary models, and maintain full control over your AI stack.
Common use cases:
Privacy and Compliance - Keep sensitive data within your infrastructure. Healthcare organizations can run HIPAA-compliant models on private servers. Financial institutions can maintain data sovereignty while still using ensemble AI.
Cost Optimization - Use cost-effective cloud AI services or self-hosted deployments. Services like NVIDIA AI offer competitive pricing. Use custom models for high-volume tasks (like drafts in Competitive Refinement Round 1) and premium models for final analysis.
Specialized Domains - Connect fine-tuned models optimized for your industry. Legal firms can integrate models trained on case law. Medical researchers can use models fine-tuned on medical literature. Integrate domain-specific models alongside general-purpose ones for comprehensive analysis.
Experimental Models - Test new models before they're available through major APIs. Try open-source models gaining traction in research. Evaluate proprietary models your team developed.
Infrastructure Control - Maintain full control over model selection, versioning, and deployment. Switch model versions without waiting for provider updates. Deploy models in your preferred cloud regions.
Navigate to Settings → Custom Models and click "Add Custom Model." Configure the display name, model name (API identifier), base URL (your API endpoint), optional API key, and customize appearance with model color. The model appears alongside built-in models in all ensemble workflows.
Navigate to Custom Models:
Settings (⌘/Ctrl+,) → Custom Models (/user/custom-models)
Click "Add Custom Model"
Configure Required Fields:
Display Name - How the model appears in AI Crucible
Example: NVIDIA Nemotron Nano 30B
Model Name (API Identifier) - The exact identifier your API uses
Examples: nvidia/nemotron-3-nano-30b-a3b (NVIDIA AI), meta-llama/llama-3.1-70b-instruct (OpenRouter)
Sent as the model field of API requests
Base URL - Your API endpoint (must support /v1/chat/completions)
Examples: https://integrate.api.nvidia.com/v1 (NVIDIA AI), https://openrouter.ai/api/v1 (OpenRouter), https://your-server.com/v1 (Self-hosted)
API Key (Optional) - Authentication token
Model Color - Visual identifier in the UI
Description (Optional) - Notes about the model
Example: NVIDIA Nemotron Nano - fast and cost-effective
Advanced Settings:

Context Window - Maximum input tokens
Temperature - Creativity setting (0-2)
Top P - Nucleus sampling (0-1)
Top K - Limits vocabulary to K most likely tokens
Click "Save Model" - Your custom model is now available!
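To tie the fields above together, a saved custom-model entry might serialize to something like the following. The field names and color value here are illustrative, not AI Crucible's actual storage schema:

```json
{
  "displayName": "NVIDIA Nemotron Nano 30B",
  "modelName": "nvidia/nemotron-3-nano-30b-a3b",
  "baseUrl": "https://integrate.api.nvidia.com/v1",
  "apiKey": "nvapi-your-key-here",
  "color": "#76b900",
  "description": "NVIDIA Nemotron Nano - fast and cost-effective",
  "contextWindow": 4096,
  "temperature": 0.7,
  "topP": 1,
  "topK": 40
}
```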
An OpenAI-compatible API follows OpenAI's REST API specification, specifically the /v1/chat/completions endpoint format. This standard is widely adopted by cloud AI services (NVIDIA AI), API aggregators (OpenRouter), and self-hosted deployments, making them plug-and-play compatible with AI Crucible.
The OpenAI API specification defines how to structure requests and responses when talking to AI models. It's become the de facto standard for model serving, similar to how REST became the standard for web APIs.
What makes an API OpenAI-compatible?
Endpoint Structure - Uses /v1/chat/completions for chat (and optionally the legacy /v1/completions for text completion)
Request Format - Accepts messages in this structure:
{
  "model": "llama3",
  "messages": [{ "role": "user", "content": "Your prompt here" }],
  "temperature": 0.7,
  "max_tokens": 1000
}
Response Format - Returns structured responses:
{
  "id": "chatcmpl-123",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Response text here"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 100,
    "total_tokens": 120
  }
}
Streaming Support - Supports Server-Sent Events (SSE) for real-time response streaming
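As a concrete illustration of the request and response shapes above, here is a minimal Python sketch (standard library only) that builds a chat-completions request body and extracts the reply text from a response in the documented format. No network call is made; the model name is a placeholder:

```python
import json

def build_chat_request(model: str, prompt: str, temperature: float = 0.7,
                       max_tokens: int = 1000) -> str:
    """Serialize an OpenAI-compatible /v1/chat/completions request body."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)

def extract_reply(response_json: str) -> str:
    """Pull the assistant's text out of a standard chat-completions response."""
    data = json.loads(response_json)
    return data["choices"][0]["message"]["content"]

# Round trip using the response shape shown above
body = build_chat_request("llama3", "Your prompt here")
sample_response = json.dumps({
    "id": "chatcmpl-123",
    "choices": [{"message": {"role": "assistant", "content": "Response text here"},
                 "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 20, "completion_tokens": 100, "total_tokens": 120},
})
print(extract_reply(sample_response))  # Response text here
```

Any endpoint that accepts the first shape and returns the second is, for AI Crucible's purposes, OpenAI-compatible.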
Popular OpenAI-Compatible Platforms:
AI Crucible works with any platform that follows this specification.
Sign up at build.nvidia.com, get an API key, and configure AI Crucible with base URL https://integrate.api.nvidia.com/v1, model name nvidia/nemotron-3-nano-30b-a3b, and your NVIDIA API key. NVIDIA AI provides cloud-hosted models with competitive pricing and fast inference.
Create Account:
Visit build.nvidia.com
Sign up for free account
Get $10 in free credits to start
Get API Key:
Dashboard → API Keys → Generate Key
Copy your key (starts with nvapi-...)
Browse Available Models:
Check NVIDIA AI Catalog for available models
Popular options:
nvidia/nemotron-3-nano-30b-a3b - Fast, efficient, cost-effective
nvidia/llama-3.1-nemotron-70b-instruct - Powerful reasoning
mistralai/mistral-7b-instruct-v0.3 - Balanced performance
Test Your Endpoint:
# Verify API access
curl https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer nvapi-YOUR-KEY-HERE" \
  -d '{
    "model": "nvidia/nemotron-3-nano-30b-a3b",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 50
  }'
Add to AI Crucible:

Settings → Custom Models → Add Custom Model
Display Name: NVIDIA Nemotron Nano 30B
Model Name: nvidia/nemotron-3-nano-30b-a3b
Base URL: https://integrate.api.nvidia.com/v1
API Key: nvapi-your-key-here
Description: NVIDIA Nemotron - fast and cost-effective
Advanced Settings (optional):
Context Window: 4096
Temperature: 0.7
Top P: 1
Top K: 40
Use in Ensemble Workflows:
Your NVIDIA model appears alongside built-in models in model selection dropdowns. Mix cost-effective NVIDIA models with premium models for balanced ensemble strategies.
Sign up at openrouter.ai, get an API key, and configure AI Crucible with base URL https://openrouter.ai/api/v1, your chosen model name (like meta-llama/llama-3.1-70b-instruct), and your OpenRouter API key. OpenRouter gives you access to 100+ models through a single API.
Why Use OpenRouter?
Setup Steps:
Create Account:
Visit openrouter.ai
Sign up (free account with $1 credit)
Get API Key:
Dashboard → API Keys → Create Key
Copy your key (starts with sk-or-v1-...)
Browse Models:
Check the Models page for available models
Note the exact model identifier
Popular options:
meta-llama/llama-3.1-70b-instruct
anthropic/claude-3.5-sonnet
google/gemini-2-pro-exp
mistralai/mistral-large
Configure in AI Crucible:
Display Name: LLaMA 3.1 70B (OpenRouter)
Base URL: https://openrouter.ai/api/v1
Model Name: meta-llama/llama-3.1-70b-instruct
API Key: sk-or-v1-your-key-here
Context Window: 128000
Description: LLaMA 3.1 70B via OpenRouter - powerful open source
Add Multiple Models:
You can add multiple OpenRouter models using the same API key. This gives you access to diverse model families through one provider:
Cost Tracking:
OpenRouter usage doesn't appear in AI Crucible's cost metrics (since it's external). Check OpenRouter's dashboard for usage analytics and billing.
AI Crucible encrypts API keys at rest using AES-256-GCM encryption before storing them in our database. Keys are only decrypted server-side when making API calls and are never exposed to client code or logs. This enterprise-grade encryption ensures your credentials remain secure even if the database is compromised.
Security Best Practices:
Use Read-Only Keys When Possible - If your API provides read-only keys (for inference only), use those instead of admin keys.
Rotate Keys Regularly - Change API keys every 90 days for enhanced security.
Use Dedicated Keys - Create separate API keys for AI Crucible rather than sharing keys across tools.
Monitor Usage - Check your API provider's dashboard for unexpected activity.
Self-Hosted for Sensitive Work - For highly sensitive data, use self-hosted models on your own infrastructure with restricted network access.
Select custom models alongside built-in models in any strategy's model picker. Custom models appear with a server icon and your chosen header color. They participate fully in all rounds, receive the same prompts, and their responses are evaluated alongside other models in the arbiter's comparative analysis.

Cost-Optimized Competitive Refinement:
Use cost-effective cloud models for early rounds and premium models for final polish:
Round 1 (Draft Generation):
Round 2-3 (Refinement): Enable "Adaptive Iteration Count" to stop early when models converge.
Arbiter: GPT-5 Mini or Gemini 2.5 Flash (budget-friendly analysis)
Savings: 50-70% compared to all-premium models
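One way to picture "stop early when models converge" is a similarity check between successive drafts. This is a hedged sketch of the idea using standard-library string similarity, not AI Crucible's actual Adaptive Iteration Count logic:

```python
from difflib import SequenceMatcher

def has_converged(previous: str, current: str, threshold: float = 0.95) -> bool:
    """Treat a round as converged when successive drafts are nearly identical."""
    return SequenceMatcher(None, previous, current).ratio() >= threshold

drafts = [
    "AI ensembles combine models.",
    "AI ensembles combine multiple models for better answers.",
    "AI ensembles combine multiple models for better answers!",
]
for round_num in range(1, len(drafts)):
    if has_converged(drafts[round_num - 1], drafts[round_num]):
        # Further refinement rounds would add cost without changing the output
        print(f"Converged at round {round_num + 1}; stopping early")
        break
```

Each skipped round saves its full per-model token cost, which is where the early-stopping savings come from.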
Privacy-First Expert Panel:
Keep sensitive data on-premises while leveraging external expertise:
Internal Analysis (Self-Hosted Models):
External Perspective (API Models):
Assign expert personas to each model. Self-hosted models handle proprietary data while API models provide general expertise.
Hybrid Debate Tournament:
Test ideas using diverse model architectures:
Team A (Open Source via OpenRouter):
Team B (Proprietary APIs):
Different training approaches create more substantive debates.
Development Testing:
Validate your fine-tuned model against production models:
Your Custom Model:
Baseline Comparisons:
Run Collaborative Synthesis to see how your model's output compares in real ensemble scenarios.
Custom model costs depend on your deployment choice. Cloud AI services (NVIDIA AI) charge per-token with competitive pricing. OpenRouter charges per-token, often cheaper than direct APIs. Self-hosted models have infrastructure costs but no per-token fees.
Cloud AI Services (NVIDIA AI, etc.):
Costs:
Per-session cost: Varies by model ($0.01-0.10 typical)
Best for:
Example pricing (NVIDIA Nemotron):
OpenRouter:
Costs:
Example pricing:
Best for:
Self-Hosted (AWS, GCP, Azure):
Costs:
Per-session cost: $0 for API calls (after infrastructure costs)
Break-even point: ~50,000-100,000 requests/month
Best for:
Cost Optimization Strategies:
Mix Cost-Effective and Premium Models - Use budget cloud models for drafts (Round 1), premium models for refinement (Rounds 2-3).
Enable Adaptive Iteration Count - Stop early when models converge, saving rounds.
Strategic Model Selection - Use premium models only for final analysis or arbiter role.
Batch Processing - Run multiple prompts in sequence to amortize warm-up costs.
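To make the mixing math concrete, here is a back-of-envelope estimator. The per-million-token prices and token counts are assumptions for illustration, not actual provider rates:

```python
# Illustrative prices in $ per 1M tokens (assumed - check your providers)
PRICES = {"budget_cloud": 0.20, "premium": 5.00}

def session_cost(rounds: list[tuple[str, int]]) -> float:
    """Sum cost across rounds; each entry is (price tier, tokens used)."""
    return sum(PRICES[tier] * tokens / 1_000_000 for tier, tokens in rounds)

# All-premium: three rounds of 50k tokens on the premium tier
all_premium = session_cost([("premium", 50_000)] * 3)

# Mixed: budget model drafts Round 1, one premium refinement round after
# adaptive iteration stops early
mixed = session_cost([("budget_cloud", 50_000), ("premium", 50_000)])

savings = 1 - mixed / all_premium
print(f"all-premium ${all_premium:.2f}, mixed ${mixed:.2f}, savings {savings:.0%}")
```

Under these assumed numbers the mixed strategy lands in the 50-70% savings band; your actual savings depend on real prices and token volumes.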
Common issues include incorrect base URL (verify the endpoint and ensure /v1 suffix), wrong model name (check API documentation), API key problems (test authentication separately), and network issues (verify firewall rules). Test your endpoint with curl before adding to AI Crucible.
Test Endpoint Directly:
Before adding to AI Crucible, verify your API works:
# Replace with your actual values
curl https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR-API-KEY" \
  -d '{
    "model": "nvidia/nemotron-3-nano-30b-a3b",
    "messages": [{"role": "user", "content": "Say hello"}],
    "max_tokens": 50
  }'
Expected response: JSON with choices[0].message.content
If this fails, fix your API setup before configuring AI Crucible.
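If the curl call returns JSON but things still misbehave, a quick shape check can localize the problem. This is a hedged Python sketch that validates the fields a chat-completions client expects and names the missing piece, which usually maps to one of the troubleshooting cases below:

```python
import json

def check_chat_response(raw: str) -> str:
    """Validate an OpenAI-compatible response and return the reply text.

    Raises ValueError with a pointer to whichever expected field is missing.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON at all: {exc}") from exc
    choices = data.get("choices")
    if not choices:
        raise ValueError("missing 'choices' - endpoint may not be chat-completions")
    message = choices[0].get("message") or {}
    content = message.get("content")
    if content is None:
        raise ValueError("missing choices[0].message.content - non-standard format")
    return content

good = '{"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}'
print(check_chat_response(good))  # Hello!
```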
Common Issues and Solutions:
Error: "Connection refused" or "Network error"
Cause: AI Crucible can't reach your endpoint
Solutions:
Error: "Model not found" or "Invalid model name"
Cause: Model identifier doesn't match API expectations
Solutions:
Verify the exact identifier against your provider's model catalog (e.g., nvidia/nemotron-3-nano-30b-a3b)
Include the full namespace prefix (e.g., meta-llama/llama-3.1-70b-instruct)
Error: "Authentication failed" or "Invalid API key"
Cause: API key is wrong or expired
Solutions:
Check that the key prefix matches your provider's format (nvapi- for NVIDIA, sk-or-v1- for OpenRouter)
Error: "Model responds but output is garbled"
Cause: API returns non-standard format
Solutions:
Error: "Timeout" or "Request too slow"
Cause: Model is too slow for AI Crucible's timeouts
Solutions:
Still Having Issues?
Check Browser Console:
Verify Configuration:
Test with Built-In Models:
Yes. Custom models work seamlessly with built-in models in all ensemble strategies. You can select any combination—for example, NVIDIA Nemotron, GPT-5, and Claude Sonnet 4.5 together in Competitive Refinement. The arbiter treats all models equally, evaluating responses based on quality regardless of whether they're custom or built-in.
Leverage Complementary Strengths:
Speed + Quality:
Privacy + Expertise:
Cost + Performance:
Strategy-Specific Recommendations:
Competitive Refinement:
Expert Panel:
Debate Tournament:
Evaluation Considerations:
The arbiter (typically a built-in model) evaluates all responses equally:
Performance Tips:
Start Rounds Simultaneously - Mix fast and slow models so they run in parallel.
Monitor Latency - If custom models are much slower, they'll delay the ensemble.
Balance Load - Don't use only slow custom models; mix with faster built-in ones.
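The parallel-rounds tip can be sketched with a thread pool: submit every model call at once so total wall time tracks the slowest model rather than the sum of all of them. The model functions here are simulated stand-ins, not AI Crucible internals:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def call_model(name: str, latency_s: float) -> str:
    """Stand-in for one model's API call, with a simulated latency."""
    time.sleep(latency_s)
    return f"{name}: response"

# (name, simulated latency) - the slow custom model dominates wall time
models = [("fast-builtin", 0.05), ("custom-self-hosted", 0.2), ("premium", 0.1)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(models)) as pool:
    futures = [pool.submit(call_model, name, lat) for name, lat in models]
    results = [f.result() for f in futures]
elapsed = time.perf_counter() - start

print(results)
print(f"wall time ~{elapsed:.2f}s (bounded by the slowest model, not the sum)")
```

Run sequentially these calls would take roughly the sum of the latencies; in parallel the round finishes when the slowest model does, which is why one very slow custom model delays the whole ensemble.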
Custom models must support OpenAI-compatible API format, synchronous and streaming responses, and standard token counting. They don't integrate with AI Crucible's cost tracking (you manage billing externally), may have slower response times than optimized APIs, and require you to maintain uptime and handle errors. Context window limits depend on your model configuration.
API Compatibility:
Must support:
/v1/chat/completions endpoint
Optional but recommended:
Performance Considerations:
Response Time:
Throughput:
Context Window:
Feature Limitations:
No Cost Tracking:
Manual Configuration:
No Automatic Fallback:
Provider-Specific Features:
Operational Responsibilities:
Uptime Management:
Model Updates:
Security:
Despite limitations, custom models provide crucial flexibility for specialized use cases where built-in models can't meet requirements.
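Because AI Crucible does not fail over automatically, you may want a thin fallback wrapper in your own tooling around custom endpoints. A minimal sketch, where the model functions and their signatures are hypothetical:

```python
from typing import Callable

def call_with_fallback(primary: Callable[[str], str],
                       fallback: Callable[[str], str],
                       prompt: str) -> str:
    """Try the custom endpoint first; fall back to a built-in model on error."""
    try:
        return primary(prompt)
    except Exception as exc:  # network errors, timeouts, bad responses, etc.
        print(f"primary failed ({exc}); using fallback")
        return fallback(prompt)

def flaky_custom_model(prompt: str) -> str:
    # Simulates a self-hosted endpoint that is currently down
    raise TimeoutError("self-hosted endpoint unreachable")

def builtin_model(prompt: str) -> str:
    return f"builtin answer to: {prompt}"

print(call_with_fallback(flaky_custom_model, builtin_model, "Summarize Q3"))
```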
Custom models excel in specialized scenarios. Legal firms run case-law fine-tuned models for precedent analysis. Healthcare organizations use HIPAA-compliant on-premises models for patient data. Developers test proprietary models before deployment. Researchers access experimental models via OpenRouter.
Scenario 1: Healthcare Compliance
Challenge: Hospital needs ensemble AI for medical documentation but cannot send patient data to external APIs due to HIPAA regulations.
Solution:
Strategy: Expert Panel
Benefit: 100% HIPAA-compliant while leveraging ensemble intelligence
Scenario 2: Legal Research Automation
Challenge: Law firm has proprietary model fine-tuned on their case history and precedents.
Solution:
Strategy: Collaborative Synthesis
Benefit: Leverage proprietary knowledge while gaining diverse legal perspectives
Scenario 3: Cost-Optimized Content Creation
Challenge: Marketing agency needs to generate hundreds of social media posts daily, but API costs are prohibitive.
Solution:
Strategy: Competitive Refinement
Benefit: 90% cost reduction ($500/month → $50/month)
Scenario 4: Multilingual Customer Support
Challenge: E-commerce company needs ensemble AI for customer inquiries in languages not well-supported by major APIs.
Solution:
Strategy: Expert Panel
Benefit: Better quality responses in underserved languages
Scenario 5: ML Model Evaluation
Challenge: AI startup needs to evaluate their new model against production benchmarks.
Solution:
Strategy: Competitive Refinement
Benefit: Rigorous testing in real ensemble conditions before production deployment
Navigate to Settings → Custom Models to view all configured models in a card-based grid. Each card shows usage count and last used date. Edit models by clicking the edit icon (updates configuration without re-entering API key unless you want to change it), or delete unused models with the trash icon.
Organization Strategies:
Naming Convention: Use consistent, descriptive names that indicate:
Model family: Nemotron, LLaMA, Your-Company-LLM
Size or variant: Nano, 70B, fine-tuned-v2
Provider: NVIDIA, OpenRouter, AWS
Examples:
Nemotron Nano 30B NVIDIA
LLaMA 3.1 70B OpenRouter
Company Legal LLM v2 AWS
Color Coding: Assign colors by category:
Regular Maintenance:
Review Usage:
Update Configurations:
Rotate API Keys:
Monitor Performance:
Version Management:
When Updating Models:
Example:
Company LLM v1 (current production)
Company LLM v2 (testing)
Related articles: