Custom Models Integration: Bring Your Own AI Infrastructure

AI Crucible now supports custom model integration, letting you connect self-hosted models, cloud AI services, and specialized APIs to your ensemble workflows. This guide shows you how to integrate your own AI infrastructure while maintaining the benefits of ensemble orchestration.

What you'll learn:

How to connect self-hosted models, cloud AI services, and API aggregators through OpenAI-compatible APIs

How to set up NVIDIA AI and OpenRouter integrations step by step

How API keys are secured and how custom models fit into ensemble strategies

What custom models cost, where their limits are, and how to troubleshoot them

Time to read: 8-10 minutes


What are custom models in AI Crucible?

Custom models let you connect your own AI infrastructure to AI Crucible through OpenAI-compatible APIs. You can integrate self-hosted models (running on your servers), cloud AI services (like NVIDIA AI), or API aggregators (OpenRouter) and use them alongside built-in models in ensemble workflows.

AI Crucible includes 40+ pre-configured models from major providers (OpenAI, Anthropic, Google, DeepSeek). Custom models expand this by letting you:

Run self-hosted models on your own infrastructure

Tap cost-effective cloud AI services like NVIDIA AI

Reach 100+ additional models through aggregators like OpenRouter

Test fine-tuned, proprietary, or experimental models

Custom models work seamlessly with AI Crucible's ensemble strategies. Mix a self-hosted model with Claude and GPT-5 in Competitive Refinement, or use specialized models as domain experts in Expert Panel discussions.


Why would I use custom models?

Custom models solve specific problems that built-in models can't address. Run models on your own infrastructure for data-privacy compliance, access models fine-tuned for your industry, cut spending with budget cloud services or self-hosting, test proprietary models, and keep full control over your AI stack.

Common use cases:

Privacy and Compliance - Keep sensitive data within your infrastructure. Healthcare organizations can run HIPAA-compliant models on private servers. Financial institutions can maintain data sovereignty while still using ensemble AI.

Cost Optimization - Use cost-effective cloud AI services or self-hosted deployments. Services like NVIDIA AI offer competitive pricing. Use custom models for high-volume tasks (like drafts in Competitive Refinement Round 1) and premium models for final analysis.

Specialized Domains - Connect fine-tuned models optimized for your industry. Legal firms can integrate models trained on case law. Medical researchers can use models fine-tuned on medical literature. Integrate domain-specific models alongside general-purpose ones for comprehensive analysis.

Experimental Models - Test new models before they're available through major APIs. Try open-source models gaining traction in research. Evaluate proprietary models your team developed.

Infrastructure Control - Maintain full control over model selection, versioning, and deployment. Switch model versions without waiting for provider updates. Deploy models in your preferred cloud regions.


How do I add a custom model?

Navigate to Settings → Custom Models and click "Add Custom Model." Configure the display name, model name (API identifier), base URL (your API endpoint), and optional API key, then pick a model color to customize its appearance. The model appears alongside built-in models in all ensemble workflows.

Step-by-Step Setup

Navigate to Custom Models:

Settings (⌘/Ctrl+,) → Custom Models (/user/custom-models)

Click "Add Custom Model"

Configure Required Fields:

Display Name - How the model appears in AI Crucible

Model Name (API Identifier) - The exact identifier your API uses

Base URL - Your API endpoint

API Key (Optional) - Authentication token

Model Color - Visual identifier in the UI

Description (Optional) - Notes about the model

Advanced Settings:

Context Window - Maximum input tokens

Temperature - Sampling temperature (0-2); higher values produce more varied output

Top P - Nucleus sampling (0-1)

Top K - Limits vocabulary to K most likely tokens

Click "Save Model" - Your custom model is now available!


What does OpenAI-compatible API mean?

An OpenAI-compatible API follows OpenAI's REST API specification, specifically the /v1/chat/completions endpoint format. This standard is widely adopted by cloud AI services (NVIDIA AI), API aggregators (OpenRouter), and self-hosted deployments, making them plug-and-play compatible with AI Crucible.

The OpenAI API specification defines how to structure requests and responses when talking to AI models. It's become the de facto standard for model serving, similar to how REST became the standard for web APIs.

What makes an API OpenAI-compatible?

Endpoint Structure - Uses /v1/chat/completions for chat

Request Format - Accepts messages in this structure:

{
  "model": "llama3",
  "messages": [{ "role": "user", "content": "Your prompt here" }],
  "temperature": 0.7,
  "max_tokens": 1000
}

Response Format - Returns structured responses:

{
  "id": "chatcmpl-123",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Response text here"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 100,
    "total_tokens": 120
  }
}

Streaming Support - Delivers tokens via Server-Sent Events (SSE) for real-time response streaming
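
You can verify streaming yourself by requesting SSE output directly. A minimal sketch, assuming placeholder values for the endpoint, key, and model name:

# Request a streamed response; -N disables buffering so SSE chunks print as they arrive
curl -N https://your-endpoint.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR-API-KEY" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Count to five"}],
    "stream": true
  }'

Each output line begins with data: and carries a JSON chunk; the stream ends with data: [DONE].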

Popular OpenAI-Compatible Platforms:

Cloud AI services like NVIDIA AI

API aggregators like OpenRouter

Self-hosted serving stacks (for example, vLLM and Ollama both expose OpenAI-compatible endpoints)

AI Crucible works with any platform that follows this specification.
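
As a concrete example, a locally hosted model can be tested exactly like a cloud one. This sketch assumes an Ollama server on its default port; adjust the URL and model name for your own stack:

# Ollama exposes an OpenAI-compatible endpoint; no API key is needed by default
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello"}]
  }'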


How do I set up NVIDIA AI integration?

Sign up at build.nvidia.com, get an API key, and configure AI Crucible with base URL https://integrate.api.nvidia.com/v1, model name nvidia/nemotron-3-nano-30b-a3b, and your NVIDIA API key. NVIDIA AI provides cloud-hosted models with competitive pricing and fast inference.

Complete NVIDIA AI Setup

Create Account:

Visit build.nvidia.com

Sign up for free account

Get $10 in free credits to start

Get API Key:

NVIDIA API Dashboard → API Keys → Generate Key

Copy your key (starts with nvapi-...)

Browse Available Models:

Check NVIDIA AI Catalog for available models

Popular options include nvidia/nemotron-3-nano-30b-a3b, the model used in the examples below

Test Your Endpoint:

# Verify API access
curl https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer nvapi-YOUR-KEY-HERE" \
  -d '{
    "model": "nvidia/nemotron-3-nano-30b-a3b",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 50
  }'

Add to AI Crucible:

Settings → Custom Models → Add Custom Model

Advanced Settings (optional): Set Context Window, Temperature, and other sampling parameters to match the model's documentation

Use in Ensemble Workflows:

Your NVIDIA model appears alongside built-in models in model selection dropdowns. Mix cost-effective NVIDIA models with premium models for balanced ensemble strategies.


How do I connect OpenRouter models?

Sign up at openrouter.ai, get an API key, and configure AI Crucible with base URL https://openrouter.ai/api/v1, your chosen model name (like meta-llama/llama-3.1-70b-instruct), and your OpenRouter API key. OpenRouter gives you access to 100+ models through a single API.

OpenRouter Integration Guide

Why Use OpenRouter? One API key unlocks 100+ models across providers, with per-token pricing that's often cheaper than direct APIs.

Setup Steps:

Create Account:

Visit openrouter.ai

Sign up (free account with $1 credit)

Get API Key:

Dashboard → API Keys → Create Key

Copy your key (starts with sk-or-v1-...)

Browse Models:

Check the Models page for available models

Note the exact model identifier

Popular options include meta-llama/llama-3.1-70b-instruct, used in the example configuration above

Configure in AI Crucible: Settings → Custom Models → Add Custom Model, with base URL https://openrouter.ai/api/v1, the exact model identifier, and your OpenRouter API key

Add Multiple Models:

You can add multiple OpenRouter models using the same API key, giving you access to diverse model families through one provider.
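
Because every OpenRouter model shares one endpoint, only the "model" field changes between configurations. A quick sketch (hypothetical model choice; substitute your own key):

# Same endpoint and key for every model; swap the model identifier per configuration
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-or-v1-YOUR-KEY" \
  -d '{
    "model": "meta-llama/llama-3.1-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'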

Cost Tracking:

OpenRouter usage doesn't appear in AI Crucible's cost metrics (since it's external). Check OpenRouter's dashboard for usage analytics and billing.


How secure are my API keys?

AI Crucible encrypts API keys at rest using AES-256-GCM encryption before storing them in our database. Keys are only decrypted server-side when making API calls and are never exposed to client code or logs. This enterprise-grade encryption ensures your credentials remain secure even if the database is compromised.

Security Best Practices:

Use Read-Only Keys When Possible - If your API provides read-only keys (for inference only), use those instead of admin keys.

Rotate Keys Regularly - Change API keys every 90 days for enhanced security.

Use Dedicated Keys - Create separate API keys for AI Crucible rather than sharing keys across tools.

Monitor Usage - Check your API provider's dashboard for unexpected activity.

Self-Hosted for Sensitive Work - For highly sensitive data, use self-hosted models on your own infrastructure with restricted network access.


How do I use custom models in ensemble strategies?

Select custom models alongside built-in models in any strategy's model picker. Custom models appear with a server icon and your chosen header color. They participate fully in all rounds, receive the same prompts, and their responses are evaluated alongside other models in the arbiter's comparative analysis.

Practical Ensemble Patterns

Cost-Optimized Competitive Refinement:

Use cost-effective cloud models for early rounds and premium models for final polish:

Round 1 (Draft Generation): Budget custom models (for example, NVIDIA Nemotron or an OpenRouter-hosted Llama) produce the initial drafts.

Rounds 2-3 (Refinement): Enable "Adaptive Iteration Count" to stop early when models converge.

Arbiter: GPT-5 Mini or Gemini 2.5 Flash (budget-friendly analysis)

Savings: 50-70% compared to all-premium models

Privacy-First Expert Panel:

Keep sensitive data on-premises while leveraging external expertise:

Internal Analysis (Self-Hosted Models): Your on-premises models process the documents that contain sensitive data.

External Perspective (API Models): Built-in models contribute general expertise.

Assign expert personas to each model. Self-hosted models handle proprietary data while API models provide general expertise.

Hybrid Debate Tournament:

Test ideas using diverse model architectures:

Team A (Open Source via OpenRouter): Open-source models such as Llama variants.

Team B (Proprietary APIs): Built-in models such as GPT-5 and Claude Sonnet 4.5.

Different training approaches create more substantive debates.

Development Testing:

Validate your fine-tuned model against production models:

Your Custom Model: The fine-tuned model you want to validate.

Baseline Comparisons: Built-in production models such as GPT-5 and Claude Sonnet 4.5.

Run Collaborative Synthesis to see how your model's output compares in real ensemble scenarios.


How much do custom models cost?

Custom model costs depend on your deployment choice. Cloud AI services (NVIDIA AI) charge per-token with competitive pricing. OpenRouter charges per-token, often cheaper than direct APIs. Self-hosted models have infrastructure costs but no per-token fees.

Cost Analysis by Deployment Type

Cloud AI Services (NVIDIA AI, etc.):

Costs:

Per-session cost: Varies by model ($0.01-0.10 typical)

Best for:

Example pricing (NVIDIA Nemotron):

OpenRouter:

Costs:

Example pricing:

Best for: Experimenting across many model families with a single API key

Self-Hosted (AWS, GCP, Azure):

Costs:

Per-session cost: $0 for API calls (after infrastructure costs)

Break-even point: ~50,000-100,000 requests/month (see the sketch after this list)

Best for: High request volumes and strict data-privacy or sovereignty requirements
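
To sanity-check the break-even figure against your own numbers, divide monthly infrastructure cost by the per-request price of the cloud alternative. A minimal sketch with purely hypothetical figures:

# Hypothetical: $700/month GPU instance vs. $0.01 per request on a cloud API
awk 'BEGIN { infra = 700; per_req = 0.01; printf "Break-even: %.0f requests/month\n", infra / per_req }'

With these placeholder numbers, self-hosting pays off above roughly 70,000 requests per month, squarely in the range quoted above.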

Cost Optimization Strategies:

Mix Cost-Effective and Premium Models - Use budget cloud models for drafts (Round 1), premium models for refinement (Rounds 2-3).

Enable Adaptive Iteration Count - Stop early when models converge, saving rounds.

Strategic Model Selection - Use premium models only for final analysis or arbiter role.

Batch Processing - Run multiple prompts in sequence to amortize warm-up costs.
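
If you drive a self-hosted endpoint directly, batching can be a simple loop that reuses the already-warm model (a sketch; the local URL and model name are placeholders):

# Send several prompts against a warm model to amortize startup cost
for prompt in "Summarize Q1 results" "Draft a launch tweet" "List key risks"; do
  curl -s http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"llama3\", \"messages\": [{\"role\": \"user\", \"content\": \"$prompt\"}]}"
  echo
done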


What if my custom model doesn't work?

Common issues include incorrect base URL (verify the endpoint and ensure /v1 suffix), wrong model name (check API documentation), API key problems (test authentication separately), and network issues (verify firewall rules). Test your endpoint with curl before adding to AI Crucible.

Troubleshooting Guide

Test Endpoint Directly:

Before adding to AI Crucible, verify your API works:

# Replace with your actual values
curl https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR-API-KEY" \
  -d '{
    "model": "nvidia/nemotron-3-nano-30b-a3b",
    "messages": [{"role": "user", "content": "Say hello"}],
    "max_tokens": 50
  }'

Expected response: JSON with choices[0].message.content

If this fails, fix your API setup before configuring AI Crucible.
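
To check the response programmatically rather than by eye, extract just the generated text (this sketch assumes jq is installed):

# Print only the model's reply; "null" means the response shape doesn't match the spec
curl -s https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR-API-KEY" \
  -d '{"model": "nvidia/nemotron-3-nano-30b-a3b", "messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 50}' \
  | jq -r '.choices[0].message.content'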

Common Issues and Solutions:

Error: "Connection refused" or "Network error"

Cause: AI Crucible can't reach your endpoint

Solutions:

Verify the base URL is correct and includes the /v1 suffix

Confirm the endpoint is reachable from outside your private network

Check firewall rules and any IP allowlists

Error: "Model not found" or "Invalid model name"

Cause: Model identifier doesn't match API expectations

Solutions:

Check your provider's model catalog or API documentation for the exact identifier

Copy the identifier verbatim, including any namespace prefix (e.g., nvidia/ or meta-llama/)

Error: "Authentication failed" or "Invalid API key"

Cause: API key is wrong or expired

Solutions:

Test the key directly with curl (see above)

Regenerate the key in your provider's dashboard and update the model configuration

Confirm the key hasn't expired or hit a usage limit

Error: "Model responds but output is garbled"

Cause: API returns non-standard format

Solutions:

Confirm the endpoint returns the /v1/chat/completions response structure shown earlier

Check for proxies or middleware that rewrite the JSON payload

Error: "Timeout" or "Request too slow"

Cause: Model is too slow for AI Crucible's timeouts

Solutions:

Switch to a smaller or faster model variant

Reduce max_tokens or scale up your inference hardware
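
To measure raw endpoint latency before changing anything, curl's built-in timer is enough (a sketch; substitute your endpoint and key):

# Report total request time in seconds
curl -o /dev/null -s -w "Total: %{time_total}s\n" \
  https://your-endpoint.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR-API-KEY" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hi"}], "max_tokens": 10}'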

Still Having Issues?

Check Browser Console: Open your browser's developer tools and look for failed requests or error messages when the custom model runs.

Verify Configuration: Re-check the base URL, model name, and API key character by character; most issues are typos.

Test with Built-In Models: Run the same prompt with a built-in model to confirm the problem is isolated to your custom endpoint.


Can I mix custom models with built-in models?

Yes. Custom models work seamlessly with built-in models in all ensemble strategies. You can select any combination—for example, NVIDIA Nemotron, GPT-5, and Claude Sonnet 4.5 together in Competitive Refinement. The arbiter treats all models equally, evaluating responses based on quality regardless of whether they're custom or built-in.

Best Practices for Mixing Models

Leverage Complementary Strengths:

Speed + Quality: Pair fast, inexpensive models for drafts with premium models for depth and polish.

Privacy + Expertise: Keep sensitive data on self-hosted models while API models contribute general expertise.

Cost + Performance: Use budget custom models for high-volume work and premium built-ins for final analysis.

Strategy-Specific Recommendations:

Competitive Refinement: Budget models generate Round 1 drafts; premium models handle later refinement rounds.

Expert Panel: Cast domain-specific custom models as subject-matter experts alongside general-purpose built-ins.

Debate Tournament: Mix model families and architectures; diverse training produces more substantive debates.

Evaluation Considerations:

The arbiter (typically a built-in model) evaluates all responses equally: responses are judged on quality alone, regardless of whether they came from a custom or built-in model.

Performance Tips:

Start Rounds Simultaneously - Mix fast and slow models so they run in parallel.

Monitor Latency - If custom models are much slower, they'll delay the ensemble.

Balance Load - Don't use only slow custom models; mix with faster built-in ones.


What limitations do custom models have?

Custom models must support OpenAI-compatible API format, synchronous and streaming responses, and standard token counting. They don't integrate with AI Crucible's cost tracking (you manage billing externally), may have slower response times than optimized APIs, and require you to maintain uptime and handle errors. Context window limits depend on your model configuration.

Technical Requirements

API Compatibility:

Must support:

POST /v1/chat/completions with the standard request and response formats shown earlier

Synchronous (non-streaming) responses

Streaming responses via Server-Sent Events

Token counts in the usage field

Optional but recommended:

Performance Considerations:

Response Time: Slow endpoints delay the whole ensemble, since each round waits for every model to finish.

Throughput: Ensemble strategies issue requests to multiple models at once, so your endpoint should handle concurrent calls.

Context Window: Limits depend on your deployment; set the Context Window field to match what your model actually supports.

Feature Limitations:

No Cost Tracking: Custom model usage doesn't appear in AI Crucible's cost metrics; track spending in your provider's dashboard.

Manual Configuration: You enter and maintain endpoint URLs, model identifiers, and parameters yourself; nothing updates automatically.

No Automatic Fallback: If your endpoint is down, requests fail rather than rerouting to another model.

Provider-Specific Features: Capabilities beyond the OpenAI-compatible specification aren't available through custom models.

Operational Responsibilities:

Uptime Management: You are responsible for keeping the endpoint available; AI Crucible surfaces errors but can't restart your infrastructure.

Model Updates: You decide when to deploy new model versions and must update the configuration to match.

Security: Secure the endpoint with authentication and restricted network access, and rotate any API keys regularly.

Despite limitations, custom models provide crucial flexibility for specialized use cases where built-in models can't meet requirements.


What are some example use cases?

Custom models excel in specialized scenarios. Legal firms run case-law fine-tuned models for precedent analysis. Healthcare organizations use HIPAA-compliant on-premises models for patient data. Developers test proprietary models before deployment. Researchers access experimental models via OpenRouter.

Real-World Integration Scenarios

Scenario 1: Healthcare Compliance

Challenge: Hospital needs ensemble AI for medical documentation but cannot send patient data to external APIs due to HIPAA regulations.

Solution: Deploy HIPAA-compliant models on hospital-managed servers and connect them to AI Crucible as custom models; patient data never leaves the private network.

Strategy: Expert Panel

Benefit: 100% HIPAA-compliant while leveraging ensemble intelligence

Scenario 2: Legal Research Automation

Challenge: Law firm has proprietary model fine-tuned on their case history and precedents.

Solution: Connect the proprietary fine-tuned model as a custom model and pair it with built-in general-purpose models.

Strategy: Collaborative Synthesis

Benefit: Leverage proprietary knowledge while gaining diverse legal perspectives

Scenario 3: Cost-Optimized Content Creation

Challenge: Marketing agency needs to generate hundreds of social media posts daily, but API costs are prohibitive.

Solution: Generate drafts with budget custom models and reserve premium built-in models for the final refinement pass.

Strategy: Competitive Refinement

Benefit: 90% cost reduction ($500/month → $50/month)

Scenario 4: Multilingual Customer Support

Challenge: E-commerce company needs ensemble AI for customer inquiries in languages not well-supported by major APIs.

Solution: Integrate models fine-tuned for the target languages as custom models alongside general-purpose built-in models.

Strategy: Expert Panel

Benefit: Better quality responses in underserved languages

Scenario 5: ML Model Evaluation

Challenge: AI startup needs to evaluate their new model against production benchmarks.

Solution: Add the new model as a custom model and run it head-to-head against established built-in models.

Strategy: Competitive Refinement

Benefit: Rigorous testing in real ensemble conditions before production deployment


How do I manage multiple custom models?

Navigate to Settings → Custom Models to view all configured models in a card-based grid. Each card shows usage count and last used date. Edit a model by clicking the edit icon; configuration updates don't require re-entering the API key unless you want to change it. Delete unused models with the trash icon.

Custom Model Management Best Practices

Organization Strategies:

Naming Convention: Use consistent, descriptive names that indicate the provider or host, the model family, and the deployment type.

Examples: "NVIDIA - Nemotron Nano", "OpenRouter - Llama 3.1 70B", "OnPrem - Legal Fine-Tune v2"

Color Coding: Assign colors by category, for example one color for cloud services, another for self-hosted deployments, and a third for experimental models.

Regular Maintenance:

Review Usage: Check each card's usage count and last-used date; delete models you no longer use.

Update Configurations: Keep base URLs and model identifiers current as your deployments change.

Rotate API Keys: Replace keys every 90 days; editing a model lets you paste a new key without touching the rest of the configuration.

Monitor Performance: Watch response latency and your provider's dashboard for errors or unexpected usage.

Version Management:

When Updating Models: Either update the Model Name field to the new version identifier, or add the new version as a separate model and compare before retiring the old one.

Example: Add "OnPrem - Llama 3.1" alongside "OnPrem - Llama 3", run both in the same ensemble to compare output quality, then delete the older entry.


Related articles: