AI Crucible provides a unified, OpenAI-compatible API that allows you to integrate powerful ensemble AI strategies directly into your applications. This guide covers everything from authentication to advanced orchestration controls.
Want to see the API in action? Check out our sample chat application on GitHub. This complete, open-source example includes full setup instructions for beginners and serves as a production-ready starting point for your own applications.
The API is built to be a drop-in replacement for standard LLM calls. If you use the OpenAI SDK, you can switch to AI Crucible by simply changing the `baseURL` and `apiKey`.
All API requests require a valid API Key passed in the Authorization header.
```
Authorization: Bearer sk-aicrucible-...
```
You can manage your API keys in the **Settings > API Keys** section of the dashboard, where each key's permissions can also be configured.
This is the primary endpoint for running ensemble strategies. It accepts a standard chat history and returns a generated response, fully compatible with the OpenAI format.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The ID of the model to use. When using `ai_crucible` orchestration, this model serves as the arbiter/judge. Otherwise, it is the model that generates the response. |
| `messages` | array | Yes | List of message objects (`role`, `content`). Content can be a string or an array of content parts for multimodal messages. |
| `stream` | boolean | No | If `true`, returns a stream of Server-Sent Events (SSE). |
| `temperature` | number | No | Sampling temperature, between 0 and 2. Higher values mean the model will take more risks. |
| `top_p` | number | No | Nucleus sampling, an alternative to `temperature`: the model considers only the tokens comprising the top `top_p` probability mass. |
| `max_tokens` | number | No | Maximum number of tokens to generate in the completion. |
| `stream_options` | object | No | Streaming configuration. Use `{ "include_usage": true }` to receive token usage data in the final chunk when streaming. |
| `ai_crucible` | object | No | Advanced configuration for ensemble strategies (see below). |
To orchestrate multiple models or use advanced strategies, pass the `ai_crucible` object in your request. This nested-object pattern is the standard way to extend OpenAI-compatible APIs without parameter collisions.
```json
{
  "model": "gpt-4o",
  "messages": [...],
  "ai_crucible": {
    "strategy": "competitive_refinement",
    "iterations": 1,
    "models": ["gemini-2.5-flash", "gpt-4o-mini"]
  }
}
```
| Field | Type | Description |
|---|---|---|
| `strategy` | string | The ensemble strategy ID (e.g., `competitive_refinement`, `expert_panel`). |
| `iterations` | number | Number of refinement rounds (default: 1). |
| `rounds` | number | Alias for `iterations`. Both parameters are supported for backward compatibility. |
| `models` | array | List of model IDs to participate in the ensemble. |
| `includeFollowUpPrompts` | boolean | (Optional) Whether to generate follow-up prompt suggestions (default: `false`). |
| `includeCandidates` | boolean | (Optional) If `true`, includes individual model responses in the response's `candidates` field. Default: `false`. |
| `includeReasoning` | boolean | (Optional) If `true`, includes the Arbiter's explanation and winner selection. Default: `false`. Alternative to the root-level `reasoning` parameter. |
| `systemPrompt` | string | (Optional) Specific instructions for the Arbiter model (e.g., "You are a Pirate Arbiter"). Takes precedence over the message history. |
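For reference, the fields above can be captured in a TypeScript interface. The shape is inferred from the table; the `AICrucibleOptions` name is ours and not part of the API.

```typescript
// Illustrative shape of the `ai_crucible` extension object, derived from
// the fields table above. The interface name is ours, not part of the API.
interface AICrucibleOptions {
  strategy?: string;               // e.g. "competitive_refinement", "expert_panel"
  iterations?: number;             // refinement rounds (default: 1)
  rounds?: number;                 // alias for `iterations`
  models?: string[];               // participant model IDs
  includeFollowUpPrompts?: boolean;
  includeCandidates?: boolean;
  includeReasoning?: boolean;
  systemPrompt?: string;           // Arbiter instructions
}

const opts: AICrucibleOptions = {
  strategy: "competitive_refinement",
  iterations: 1,
  models: ["gemini-2.5-flash", "gpt-4o-mini"],
};
```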
> [!TIP]
> **System Prompt Fallback:** If `ai_crucible.systemPrompt` is not provided, the API will attempt to extract the first `system` message from the `messages` array to use as the Arbiter's instructions. Explicitly setting `systemPrompt` in the extension object is recommended for clarity.
> [!IMPORTANT]
> **Arbiter Model Selection:** When using ensemble strategies via `ai_crucible`, the root `model` parameter specifies which model acts as the arbiter/judge to select the best response. The `models` array in `ai_crucible` specifies the participant models that generate competing responses.
Example:

```jsonc
{
  "model": "gemini-1.5-pro",  // ← This model judges the responses
  "messages": [...],
  "ai_crucible": {
    "models": ["gpt-4o", "claude-3-opus"]  // ← These models compete
  }
}
```
You can provide a system prompt via `ai_crucible.systemPrompt` or by including a message with `role: "system"` in the `messages` array. If both are provided, `ai_crucible.systemPrompt` takes precedence for the Arbiter model.
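The precedence rules can be sketched in TypeScript. This is an illustrative client-side mirror of the documented behavior (including treating an empty string as unset), not the server's actual implementation; the function name is ours.

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };

// Resolve the Arbiter's instructions per the documented precedence:
// `ai_crucible.systemPrompt` wins; otherwise the first `system` message
// in the chat history is used; an empty string counts as unset.
function resolveArbiterSystemPrompt(
  systemPrompt: string | undefined,
  messages: Message[],
): string | undefined {
  if (systemPrompt) return systemPrompt; // "" is falsy → treated as unset
  const sys = messages.find((m) => m.role === "system");
  return sys?.content || undefined;
}
```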
| Provider | Support Status | Notes |
|---|---|---|
| OpenAI | ✅ Supported | Native support |
| Anthropic | ✅ Supported | Mapped to top-level system parameter |
| Google Gemini | ✅ Supported | Native support |
| DeepSeek | ✅ Supported | Injected as system message |
| Mistral | ✅ Supported | Injected as system message |
| Kimi (Moonshot) | ✅ Supported | Injected as system message |
| Qwen (Aliyun) | ✅ Supported | Injected as system message |
| xAI (Grok) | ✅ Supported | Injected as system message |
| Together | ✅ Supported | Injected as system message |
Validation Rules:

- An empty string (`""`) is treated as `undefined` (no system prompt).

AI Crucible supports the standard `reasoning` parameter for controlling explanation effort:
```json
{
  "model": "gemini-2.5-flash",
  "reasoning": {
    "effort": "high"
  }
}
```
Setting `reasoning.effort` to any value enables the `explanation` field in the response; the effort level itself is not currently used to adjust the level of detail.
You can control the randomness of the output using `temperature` and `top_p`. These parameters are applied to the final synthesis step (the Arbiter's response).
```json
{
  "model": "gpt-4o",
  "messages": [...],
  "temperature": 0.7,
  "top_p": 0.9,
  "ai_crucible": { ... }
}
```
For full OpenAI API compatibility, the following additional parameters are supported:
| Parameter | Type | Description |
|---|---|---|
| `n` | number | Number of completions to generate (default: 1). |
| `stop` | string \| string[] | Stop sequences where the API will stop generating further tokens. |
| `presence_penalty` | number | Penalizes new tokens based on whether they appear in the text so far (-2.0 to 2.0). |
| `frequency_penalty` | number | Penalizes new tokens based on their frequency in the text so far (-2.0 to 2.0). |
| `logit_bias` | object | Modifies the likelihood of specified tokens appearing in the completion. |
| `user` | string | Unique identifier representing your end user, for monitoring and abuse detection. |
> [!NOTE]
> These parameters are primarily useful for advanced use cases and fine-tuning model behavior. Most users will find `temperature` and `top_p` sufficient for controlling output randomness.
```bash
curl https://api.ai-crucible.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AI_CRUCIBLE_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a poem about rust."}],
    "ai_crucible": {
      "strategy": "expert_panel",
      "models": ["claude-3-opus", "gpt-4o"],
      "iterations": 2
    }
  }'
```
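The same request can be issued from TypeScript with `fetch`. This is a minimal sketch mirroring the curl example (error handling omitted); reading the key from an environment variable is our convention here.

```typescript
// Build the same ensemble request as the curl example above.
const body = {
  model: "gpt-4o",
  messages: [{ role: "user", content: "Write a poem about rust." }],
  ai_crucible: {
    strategy: "expert_panel",
    models: ["claude-3-opus", "gpt-4o"],
    iterations: 2,
  },
};

async function runEnsemble() {
  const res = await fetch("https://api.ai-crucible.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.AI_CRUCIBLE_KEY}`,
    },
    body: JSON.stringify(body),
  });
  return res.json();
}
```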
AI Crucible supports multimodal attachments for both images and text-based files (including Markdown, JSON, and CSV).

- **Supported image types:** `image/jpeg`, `image/png`, `image/webp`, `image/gif`.
- **Supported text types:** `text/plain`, `text/markdown` (.md), `application/json`, `text/csv`, `application/xml`.

Messages support two content formats:

- **String:** `"content": "Your message here"`
- **Array of content parts:** `"content": [{ "type": "text", "text": "..." }, { "type": "image_url", ... }]`

When including attachments, use the array format with content parts.
Attachments are handled by the client and sent as standard OpenAI-compatible content parts.
Images should be passed as `image_url` content parts with base64-encoded data.
```json
{
  "role": "user",
  "content": [
    { "type": "text", "text": "What is in this image?" },
    {
      "type": "image_url",
      "image_url": {
        "url": "data:image/jpeg;base64,..."
      }
    }
  ]
}
```
Text-based attachments are appended to the message content as formatted text blocks.
```json
{
  "role": "user",
  "content": "Analyze this file:\n\n--- Attachment: readme.md ---\n# Project Title\n...\n--- End Attachment ---"
}
```
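A small helper can produce this format client-side. The delimiter strings mirror the example above; the helper name is illustrative.

```typescript
// Append a text-based attachment to a message body using the documented
// "--- Attachment ---" block format.
function withTextAttachment(
  text: string,
  filename: string,
  fileText: string,
): string {
  return `${text}\n\n--- Attachment: ${filename} ---\n${fileText}\n--- End Attachment ---`;
}

const content = withTextAttachment(
  "Analyze this file:",
  "readme.md",
  "# Project Title",
);
```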
The `/evaluations` endpoint allows you to programmatically evaluate model responses using AI judge models. Submit two or more model responses and receive detailed scoring across criteria such as accuracy, creativity, clarity, completeness, and usefulness.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | The original prompt that the model responses were generated from. |
| `responses` | array | Yes | Array of model responses to evaluate. Each must include `modelId`, `modelName`, and `response`. |
| `judge_models` | array | Yes | Array of judge models to use. Each must include `id` and `name`. Multiple judges produce consensus scoring. |
| `strategy` | string | No | The ensemble strategy context (e.g., `competitive_refinement`). |
| `evaluation_mode` | string | No | Evaluation mode: `standard` (pairwise comparison) or `pointwise` (independent scoring). Default: `standard`. |
| `weighted` | boolean | No | Whether to use weighted scoring based on judge model weights. Default: `false`. |
```bash
curl https://api.ai-crucible.com/v1/evaluations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AI_CRUCIBLE_KEY" \
  -d '{
    "prompt": "What is the capital of France?",
    "responses": [
      {
        "modelId": "gemini-3-flash",
        "modelName": "Gemini 3 Flash",
        "response": "The capital of France is Paris, renowned for the Eiffel Tower."
      },
      {
        "modelId": "gpt-5-nano",
        "modelName": "GPT-5 Nano",
        "response": "Paris is the capital of France."
      }
    ],
    "judge_models": [
      { "id": "gemini-3-flash", "name": "Gemini 3 Flash" }
    ]
  }'
```
```json
{
  "evaluations": [
    {
      "modelId": "gemini-3-flash",
      "modelName": "Gemini 3 Flash",
      "overallScore": 8.8,
      "criteria": {
        "accuracy": 10,
        "creativity": 5,
        "clarity": 10,
        "completeness": 10,
        "usefulness": 9
      },
      "reasoning": "Provides the correct answer with useful context about landmarks.",
      "individualScores": [
        {
          "judgeId": "gemini-3-flash",
          "judgeName": "Gemini 3 Flash",
          "overallScore": 8.8,
          "criteria": { "accuracy": 10, "creativity": 5, "clarity": 10 }
        }
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 487,
    "completion_tokens": 308,
    "total_tokens": 795,
    "total_cost": 0.0012
  }
}
```
> [!TIP]
> **Multi-Judge Consensus:** When providing multiple judge models, scores are averaged across judges. Use `weighted: true` to apply model-specific quality weights for higher-fidelity scoring.
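As a sketch of what consensus averaging could look like, the snippet below computes a weighted arithmetic mean over per-judge scores. The server's exact formula is not documented here; the types and the default weight of 1 are our assumptions.

```typescript
type JudgeScore = { judgeId: string; overallScore: number; weight?: number };

// Average per-judge scores into a consensus score. With `weighted: true`
// we assume a weighted arithmetic mean; judges without a weight default
// to 1. The actual server-side formula may differ.
function consensusScore(scores: JudgeScore[], weighted = false): number {
  let sum = 0;
  let totalWeight = 0;
  for (const s of scores) {
    const w = weighted ? (s.weight ?? 1) : 1;
    sum += s.overallScore * w;
    totalWeight += w;
  }
  return totalWeight === 0 ? 0 : sum / totalWeight;
}
```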
The `/responses` endpoint is optimized for batch processing and complex multi-turn workflows that don't fit the strict chat format.
```json
{
  "model": "gemini-3-flash",
  "input": "Analyze this dataset for anomalies...",
  "stream": false
}
```
Retrieve your current token usage and cost statistics. This helps you monitor your budget in real-time.
Response:
```json
{
  "object": "list",
  "data": [
    {
      "object": "usage_record",
      "period": "2024-03",
      "total_tokens": 150000,
      "total_cost": 0.45
    }
  ]
}
```
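A client can total spend across the returned records. This is a minimal sketch assuming the record shape shown above; the type name is ours.

```typescript
type UsageRecord = {
  object: string;
  period: string;
  total_tokens: number;
  total_cost: number;
};

// Sum cost across all usage records in the response's `data` array.
function totalCost(records: UsageRecord[]): number {
  return records.reduce((sum, r) => sum + r.total_cost, 0);
}
```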
The API uses standard HTTP status codes to indicate success or failure.
| Code | Meaning | Solution |
|---|---|---|
| `200` | OK | Request succeeded. |
| `401` | Unauthorized | Check your API key. |
| `402` | Payment Required | You have exceeded your monthly budget or balance. |
| `429` | Too Many Requests | Rate limit exceeded. Slow down or request a quota increase. |
| `500` | Internal Error | Something went wrong on our side. Please retry. |
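For `429` and `500` responses, clients typically retry with exponential backoff. The sketch below is one possible client-side pattern, not an official SDK feature; the function signature is ours.

```typescript
// Retry transient failures (429, 500) with exponential backoff.
// `doRequest` performs one attempt and reports the HTTP status.
async function withRetry<T>(
  doRequest: () => Promise<{ status: number; body: T }>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<{ status: number; body: T }> {
  let last: { status: number; body: T } | undefined;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    last = await doRequest();
    if (last.status !== 429 && last.status !== 500) return last;
    // Wait 500ms, 1s, 2s, ... before the next attempt.
    await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
  }
  return last!;
}
```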
Yes! Simply initialize the client with your AI Crucible base URL and API key.
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-aicrucible-...',
  baseURL: 'https://api.ai-crucible.com/v1',
});
```
Streaming works identically to standard LLMs. For ensemble strategies like Collaborative Synthesis, you will receive the final synthesized chunk-by-chunk as it is generated by the synthesizer model. Intermediate steps (like panel discussions) happen server-side before the stream begins or are summarized in the final output.
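Client-side, the SSE events can be accumulated into the final text. The sketch below assumes OpenAI-style chunks carrying `choices[0].delta.content` and a terminating `data: [DONE]` line, as in the standard streaming format.

```typescript
// Accumulate assistant text from OpenAI-style SSE lines. Each event is a
// "data: {...}" line; the stream ends with "data: [DONE]".
function collectStreamText(sseLines: string[]): string {
  let text = "";
  for (const line of sseLines) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break;
    const chunk = JSON.parse(payload);
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}
```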
The orchestrator automatically handles partial failures. If one model in a panel fails (e.g., due to rate limits), the strategy continues with the remaining healthy models, ensuring you still get a robust result.