AI Models Predict Bulgarian Elections: A Global Ensemble Experiment

Breaking Context: On December 11-12, 2025, Bulgaria's government resigned amid mass protests, with parliament unanimously accepting Prime Minister Rosen Zhelyazkov's resignation (227-0 vote).

Snap elections now appear likely, marking Bulgaria's eighth election in four years due to ongoing political instability. This crisis unfolds just weeks before Bulgaria's planned eurozone entry on January 1, 2026.

In response to this breaking news, we conducted a fun experiment to showcase ensemble AI capabilities. We asked eight of the world's most advanced AI models to predict the outcome of Bulgaria's next parliamentary elections.

The models came from the United States, China, and France, each bringing its own perspective and analytical approach to Bulgarian political forecasting.

The prompt was deliberately simple: "Give me your prediction % by party and number of seats in parliament for the next elections in Bulgaria. Only the data in a table, without any additional explanations or analysis."

This experiment used a competitive refinement strategy with two rounds of predictions, allowing models to analyze each other's responses and refine their forecasts.

With snap elections expected in April 2026, we plan to run this experiment again in March as the campaign heats up, incorporating real-time polling data and voter sentiment shifts.

Political Context: Bulgaria's Crisis

Bulgaria's current political turmoil represents its eighth election cycle in just four years, an unprecedented period of instability for the EU member state. The immediate crisis began with:

Expected Timeline: President Rumen Radev will likely offer the mandate to GERB (the largest party), but leader Boyko Borissov is expected to refuse, which would trigger an interim government and snap elections, probably in April 2026.

This makes our AI prediction experiment particularly interesting: the models made their forecasts just as this crisis broke, working largely from pre-crisis data, providing a unique opportunity to examine how AI handles (or fails to handle) sudden political shocks.

The Models: A Global AI Lineup

1. GPT-5.2 (OpenAI, USA)

2. Claude Opus 4.5 (Anthropic, USA)

3. Grok 4 (xAI, USA)

4. DeepSeek Reasoner (DeepSeek, China)

5. Mistral Large 3 (Mistral AI, France)

6. Qwen3-Max (Alibaba, China)

7. Kimi K2 Thinking (Moonshot AI, China)

8. Gemini 3 Pro Preview (Google, USA)


Round 1: Initial Predictions

In the first round, models provided their independent forecasts without seeing each other's predictions. The results revealed both consensus and divergence in how different AI systems analyze Bulgarian politics.

Round 1 Results by Party and Model

| Party / Coalition | GPT-5.2 | Claude Opus 4.5 | Grok 4 | DeepSeek Reasoner | Mistral Large 3 | Qwen3-Max | Kimi K2 | Gemini 3 Pro |
|---|---|---|---|---|---|---|---|---|
| GERB-SDS | 24.8% (64) | 26.3% (69) | 28% (68) | 24.5% (62) | 24.5% (65) | 24.8% (63) | 24.3% (58) | 26.2% (71) |
| PP-DB | 20.6% (53) | 15.8% (41) | 22% (53) | 18.0% (44) | 22.8% (60) | 19.5% (50) | 18.7% (45) | 14.1% (38) |
| Vazrazhdane | 13.1% (34) | 13.7% (36) | 12% (29) | 15.5% (38) | 18.3% (48) | 17.2% (44) | 12.8% (31) | 15.8% (43) |
| DPS / DPS-New Beginning | 15.3% (39) | 14.2% (37) | 15% (36) | 13.0% (32) | 12.1% (32) | 9.3% (24) | 13.2% (32) | 8.9% (24) |
| BSP - United Left | 8.7% (22) | 8.4% (22) | 9% (22) | 9.0% (22) | 9.7% (25) | 11.0% (28) | 8.4% (20) | 6.8% (19) |
| ITN | 6.3% (16) | 4.2% (11) | 7% (17) | 6.5% (16) | 5.2% (10) | 5.6% (14) | 4.3% (0) | 6.1% (16) |

Parties that received a forecast from only three models each (values in source order):

MRF / DPS-Dogan: 5.2% (12), 7.1% (17), 7.6% (21)
MECh: 3.6% (0), 3.8% (10), 4.5% (8)
Velichie: 4.4% (0), 4.1% (10), 3.2% (0)

Note: Numbers in parentheses represent predicted parliamentary seats (out of 240 total)
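To work with cells in the "share% (seats)" format used in the table, they first need to be split into numbers. The snippet below is a minimal sketch of one way to do that. The names `CELL_PATTERN` and `parse_cell` are illustrative and not part of the original experiment.

```python
import re

# Matches cells such as "24.8% (64)" or "28% (68)".
CELL_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s*\((\d+)\)\s*$")


def parse_cell(cell: str) -> tuple[float, int]:
    """Split a 'share% (seats)' cell into (vote share, seat count)."""
    match = CELL_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cell format: {cell!r}")
    return float(match.group(1)), int(match.group(2))


# GERB-SDS, Round 1, in the column order of the table above.
gerb_round1 = [
    "24.8% (64)", "26.3% (69)", "28% (68)", "24.5% (62)",
    "24.5% (65)", "24.8% (63)", "24.3% (58)", "26.2% (71)",
]
parsed = [parse_cell(cell) for cell in gerb_round1]
seats = [seat_count for _, seat_count in parsed]
print(min(seats), max(seats))  # prints: 58 71
```

Raising on an unrecognized cell, rather than returning a default, makes a missing or malformed model response visible instead of silently skewing later averages.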

Round 1: Consensus and Divergence

Key Areas of Consensus:

Notable Divergences:

Model-Specific Characteristics:

  1. DeepSeek Reasoner provided multiple scenarios (Base, Surge, Consolidation) rather than a single prediction, showing advanced uncertainty modeling
  2. Kimi K2 Thinking included confidence intervals (90% CI), the only model to explicitly quantify prediction uncertainty
  3. Claude Opus 4.5 and Gemini 3 Pro were most accurate in recognizing the DPS split into two separate factions
  4. Grok 4 kept predictions simpler, aggregating smaller parties into "Others"
  5. Mistral Large 3 took the boldest stance on Vazrazhdane's potential surge (18.3%)

Similarity Analysis:


Round 2: Competitive Refinement

In Round 2, each model received all other models' predictions and was asked to analyze them, identify strengths and weaknesses, and provide an improved forecast. This competitive refinement process revealed how AI models learn from each other and adjust their predictions.
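The two-round flow can be sketched roughly as follows. This is an illustration under assumptions, not the code used in the experiment: a "model" is represented as any function that maps a prompt string to a reply string, and the Round 2 instruction wording is a paraphrase of the description above, since the exact Round 2 prompt is not given. `ModelFn` and `run_two_rounds` are hypothetical names.

```python
from typing import Callable, Dict

# Hypothetical interface: a model is any function mapping a prompt to a reply.
ModelFn = Callable[[str], str]

# Round 1 prompt, as quoted in the article.
BASE_PROMPT = (
    "Give me your prediction % by party and number of seats in parliament "
    "for the next elections in Bulgaria. Only the data in a table, without "
    "any additional explanations or analysis."
)

# Assumed wording: the article describes this step but does not quote it.
REFINE_INSTRUCTION = (
    "Analyze these predictions, identify strengths and weaknesses, "
    "and provide an improved forecast."
)


def run_two_rounds(models: Dict[str, ModelFn]) -> Dict[str, Dict[str, str]]:
    """Run independent forecasts, then let each model refine using its peers."""
    # Round 1: every model answers the same prompt without seeing the others.
    round1 = {name: ask(BASE_PROMPT) for name, ask in models.items()}

    # Round 2: each model sees all other models' Round 1 replies, not its own.
    round2 = {}
    for name, ask in models.items():
        peers = "\n\n".join(
            f"{other}:\n{reply}"
            for other, reply in round1.items()
            if other != name
        )
        prompt = (
            f"{BASE_PROMPT}\n\nOther models predicted:\n\n{peers}\n\n"
            f"{REFINE_INSTRUCTION}"
        )
        round2[name] = ask(prompt)

    return {"round1": round1, "round2": round2}
```

In practice each `ModelFn` would wrap a call to the provider's API. Keeping the models behind a plain function interface makes the loop easy to test with stub functions.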

Round 2 Results by Party and Model

| Party / Coalition | GPT-5.2 | Claude Opus 4.5 | Grok 4 | DeepSeek Reasoner |
|---|---|---|---|---|
| GERB-SDS | 25.6% (66) | 25.8% (68) | 27% (70) | 25.5% (65) |
| PP-DB | 19.4% (50) | 14.6% (38) | 20% (52) | 20.0% (51) |
| Vazrazhdane | 14.7% (38) | 15.4% (40) | 14% (36) | 16.0% (41) |
| DPS-New Beginning | 10.6% (27) | 9.2% (24) | 9% (23) | 13.5% (34) |
| APS / DPS-Dogan | 7.3% (19) | 7.4% (19) | 7% (18) | - |
| BSP - United Left | 8.2% (21) | 7.1% (18) | 8% (21) | 9.5% (24) |
| ITN | 5.1% (13) | 5.8% (15) | 6% (15) | 5.8% (15) |
| MECh | 4.3% (6) | 4.5% (11) | 4% (5) | - |
| Velichie | 3.6% (0) | 3.4% (0) | - | 4.2% (10) |

Note: Round 2 responses from Mistral Large 3, Qwen3-Max, Kimi K2, and Gemini 3 Pro were partially unavailable in the retrieved chat data, so those models are omitted from this table. A dash marks a missing value.

Round 2: Evolution and Convergence

Major Shifts from Round 1:

  1. DPS Split Recognition: All four models with complete Round 2 data explicitly split DPS into two factions (Peevski's "New Beginning" and Dogan's faction), showing collective learning

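  2. GERB-SDS Adjustment: Predictions converged rather than moving in one direction. GPT-5.2 and DeepSeek Reasoner raised their estimates while Claude Opus 4.5 and Grok 4 lowered theirs, narrowing the range among these four models from 24.5%-28% to 25.5%-27%

  3. PP-DB Volatility: This party saw some of the largest adjustments. Grok 4 dropped from 22% to 20%, DeepSeek Reasoner rose from 18.0% to 20.0%, and GPT-5.2 and Claude Opus 4.5 each fell by 1.2 points (to 19.4% and 14.6% respectively)

  4. Vazrazhdane Convergence: Predictions converged toward a 14-16% range (Round 1 spanned 12-18.3% across all eight models and 12-15.5% among these four)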
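One simple way to quantify the convergence described in the list above is to compare the spread (maximum minus minimum) of each party's predicted vote share across rounds. The sketch below uses the figures from the Round 1 and Round 2 tables for the four models with complete data in both rounds. The function names are illustrative, and spread is only one of several possible measures.

```python
# Vote-share predictions (%) in the order GPT-5.2, Claude Opus 4.5, Grok 4,
# DeepSeek Reasoner, taken from the Round 1 and Round 2 tables.
ROUND1 = {
    "GERB-SDS": [24.8, 26.3, 28.0, 24.5],
    "PP-DB": [20.6, 15.8, 22.0, 18.0],
    "Vazrazhdane": [13.1, 13.7, 12.0, 15.5],
}
ROUND2 = {
    "GERB-SDS": [25.6, 25.8, 27.0, 25.5],
    "PP-DB": [19.4, 14.6, 20.0, 20.0],
    "Vazrazhdane": [14.7, 15.4, 14.0, 16.0],
}


def spread(values):
    """Distance in percentage points between the highest and lowest value."""
    return max(values) - min(values)


def convergence(round1, round2):
    """Map each party to (Round 1 spread, Round 2 spread), rounded to 0.1."""
    return {
        party: (round(spread(round1[party]), 1), round(spread(round2[party]), 1))
        for party in round1
    }


for party, (before, after) in convergence(ROUND1, ROUND2).items():
    print(f"{party}: {before} -> {after} points")
# prints:
# GERB-SDS: 3.5 -> 1.5 points
# PP-DB: 6.2 -> 5.4 points
# Vazrazhdane: 3.5 -> 2.0 points
```

A smaller second number means the models moved closer together. With only four models the spread is sensitive to a single outlier, so it is best read as a rough indicator.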

Models That Changed Significantly:

Models That Maintained Positions:

Most models held their ground on core predictions while making tactical adjustments:


Final Synthesis: Comparative Analysis

Consensus Prediction (Average of Round 2)

Based on the four models with complete Round 2 data, here's the consensus forecast:

| Party / Coalition | Average % | Average Seats | Seat Range |
|---|---|---|---|
| GERB-SDS | 25.975% | 67 | 65-70 |
| PP-DB | 18.5% | 48 | 38-52 |
| Vazrazhdane | 15.025% | 39 | 36-41 |
| DPS-New Beginning (Peevski) | 10.575% | 27 | 23-34 |
| BSP - United Left | 8.2% | 21 | 18-24 |
| APS / DPS-Dogan | 7.23% | 19 | 18-19 |
| ITN | 5.675% | 15 | 13-15 |
| MECh | 4.27% | 7 | 5-11 |
| Velichie | 3.73% | 3 | 0-10 |
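The consensus figures are plain averages over whichever models reported a value for a party. The sketch below reproduces that arithmetic for a few rows, using the Round 2 numbers. It assumes that average seats are rounded half up, which is what the ITN row (14.5 becoming 15) suggests. The helper names are illustrative.

```python
import math

# Round 2 predictions as (vote share %, seats), one tuple per reporting model.
ROUND2 = {
    "GERB-SDS": [(25.6, 66), (25.8, 68), (27.0, 70), (25.5, 65)],
    "PP-DB": [(19.4, 50), (14.6, 38), (20.0, 52), (20.0, 51)],
    "ITN": [(5.1, 13), (5.8, 15), (6.0, 15), (5.8, 15)],
    "APS / DPS-Dogan": [(7.3, 19), (7.4, 19), (7.0, 18)],  # three models only
}


def round_half_up(value: float) -> int:
    """Round to the nearest integer, with .5 going up (assumed convention)."""
    return math.floor(value + 0.5)


def consensus(predictions):
    """Average share, average seats, and seat range for one party."""
    shares = [share for share, _ in predictions]
    seats = [seat_count for _, seat_count in predictions]
    return {
        "avg_share": round(sum(shares) / len(shares), 3),
        "avg_seats": round_half_up(sum(seats) / len(seats)),
        "seat_range": (min(seats), max(seats)),
    }


for party, predictions in ROUND2.items():
    print(party, consensus(predictions))
```

Python's built-in `round` uses round-half-to-even, which would turn ITN's 14.5 into 14, so the explicit `round_half_up` helper is used for seats. Note also that averaged seats across parties need not sum to 240, because each party is averaged over a different number of models.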

Key Insights from the Ensemble

1. Power of Diverse Perspectives The ensemble revealed that different AI architectures bring unique analytical lenses:

2. Collective Intelligence Round 2 showed clear evidence of collective learning:

3. Uncertainty in Political Forecasting The wide ranges on several parties highlight AI models' appropriate uncertainty:

4. Cost vs. Quality Trade-offs Interestingly, model cost didn't directly correlate with prediction quality:

5. The Bulgarian Context Challenge All models struggled with Bulgaria-specific dynamics:

Current Events Impact: These predictions were made shortly after the government resignation was announced, based on historical voting patterns and recent polling data. The ongoing mass protests over economic policies and corruption could significantly shift voter sentiment as the campaign develops.

What This Means for Political Forecasting

This experiment demonstrates both the promise and limitations of AI-powered political forecasting:

Strengths:

Limitations (Now Highlighted by Reality):

How the Crisis Changes Everything: The government resignation and protests could significantly impact the actual election results:

Methodology Notes

Experiment Design:

Data Sources: Models drew on:


AI Ensemble Insights and Future Updates

This ensemble experiment, conducted in response to Bulgaria's government resignation and impending snap elections, showcases both the capabilities and limitations of AI political forecasting.

While this is primarily a fun exploration of ensemble AI capabilities rather than a serious electoral forecast, the collective intelligence of diverse AI systems provided an interesting baseline of how different models approach political prediction.

The convergence on key findings (GERB-SDS dominance, four-party core, DPS split) is interesting from an AI ensemble perspective. These predictions represent a baseline forecast before campaign dynamics, real-time polling, and the full impact of the December 2025 crisis become clearer.

Real-World Context Matters: Bulgaria's current crisis—its eighth election in four years, mass protests over economic policies, and the timing just before eurozone entry—demonstrates why AI predictions must be continuously updated with real-time events. As Reuters reports, the protest movement has "reinvigorated political engagement," potentially shifting voter sentiment in ways these AI models couldn't capture.

Next Steps: With snap elections expected in April 2026, we plan to rerun this ensemble experiment as the campaign develops (likely in March 2026), incorporating:

This will allow us to compare "baseline" AI predictions (made without crisis context) versus updated predictions informed by current events—a valuable test of how ensemble AI adapts to rapidly changing political landscapes.

Note: This is a fun AI ensemble experiment and should not be considered as polling data, expert political analysis, or actual election forecasting. This was conducted in response to Bulgaria's government resignation as a showcase of how ensemble AI works. Actual election results will almost certainly differ from these AI-generated predictions, especially as campaign dynamics and voter sentiment evolve.


Update (Dec 12, 2025): Following the government's resignation, GERB leader Boyko Borissov is expected to refuse the mandate to form a new government, likely leading to an interim administration and new elections in April 2026. We'll rerun this ensemble experiment in March 2026 as the campaign heats up.

