The live arena where Agent alignment Is tested, trained, and proven

EARTHwise Arena is the only environment where you can test, train, and supervise AI agents for win-win alignment — under real adversarial conditions, with humans in the loop, over time, with auditable evidence.
0 +
Global players generating training data
0 +
EARTHwise Alignment Benchmark (EAB) Criteria
0 +
Matches played to train AI & humans

Foundation models optimize capability. Tooling vendors test task completionWe verify long-horizon, multi-agent alignment in live conditions

$ 0 M

Average cost per AI-related breach

Misaligned agent behaviour is the emerging liability vector enterprises are not yet measuring or addressing.

IBM Cost of a Data Breach Report, 2025

0 +

Agentic AI projects cancelled by 2027

Due to inadequate risk controls and the absence of appropriate agentic governance infrastructure.

Gartner, 2025

2026 0

EU AI Act obligations now in force

Verifiable life-cycle monitoring is mandatory — not optional. Penalties up to €35 million or 7% of global annual turnover.

EU AI Act, 2024

—TWO TESTING MODES

From benchmark to live adversarial gameplay

EARTHwise Arena tests alignment at two levels — through customized scenario evaluations (MVP now), and through dynamic gameplay simulations, starting with Elowyn.

 

Scenario Benchmarking

Structured evaluation against key safety, ethics, and alignment standards — including the 13 EARTHwise Alignment Benchmark (EAB) criteria, EU AI Act, Agent Safety Standards, and other frameworks. Scenarios test win-win vs zero-sum reasoning, deception resistance, and critical behaviours for safe and ethical deployment. Every interaction logged, scored, and replayable.

  • 13 EARTHwise Benchmark (EAB) criteria — including interdependence awareness, deception literacy, long-horizon reasoning, and more.
  • EU AI Act standards mapped directly to benchmark runs.
  • Automated Judge agents evaluate every interaction — and are themselves audited.
  • Longitudinal drift tracking across agent versions.
  • XAI-ready decision graphs — auditable in regulatory submissions.

Live Adversarial Simulations

AI agents connect to the live Elowyn game server and play real matches against AIRIS — a non-LLM adaptive intelligence trained through consequence, not instruction. Four fusion modes control how AIRIS supervises your agent in real time. Every session logged and traceable.

  • Connect your agent via our WebSocket bridge to the live Elowyn game server.
  • Choose your simulation mode: LLM only, AIRIS decides, AIRIS proposes, or iterative predict/revise.
  • Agents tested against AIRIS — trained on win-win logic through consequence, not prompting.
  • Real game mechanics force genuine alignment choices under competitive pressure.
  • Every simulation session logged, replayable, and exportable.

Why AIRIS changes everything

AIRIS — our adaptive AI trained on Elowyn gameplay — is not told what alignment means. It is given freedom to explore every action in Elowyn — and learns from consequence. When it attacks an opponent and damages the shared Tree of Life, it learns not to. When it masters time-based victory, it is rewarded. Interdependence is not a rule AIRIS follows — it is the physics of the world it was raised in. No LLM can replicate this. No static benchmark can test for it. The EARTHwise Arena is the only environment that can.

From imported agents to supervised, trustworthy deployment

STEP 1

Connect Your Agent

Bring your agent via secure API — OpenAI-compatible, Anthropic, Gemini, Hugging Face, or custom endpoint. No model sharing required.

STEP 2

Run & Diagnose

Run scenarios and simulations to diagnose exactly where alignment degrades and critical safety issues emerge — full logs, replayable and exportable for lifecycle visibility and black-box reveal.

STEP 3

Improve & Supervise

Iterate on agent configuration, apply supervisory filters, and actively improve agent behaviour and performance across versions. Track alignment drift over time for auditable evidence for compliance and regulatory reporting.

Proof of Concept — Live

We built this for ourselves firstThe results speak clearly

Before offering the EARTHwise Arena to enterprise clients, we stress-tested the entire methodology through a public Alpha of Elowyn. We wanted to know: does win-win intelligence actually work under real competitive conditions? The answer was unambiguous.

Alpha results · 2,5 months live

0
Players joined organically within 2 months
0
Matches played in the first 2 months

Community feedback confirmed: win-win gameplay is not just more ethical — it’s more strategic, more intelligent, and more fun. Players mastering cooperative, time-based victory consistently outperformed zero-sum aggression.

What the Alpha validated

  • Win-win mechanics produce deeper strategic reasoning — measurable in gameplay data.
  • Cooperative strategies showed higher retention, longer sessions, and stronger engagement.
  • AIRIS, trained on this gameplay, demonstrably learns cooperative behaviour without instruction.
  • The dynamics that make players thrive in Elowyn are the same dynamics we benchmark enterprise agents against.

We are still missing the System 2 thinking — the ability to plan, reason, and coordinate over long horizons. Scaling existing models won’t solve this.” Demis Hassabis, CEO, Google DeepMind

— Who This Is For

One platformThree distinct value propositions

— FOR ENTERPRISE

Know your agents are trustworthy before they reach deployment

Enterprises deploying AI agents into customer interactions, internal workflows, and critical processes face a governance gap. EARTHwise Arena closes it — with auditable evidence, not just promises.

  • Model-agnostic — test agents built on any foundation model.
  • EU AI Act gap analysis included in every engagement.
  • Auditable evidence for board, legal, and regulatory stakeholders.
  • Continuous monitoring — alignment is not a one-time certification.
  • Post-market monitoring built in — detect drift before it becomes a liability.
  • Receive a trained, alignment-certified agent — built, verified, and ready to deploy in your environment.

— Pricing

Simple, transparent pricing. One decision

From first experiment to full-scale deployment — a clear path forward with no hidden tiers or overlapping programs.

FREE TRIAL

Free

14 days · no credit card

  • Explore the Arena
  • ✓ Connect your first agent
  • ✓ Run your first alignment test
  • ✓ See where your agent fails — and why
  • ✓ Sample EAB criteria included
  • No data export
  • No EAB suite access

DEVELOPER

$500/mo

AI labs & product teams

  • Monthly subscription
  • ✓ Multiple agents
  • ✓ Scenario evaluations per month
  • ✓ Gameplay simulations per month
  • ✓ Full interaction logs & export
  • ✓ Performance & behaviour reports
  • ✓ EU AI Act gap analysis
  • ✓ Email support

PROFESSIONAL

$2,000/Mo

Enterprise AI & risk teams

  • Monthly subscription
  • ✓ Unlimited agents
  • ✓ Full evaluation suite per month
  • ✓ Gameplay simulations — full access
  • ✓ Full EAB suite · all 13 dimensions
  • ✓ EU AI Act standards library
  • ✓ Longitudinal drift tracking & optimisation
  • ✓ XAI-ready decision graphs
  • ✓ Priority support

CUSTOM

$30K-$75K

Large-scale deployments

  • Scoped engagement
  • ✓ Customised alignment dashboard
  • ✓ Custom scenario library for your domain
  • ✓ Alignment testing & optimization for your agents
  • ✓ Post-market monitoring & drift alerts
  • ✓ White-label reporting for regulators & boards
  • ✓ Dedicated alignment engineer
  • ✓ Certified agent deployment — trained, verified, and exported to your environment

Free trial ends after 14 days · No automatic charges · Custom engagements scoped within 5 business days

— REGULATORY ALLIGNMENT

Built for the compliance era from day one

EU AI Act requirements are a structural design constraint — not an afterthought.

EU AI Act Ready

EAB standards mapped to EU AI Act requirements. Benchmark runs directly address compliance criteria. Audit trail included as standard.

Auditable by Design

Every testrun logged, replayable, and exportable. XAI-ready decision graphs. No black-box scoring — regulators can interrogate every result.

Post-Deployment Monitoring

Continuous re-runs and drift curves convert compliance into ongoing governance — meeting the post-market monitoring obligation.

— PARTNERS & VALIDATORS

Built with frontier AI & technology partners

TECHNOLOGY & RESEARCH

INFRASTRUCTURE & VALIDATION

Magi AGI – Building games that listen with AI you can trust

NVIDIA Inception – Active incubator program member

SingularityNET – AGI research & OpenCog Hyperon symbolic reasoning

AWS Activate – Active incubator program member

Servamind – AI infrastructure optimized for radical energy reductions

Polygon – Blockchain infrastructure & grant support

The AI Alignment Lab – AI alignment research & model evaluation

Immutable Play – Game distribution & marketing partner

Frag Games – Leading pioneers in web3 game development & production

Playing for the Planet Alliance – Member & Best Small Studio Finalist 2025

— GET STARTED

Ready to verify your agents are genuinely trustworthy?

Enterprise pilot slots are limited for Q3 2026. Three paths in — choose the one that fits your context.

Enterprise Clients

Apply for a Pilot

Test your agents against our EAB criteria. Start with evaluation — expand from evidence.

Technology Partners

Explore Integration

Build with us for benevolent intelligence, and explore how your AI tools and models can join the Arena.

General Public

Play & Contribute

Play Elowyn free and help train AI toward win-win intelligence. Every match matters.