Humanoid Benchmark
Independent humanoid-robot index · Xodexa
XHI v1.0.0

The Humanoid Interop Standard (HIS) — draft

Layer 0 live · Layer 3 draft + mock · Layer 5 schema only · Layers 1, 2, 4 not started

A proposed open standard for describing humanoid robot capabilities and letting LLM/VLA controllers discover and command any compliant robot without vendor-specific glue code — the same way MCP lets one LLM host talk to any tool server. This is a draft for discussion, not a ratified spec. Layer 0 — the capability manifest — is live, generated from the same cited data the XHI ranking already scores. Layer 3 — the tool-calling contract — has a draft schema and a working reference implementation in the site's repository, but is not connected to any real robot; no robot in this index implements a control protocol yet. Nothing below competes with ROS 2 or MCP; it's meant to be a thin certification/orchestration layer on top of both.

Layering

Layer | Purpose | Status
0 | Embodiment taxonomy: what a robot physically is | live
1 | Perception/state interface: how a robot reports sensing & status | not started
2 | Action interface: how a controller commands a body (capability-negotiated, not identical schemas) | not started
3 | Orchestration/tool contract: MCP-shaped tool discovery for LLM control, layered on ROS 2 for real-time transport | draft + mock reference
4 | Multi-agent/fleet coordination: cross-vendor robot-to-robot coordination | not started
5 | Certification: compliance levels, conformance testing, badging | schema only

Layer 0: the capability manifest

Every robot in the index already exposes a live, generated manifest — a machine-readable "spec sheet" an orchestrating LLM could read before issuing any control-protocol call. It's derived directly from the same fields the XHI ranking scores, not a second dataset to maintain.

Form-factor distribution, this index

Classified per-robot from each robot's own cited sourcing — not inferred. One robot (AgiBot D2 Max) is correctly tagged quadruped even though it's filed in a humanoid ranking, because that's what its own sources say it is.

form_factor | Robots
bipedal | 54 / 71
wheeled-base | 10 / 71
other | 5 / 71
arm-only | 1 / 71
quadruped | 1 / 71
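
As a quick consistency check on the distribution above, the counts can be tallied in a few lines of Python. This is only a sketch: the numbers are copied by hand from the table rather than read from the live manifests, and the names FORM_FACTOR_COUNTS, TOTAL_ROBOTS, and share are illustrative, not part of the site's generator.

```python
from collections import Counter

# Counts copied from the form-factor table; the per-robot manifests are not loaded here.
FORM_FACTOR_COUNTS = Counter({
    "bipedal": 54,
    "wheeled-base": 10,
    "other": 5,
    "arm-only": 1,
    "quadruped": 1,
})

TOTAL_ROBOTS = 71  # denominator used in the table


def share(form_factor: str) -> float:
    """Return the fraction of indexed robots with the given form_factor."""
    return FORM_FACTOR_COUNTS[form_factor] / TOTAL_ROBOTS


# The five buckets should account for every robot exactly once.
assert sum(FORM_FACTOR_COUNTS.values()) == TOTAL_ROBOTS

for name, count in FORM_FACTOR_COUNTS.most_common():
    print(f"{name:<13}{count:>3} / {TOTAL_ROBOTS}  ({share(name):.1%})")
```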

Try it — live manifests across the diversity of the dataset

Figure 02 · bipedal
Optimus (Gen 2/3) · bipedal
Pepper · wheeled-base
AgiBot D2 Max · quadruped

Layer 3: the tool-calling contract (draft, not connected to a real robot)

Four JSON-RPC methods (initialize, tools/list, tools/call, tools/confirm) are modeled on MCP's tool-discovery shape. Every tool also carries a risk_tier, which a generic tool protocol doesn't need but a robot control protocol can't skip: a bad MCP call corrupts a file; a bad robot call can move mass near a human. A call whose risk tier is at or above the robot's declared confirmation_required_above_tier returns pending_confirmation instead of executing, and the orchestrator must make a second, explicit call to actually run it.
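
The gating rule can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the repository's code: the draft names risk_tier, confirmation_required_above_tier, and pending_confirmation, while the integer tier encoding, the Tool class, the call_tool function, and the "executed" status string are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    risk_tier: int  # assumption: tiers are integers; the draft only says each tool has a risk_tier


def needs_confirmation(tool: Tool, confirmation_required_above_tier: int) -> bool:
    """Rule from the text: a tier at or above the threshold must be confirmed first."""
    return tool.risk_tier >= confirmation_required_above_tier


def call_tool(tool: Tool, confirmation_required_above_tier: int) -> dict:
    """Return a pending result instead of executing when the gate applies."""
    if needs_confirmation(tool, confirmation_required_above_tier):
        return {"status": "pending_confirmation", "tool": tool.name}
    # "executed" is a placeholder status; nothing is actually actuated here.
    return {"status": "executed", "tool": tool.name}


speak = Tool("speak", risk_tier=1)

# Threshold above the tool's tier: the call runs immediately.
ungated = call_tool(speak, confirmation_required_above_tier=2)

# Threshold equal to the tool's tier: "at or above" means it is held for confirmation.
gated = call_tool(speak, confirmation_required_above_tier=1)

print(ungated)
print(gated)
```

Note that the comparison is inclusive, matching the "at or above" wording, even though the field name says "above".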

The reference implementation is a simulated Figure 02 tool server exposing four Tier-1 skills (speak, navigate_to, grasp, hand_off) with real gating behavior — every call is asserted end-to-end, not just documented. It lives in the repository, not on this server, because there's nothing real for it to control yet.

  • Spec: standard/layer3-tool-contract.md
  • Tool schema: tool-definition.schema.json
  • Run it: python -m standard.reference.demo — prints every JSON-RPC message in the full initialize → tools/list → tools/call → tools/confirm flow
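
For readers without the repository at hand, the four-step flow can be approximated with an in-memory stand-in. This is a hedged sketch, not the reference implementation in standard/: it assumes JSON-RPC 2.0 envelopes (as MCP uses), and the MockToolServer class, the rpc helper, the token field, the params layout, and the "executed" status are all hypothetical. Only the four method names, the four skill names, risk_tier, confirmation_required_above_tier, and pending_confirmation come from the text above.

```python
import itertools
import json


class MockToolServer:
    """Illustrative stand-in; NOT the repository's reference server."""

    def __init__(self, confirmation_required_above_tier: int):
        self.threshold = confirmation_required_above_tier
        # Skill names are from the text; tier 1 follows its "Tier-1" label.
        self.tools = {"speak": 1, "navigate_to": 1, "grasp": 1, "hand_off": 1}
        self.pending = {}  # hypothetical confirmation token -> (tool name, arguments)
        self._tokens = itertools.count(1)

    def handle(self, request: dict) -> dict:
        method, params = request["method"], request.get("params", {})
        if method == "initialize":
            result = {"confirmation_required_above_tier": self.threshold}
        elif method == "tools/list":
            result = {"tools": [{"name": n, "risk_tier": t} for n, t in self.tools.items()]}
        elif method == "tools/call":
            result = self._call(params["name"], params.get("arguments", {}))
        elif method == "tools/confirm":
            result = self._confirm(params["token"])
        else:
            # -32601 is the standard JSON-RPC 2.0 "Method not found" code.
            return {"jsonrpc": "2.0", "id": request["id"],
                    "error": {"code": -32601, "message": "Method not found"}}
        return {"jsonrpc": "2.0", "id": request["id"], "result": result}

    def _call(self, name: str, arguments: dict) -> dict:
        if self.tools[name] >= self.threshold:
            token = f"confirm-{next(self._tokens)}"
            self.pending[token] = (name, arguments)
            return {"status": "pending_confirmation", "token": token}
        return {"status": "executed", "tool": name}

    def _confirm(self, token: str) -> dict:
        # Popping the token means a confirmation can be used only once.
        name, _arguments = self.pending.pop(token)
        return {"status": "executed", "tool": name}


def rpc(server: MockToolServer, request_id: int, method: str, params=None) -> dict:
    """Send one request to the mock server and print both messages."""
    request = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params or {}}
    response = server.handle(request)
    print(json.dumps(request))
    print(json.dumps(response))
    return response


server = MockToolServer(confirmation_required_above_tier=1)
rpc(server, 1, "initialize")
listing = rpc(server, 2, "tools/list")
first = rpc(server, 3, "tools/call", {"name": "grasp", "arguments": {"object": "cup"}})
final = rpc(server, 4, "tools/confirm", {"token": first["result"]["token"]})
```

With the threshold set to 1, the grasp call is held and only the explicit tools/confirm request releases it. Error handling for unknown tools or tokens is omitted to keep the sketch short.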

Certification levels (Layer 5 — schema exists, no test suite yet)

Level | Meaning
documented | Self-reported manifest exists, cross-checked against ≥2 sources like every other XHI figure; no live protocol test performed.
conformant | Passes an automated conformance suite: tool discovery resolves, schemas validate, safety-tier confirmation gating actually blocks correctly.
interop-proven | A reference orchestrating LLM that has never seen this robot before completes a benchmark task suite zero-shot: true demonstrated interoperability.

Every robot on this site is currently documented — no Layer 3 control protocol exists yet for any of them, so conformant and interop-proven are aspirational levels defined in the schema, not achieved by anything today.
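
One way to read the three levels is as a ladder, where each level presupposes the one before it. The Python sketch below encodes that reading. It is an assumption rather than something the Layer 5 schema is known to state, and the CertLevel enum, the certification_level function, and its parameters are hypothetical names.

```python
from enum import Enum
from typing import Optional


class CertLevel(str, Enum):
    DOCUMENTED = "documented"
    CONFORMANT = "conformant"
    INTEROP_PROVEN = "interop-proven"


def certification_level(
    independent_sources: int,
    passes_conformance_suite: bool,
    passes_zero_shot_benchmark: bool,
) -> Optional[CertLevel]:
    """Map check results to a level, assuming the levels are cumulative."""
    if independent_sources < 2:
        return None  # below the two-source bar: no level at all (assumption)
    if not passes_conformance_suite:
        return CertLevel.DOCUMENTED
    if not passes_zero_shot_benchmark:
        return CertLevel.CONFORMANT
    return CertLevel.INTEROP_PROVEN


# Today's situation as described above: manifests are sourced, but no protocol tests exist.
current = certification_level(2, passes_conformance_suite=False, passes_zero_shot_benchmark=False)
print(current.value)
```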

This is a draft. Layers 1, 2, and 4 are not designed yet, and Layer 3's reference implementation is a simulation, not a connection to real hardware. If you're a robot vendor, an agent-framework builder, or a researcher and want to shape this, the schemas, generator, and reference tool server are all open in the site's repository under standard/ — feedback welcome via the same sourcing/citation standard the rest of this site holds itself to.