agentobjectmodel.org

AOM Specification

This directory contains the complete JSON Schema definitions for the Agent Object Model (AOM)™ protocol. To validate surfaces and outputs, use the CLI from the repo root (aom.py / aom.mjs). For a high-level overview and motivation, see the AOM v0.1.0 White Paper.

Schemas

aom-input-schema.json (core surface schema)

What it defines: The structure of an AOM surface (screen, modal, panel, widget, drawer).

Validates: *.aom.json files in examples/ (for example examples/v0.1.0/)

Key sections:

Use when: Generating or consuming AOM from web/mobile screens.

aom-output-schema.json (agent output schema)

What it defines: The structure of an agent’s response to an AOM surface.

Validates: *.output.json files in examples/ (typically under each surface’s outputs/ folder).

Key sections:

Use when: Building agents that operate on AOM surfaces.

Secure/signed payloads are out of scope for this spec.

AOM Contracts

AOM defines two machine-readable contracts. Both are required for AOM-compliant agent systems:

Contract  Schema                  File extension  Purpose
Surface   aom-input-schema.json   *.aom.json      Describes what the agent sees (UI state, available actions, entities)
Output    aom-output-schema.json  *.output.json   Describes what the agent does (thought, action, result, confidence)

Where the documents fit

The diagram below shows the runtime flow. The User and the Worker Agent never communicate directly; all interaction goes through the Master Agent. The User triggers the workflow and passes information to the Master Agent, which triggers and informs the Worker Agent. The Worker Agent reads each surface's Input AOM, performs actions, and moves between surfaces (for example, from Surface A to Surface A/B); each surface serves its Input AOM and returns either success or an error. Finally, the Worker Agent reports to the Master Agent with Output AOM, and the Master Agent informs the User. For the full flow, including the site and page policy checks, see Sequence diagram (for geeks).

sequenceDiagram
    autonumber

    actor User as User
    participant Master as Master Agent
    participant Agent as Worker Agent
    participant Site as Site (origin)
    participant Surf1 as Surface A
    participant Surf2 as Surface A/B

    %% 0. User and Agent communicate only via Master
    User->>Master: Triggers workflow with Information
    Master->>Agent: Triggers workflow with Information

    %% 1. Agent fetches site policy
    Agent->>Site: GET /.well-known/aom-policy.json
    Site-->>Agent: Site policy (forbidden | allowed | open)

    alt Site policy = forbidden
        rect rgb(255, 230, 230)
            Agent->>Master: Surface forbidden (exit)
            Master->>User: Surface forbidden
        end
    else Site policy = allowed or open
        rect rgb(230, 240, 255)
            %% 2. First surface serves AOM; Agent reads and checks page policy
            Surf1-->>Surf1: Serves Input AOM
            Agent->>Surf1: Reads Input AOM
            Agent-->>Agent: Checks page automation_policy (forbidden | allowed | open)

            alt Page automation_policy = forbidden
                rect rgb(255, 230, 230)
                    Agent->>Master: Page forbidden (exit)
                    Master->>User: Page forbidden
                end
            else Page = allowed or open
                rect rgb(232, 245, 233)
                    %% 3. Agent checks if it has all data; need more info goes via Master to User
                    Agent-->>Agent: Checks if it has all data

                    alt Has all data
                        rect rgb(232, 245, 233)
                            Note over Agent: Proceed with current data
                        end
                    else Needs more information
                        rect rgb(255, 248, 225)
                            Agent->>Master: No. Give more Information
                            Master->>User: No. Give more Information
                            User->>Master: Sends more Information
                            Master->>Agent: Sends more Information
                        end
                    end

                    Agent->>Surf1: Submits action (fills Input AOM)
                    Agent->>Surf1: Performs Action of Input AOM
                    Surf1->>Surf2: Navigates to same or next surface
                end

                %% 4. Next surface serves AOM; Agent reads and checks page policy again
                Note over Agent: Waits for next surface
                Surf2-->>Surf2: Serves Input AOM
                Agent->>Surf2: Reads Input AOM
                Agent-->>Agent: Checks page automation_policy (forbidden | allowed | open)

                alt Page automation_policy = forbidden
                    rect rgb(255, 230, 230)
                        Agent->>Master: Page forbidden (exit)
                        Master->>User: Page forbidden
                    end
                else Page = allowed or open
                    rect rgb(240, 248, 255)
                        %% 5. Success: Agent informs Master, Master informs User. Error: escalate via Master.
                        Surf2-->>Agent: Returns response

                        alt Success
                            rect rgb(232, 245, 233)
                                Agent-->>Agent: Converts Input AOM to Output AOM
                                Agent->>Master: Informs Output AOM
                                Master->>User: Informs Output AOM
                            end
                        else Error
                            rect rgb(255, 245, 238)
                                Agent-->>Agent: Check Error
                                alt Can Fix
                                    rect rgb(232, 245, 233)
                                        Agent->>Surf2: Retry (e.g. read again / resubmit) with Input AOM
                                    end
                                else Can't Fix (Converts to Output AOM)
                                    rect rgb(255, 230, 230)
                                        Agent->>Master: No. Give more Information with Output AOM
                                        Master->>User: No. Give more Information with Output AOM
                                    end
                                end
                            end
                        end
                    end
                end
            end
        end
    end
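The two policy gates in the diagram (the site policy first, then each page's automation_policy) reduce to a small decision function. The Python below is a minimal illustration, not part of the spec: parse_site_policy assumes, hypothetically, that /.well-known/aom-policy.json carries a top-level "policy" key, and both function names are invented for this example. It treats allowed and open identically because the diagram routes them down the same branch.

```python
import json
from typing import Optional

# Policy values as listed in the sequence diagram.
VALID_POLICIES = {"forbidden", "allowed", "open"}


def parse_site_policy(document: str) -> str:
    """Read the site policy from the text of aom-policy.json.

    Hypothetical layout: assumes a top-level "policy" key.
    """
    return json.loads(document)["policy"]


def policy_gate(site_policy: str, page_policy: Optional[str] = None) -> str:
    """Return "site_forbidden", "page_forbidden", or "proceed".

    page_policy is None until the agent has read a surface's Input AOM.
    """
    for value in (site_policy, page_policy):
        if value is not None and value not in VALID_POLICIES:
            raise ValueError(f"unknown policy value: {value!r}")
    if site_policy == "forbidden":
        return "site_forbidden"  # exit before reading any surface
    if page_policy == "forbidden":
        return "page_forbidden"  # exit on this surface
    # "allowed" and "open" take the same branch in the diagram.
    return "proceed"
```

A runtime would call policy_gate once with only the site policy before reading any surface, then again with the page policy every time a new surface is read.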

Versioning

Both schemas use semantic versioning:

Each AOM surface must declare its version:

{
  "aom_version": "0.1.0",
  ...
}
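A consumer can check the declared version before doing anything else with a surface. This is a sketch under assumptions: the function names are hypothetical, the regular expression only loosely follows the Semantic Versioning grammar, and the compatibility rule (same MAJOR.MINOR) is a conservative choice made here, since Semantic Versioning allows anything to change while the major version is 0.

```python
import re
from typing import Any, Dict, Tuple

# MAJOR.MINOR.PATCH with optional pre-release / build suffixes (loose match).
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")


def parse_aom_version(surface: Dict[str, Any]) -> Tuple[int, int, int]:
    """Extract aom_version from a surface and split it into integers."""
    version = surface.get("aom_version")
    if not isinstance(version, str):
        raise ValueError("surface does not declare aom_version")
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"aom_version is not a semantic version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_compatible(surface: Dict[str, Any], supported: Tuple[int, int] = (0, 1)) -> bool:
    """Assumed rule: accept any patch release of the supported MAJOR.MINOR."""
    major, minor, _patch = parse_aom_version(surface)
    return (major, minor) == supported
```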

Key principles (quick)

These mirror the design intent throughout the spec:

  1. Task-centric — organized around user goals, not UI layout.
  2. Entity-driven — data structures are explicit and typed.
  3. Action-oriented — what the agent can do is enumerated and validated.
  4. State-aware — workflows and session context can be represented explicitly.
  5. Layout-free — no CSS, coordinates, or presentation noise.
  6. Semantic-only — meaning first; avoid DOM coupling.
  7. Automation guardrails: the forbidden / allowed / open policy values control how agents may use the surface.

Agent identity and traceability

Agent identity (agent_id, agent_name) lives in two places: the output, and the request the agent sends to the site when invoking an action. The input (surface) never contains the agent’s identity. A surface may, however, declare that any request the agent submits to that site must include agent_id (and optionally agent_name).
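To make the split concrete, here is a hypothetical request builder in Python. The request body layout (action_id, inputs) and the require_agent_id flag are assumptions for illustration; the text above only says that a surface may require agent_id and, optionally, agent_name. The identity values come from the agent's own configuration, never from the surface.

```python
from typing import Any, Dict, Optional


def build_action_request(
    action_id: str,
    inputs: Dict[str, Any],
    agent_id: Optional[str] = None,
    agent_name: Optional[str] = None,
    require_agent_id: bool = False,
) -> Dict[str, Any]:
    """Build the body an agent sends to the site when invoking an action.

    Hypothetical layout. require_agent_id reflects what the surface declares;
    the identity values themselves are supplied by the agent.
    """
    if require_agent_id and not agent_id:
        raise ValueError("this site requires agent_id on every request")
    request: Dict[str, Any] = {"action_id": action_id, "inputs": inputs}
    if agent_id:
        request["agent_id"] = agent_id
    if agent_name:
        request["agent_name"] = agent_name  # optional, even when agent_id is required
    return request
```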

Design Principles

1. Task-Centric, Not DOM-Centric

AOM describes what users can accomplish (tasks, actions, entities), not the HTML structure.

Why: Agents reason about goals, not CSS selectors.

2. Entity-Driven

Domain objects (Product, Order, User) are first-class citizens with schemas, runtime validations, and current values.

Why: Agents operate on structured data, not unstructured text.

3. Declarative Actions

Actions declare their inputs, outputs, effects, priorities, and preconditions.

Why: Agents can plan, validate, and execute actions safely.

4. Production Intelligence

AOM natively supports automated testing via signals.test_cases and runtime escalation gating via meta.confidence.

Why: Agents require strict validation and human-in-the-loop fallback paths for enterprise reliability.

5. Mode Flexibility

Supports both single-shot (one action → done) and flow (multi-step workflows).

Why: Different tasks have different execution patterns.
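Both modes can be driven by one loop if, as assumed here, a single-shot output sets meta.done to true after its one action, while a flow keeps returning meta.done: false until the workflow ends. In this Python sketch, decide and advance are hypothetical callbacks standing in for the agent's reasoning and the runtime's navigation, and max_steps is a safety cap added for illustration.

```python
from typing import Any, Callable, Dict, List

Surface = Dict[str, Any]
Output = Dict[str, Any]


def run_agent(
    surface: Surface,
    decide: Callable[[Surface], Output],
    advance: Callable[[Surface, Output], Surface],
    max_steps: int = 10,
) -> List[Output]:
    """Drive single-shot and flow modes with one loop.

    Assumes a single-shot output sets meta.done to true after its one action,
    and a flow keeps meta.done false until the last step.
    """
    outputs: List[Output] = []
    for _ in range(max_steps):
        output = decide(surface)  # agent picks one action for this surface
        outputs.append(output)
        if output.get("meta", {}).get("done", False):
            return outputs
        surface = advance(surface, output)  # runtime performs it and returns the next surface
    raise RuntimeError(f"workflow not done after {max_steps} steps")
```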

6. Runtime-Agnostic

AOM is JSON. Works with any agent framework, LLM, or automation tool.

Why: Interoperability across ecosystems.

JSON Schema Details

Both schemas use JSON Schema Draft 2020-12:

Required vs Optional Fields

Input/core schema (aom-input-schema.json):

Output schema (aom-output-schema.json):


Extending AOM

Custom Fields

Both schemas allow additionalProperties: true in specific sections:

Example:

{
  "context": {
    "app_name": "MyApp",
    "locale": "en-US",
    "custom_tenant_id": "acme-corp",
    "custom_feature_flags": ["beta_ui", "dark_mode"]
  }
}

Custom Entity Types

Entity schemas support arbitrary field types and custom validation rules:

{
  "entities": {
    "CustomWidget": {
      "schema": {
        "widget_id": {"type": "string", "required": true},
        "config": {"type": "object", "required": false}
      },
      "current": {
        "widget_id": "w123",
        "config": {"color": "blue", "size": "large"}
      }
    }
  }
}
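A runtime could check current against schema with a few lines of code. This Python sketch assumes the field-spec layout shown above (type plus required) and maps only the standard JSON type names to Python types; any other type name is treated as a custom type and skipped, leaving it to runtime-specific rules. check_entity is a hypothetical name.

```python
from typing import Any, Dict, List

# Standard JSON type names mapped to Python types after json.loads().
JSON_TYPES = {
    "string": str,
    "object": dict,
    "array": list,
    "boolean": bool,
    "integer": int,
    "number": (int, float),
}


def check_entity(entity: Dict[str, Any]) -> List[str]:
    """Return a list of problems found when checking current against schema."""
    errors: List[str] = []
    schema = entity.get("schema", {})
    current = entity.get("current", {})
    for field, spec in schema.items():
        if field not in current:
            if spec.get("required", False):
                errors.append(f"missing required field: {field}")
            continue
        type_name = spec.get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # custom type: leave to runtime-specific validation
        value = current[field]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and type_name in ("integer", "number")
        if is_bool_as_number or not isinstance(value, expected):
            errors.append(f"{field}: expected {type_name}, got {type(value).__name__}")
    return errors
```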

Optional: binds_to (agents.json Integration)

Actions can optionally reference external API/tool definitions via the binds_to field:

{
  "actions": [
    {
      "id": "submit_checkout",
      "label": "Place order",
      "category": "mutation",
      "description": "Submit checkout and create order.",
      "input_entities": ["CheckoutIntent"],
      "output_entities": ["OrderConfirmation"],
      "effects": [
        "entities.OrderConfirmation.current = shop_api.place(...)",
        "state.workflow.step_id = 'order_placed'"
      ],
      "binds_to": {
        "type": "agent.workflow_step",
        "ref": "place_order_confirm_checkout",
        "optional": true
      }
    }
  ]
}

When to use:

Your runtime has an external tool registry (e.g., agents.json, MCP tools, OpenAPI specs)

You want agents to call real APIs instead of simulating via effects

When binds_to is present:

The runtime tries to resolve binds_to.ref against the external registry

If found → use the external tool schema (parameters, authentication, etc.)

If not found AND optional: true → fall back to AOM’s inline effects

If not found AND optional: false → fail with a clear error

Schema:

type (string): namespace/type of the external binding (e.g., “agent.workflow_step”, “mcp.tool”, “openapi.operation”)

ref (string): external identifier (tool name, operation ID, etc.)

optional (boolean, default false): whether the runtime may fall back to effects when the binding cannot be resolved; if false, an unresolved binding is an error

Default behavior: If binds_to is omitted, the runtime executes the action using only AOM’s effects.
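The resolution rules above can be sketched as a single Python function. Here the external registry is modeled, hypothetically, as a plain dict keyed by binds_to.ref; a real runtime would look the reference up in agents.json, an MCP server, or an OpenAPI document instead. resolve_action and BindingError are names invented for this example.

```python
from typing import Any, Dict


class BindingError(LookupError):
    """Raised when a required (optional: false) binding cannot be resolved."""


def resolve_action(action: Dict[str, Any], registry: Dict[str, Any]) -> Dict[str, Any]:
    """Decide how to execute an action, following the binds_to rules."""
    fallback = {"kind": "effects", "effects": action.get("effects", [])}
    binding = action.get("binds_to")
    if binding is None:
        return fallback  # no binding: AOM effects only
    tool = registry.get(binding["ref"])
    if tool is not None:
        # Found: use the external tool definition (parameters, auth, ...).
        return {"kind": "external", "type": binding["type"], "tool": tool}
    if binding.get("optional", False):
        return fallback  # not found, optional: fall back to inline effects
    raise BindingError(
        f"binds_to.ref {binding['ref']!r} ({binding['type']}) not found in registry"
    )
```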

Roadmap: Auto-resolution from common tool registries.


Optional: A2H (Agent-to-Human) Integration

AOM natively supports the industry-standard A2H protocol for safe Human-in-the-Loop (HITL) escalations. This allows the surface to dictate when an agent must pause and ask a human for approval or data.

1. Defining the Policy in the Surface (aom-input-schema)

The surface defines which actions require human intervention via the a2h_policy object on an action:

{
  "actions": [
    {
      "id": "delete_database",
      "label": "Delete Production DB",
      "category": "mutation",
      "a2h_policy": {
        "requires_authorization": true,
        "escalation_channel": "in_app"
      }
    }
  ]
}

2. Executing the Intent in the Output (aom-output-schema)

When the agent realizes it needs to escalate (either due to the surface’s a2h_policy or low internal confidence), it outputs an a2h_intent inside the meta block:

{
  "mode": "flow",
  "action": { "action_id": "none" },
  "meta": {
    "done": false,
    "confidence": 0.4,
    "a2h_intent": {
      "type": "AUTHORIZE",
      "message": "I am about to delete the production database. Do I have your approval to proceed?"
    }
  }
}
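The escalation decision described above (the surface’s a2h_policy, or low confidence) can be sketched in Python as follows. The 0.7 threshold, the function name, and defaulting to an AUTHORIZE intent are assumptions for illustration; the output shape, including "mode": "flow", mirrors the JSON example above.

```python
from typing import Any, Dict, Optional


def maybe_escalate(
    action: Dict[str, Any],
    confidence: float,
    message: str,
    threshold: float = 0.7,  # hypothetical cut-off, not defined by the spec
    intent_type: str = "AUTHORIZE",
) -> Optional[Dict[str, Any]]:
    """Return an A2H escalation output, or None if the agent may proceed."""
    policy = action.get("a2h_policy") or {}
    if not policy.get("requires_authorization", False) and confidence >= threshold:
        return None
    return {
        "mode": "flow",
        "action": {"action_id": "none"},  # take no action while waiting for a human
        "meta": {
            "done": False,
            "confidence": confidence,
            "a2h_intent": {"type": intent_type, "message": message},
        },
    }
```

A surface’s requires_authorization forces escalation even when the agent is confident; low confidence alone can also trigger it.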

Supported A2H Intents:

Validation tools

The validation tools are organized by language under tools/, so you can use only Python or only Node. See tools/README.md.

Python

# From repo root (pip install -r tools/python/validate/requirements.txt first)
python tools/python/validate/validate.py spec/v0.1.0/aom-input-schema.json examples/v0.1.0/login-single/login.aom.json
python tools/python/validate/validate_all.py
python tools/python/validate/validate_all.py v0.1.0/ecom-flow

Node

# From repo root (npm install in tools/node/validate first)
node tools/node/validate/validate.js spec/v0.1.0/aom-input-schema.json examples/v0.1.0/login-single/login.aom.json
node tools/node/validate/validate_all.js
node tools/node/validate/validate_all.js v0.1.0/ecom-flow

Generating output files for testing

Golden *.output.json files under each example’s outputs/ folder are generated by the create-outputs tools. From repo root:

python tools/python/create-outputs/create_outputs.py
# or
node tools/node/create-outputs/create_outputs.js

Schema Changelog

v0.1.0 (2026-02-26)

Initial public release (current)

Roadmap: v0.2.0 (not yet released)

Future versions MAY introduce:


FAQ

Q: Why separate surface and output schemas?
A: Surfaces describe what’s available (input to agent), outputs describe what the agent decided (output from agent). Different lifecycles, different consumers.

Q: Can I use AOM with non-LLM agents?
A: Yes. AOM is JSON. Any system that can parse JSON and make decisions can consume AOM.

Q: Does AOM require specific UI frameworks?
A: No. AOM is framework-agnostic. Generate it from React, Vue, mobile apps, or even server-rendered HTML.

Q: What about authentication/security?
A: AOM surfaces can include state.session.authenticated and state.session.user_id. Authorization logic lives in your runtime, not the schema.

Q: Can AOM represent native mobile screens?
A: Yes. surface_kind supports screens, modals, panels, drawers, and widgets. The abstraction works for both web and mobile.




Contributing

Found an issue or have a suggestion?

  1. Validate your examples against the schemas first
  2. Open an issue with concrete examples
  3. Propose changes with before/after JSON snippets

Schema improvements should maintain backward compatibility when possible.