Clawperator Node Runtime and API Design

Product naming:

Product: Clawperator
Android package/application namespace: com.clawperator.operator

Purpose

Clawperator is a deterministic actuator tool that lets agents execute Android automations on behalf of a user. It provides a stable layer for LLM-driven device control with deterministic inputs and outputs, removing the need for brittle, recipe-specific shell scripts.

Execution model:

Agents call Clawperator CLI/API.
Clawperator performs adb and Android tooling interactions.
Clawperator sends validated runtime commands to Android (ACTION_AGENT_COMMAND).
Clawperator returns structured execution results.

Critical requirement:

Skill artifacts are optional.
If no artifact exists, or an artifact is wrong/stale due to UI drift, feature flags, staged rollouts, or account-level variants, agents must still execute using generic runtime actions and live UI observation.

Agent-customer policy:

The Clawperator Node runtime interface (CLI + HTTP API) is the primary/default interface for agents.
The Android APK/runtime service is an execution target, not the agent-facing integration surface.
Agents should not need direct adb for common tasks.
Raw adb remains available as an explicit fallback for edge cases and debugging.

Design implication:

If a workflow is common (for example package listing, screenshots, device discovery, app open/close, execution, snapshot, logs), provide a first-class Clawperator command/API for it.

Shipped Commands

Core commands:

clawperator doctor: Validate prerequisites and environment.
clawperator devices: Discover connected device IDs.
clawperator packages list: Confirm presence of receiver and target apps on device.
clawperator execute: Run an execution JSON payload.
clawperator observe snapshot: Get current UI hierarchy as hierarchy_xml.
clawperator observe screenshot: Capture device screen.
clawperator action [open-app|click|read|wait|type]: Single-step interaction wrappers.
clawperator serve: Start HTTP/SSE server for remote agent access.
clawperator doctor --fix: Best-effort environment remediation.
clawperator skills install/update/search/run: Skills lifecycle.
clawperator version --check-compat: CLI/APK compatibility check.

Contracts:

Canonical Envelope: [Clawperator-Result] {JSON} is the ONLY way success/failure is reported.
expectedFormat Required: Every observation/execution must include expectedFormat: "android-ui-automator".
Single-Flight Lock: Only one execution per deviceId / receiverPackage at a time. Overlaps return EXECUTION_CONFLICT_IN_FLIGHT.

HTTP API Server (`serve`)

When running clawperator serve [--port <number>] [--host <string>], a local HTTP server is started to allow remote agents to interact with Clawperator without direct CLI access.

⚠️ Security Warning: The HTTP API currently provides no authentication or authorization. By default, it binds to 127.0.0.1 (localhost) for safety. If you bind to 0.0.0.0 or a public IP via --host, any client on your network can remotely control your connected Android devices. Only expose this API on trusted networks or behind an authenticated gateway.

REST Endpoints

GET /devices: List all connected Android devices and their states.
POST /execute: Execute a full JSON execution payload.
- Body: {"execution": {...}, "deviceId": "...", "receiverPackage": "..."}
- Returns: RunExecutionResult (200 OK or 4xx/5xx on failure).
- Status 423 Locked: Returned if another execution is in flight for the target device.
POST /observe/snapshot: Quick helper for UI capture.
- Body: {"deviceId": "...", "receiverPackage": "..."}
POST /observe/screenshot: Quick helper for visual capture.
- Body: {"deviceId": "...", "receiverPackage": "..."}
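A minimal sketch of how a remote agent might prepare a POST /execute call and interpret the status code. The body shape and the 423 status come from the endpoint list above. The helper names, the port 3000, the Content-Type header, and treating deviceId and receiverPackage as optional fields are illustrative assumptions, not confirmed server behavior.

```typescript
// Hypothetical helper types: field names follow the documented request body.
interface ExecuteBody {
  execution: Record<string, unknown>;
  deviceId?: string; // assumed optional, per the device selection policy
  receiverPackage?: string; // optionality is an assumption
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request without sending it, so any HTTP client can be used.
function buildExecuteRequest(baseUrl: string, body: ExecuteBody): PreparedRequest {
  return {
    url: baseUrl.replace(/\/+$/, "") + "/execute", // tolerate trailing slashes
    method: "POST",
    headers: { "Content-Type": "application/json" }, // assumed, not stated in this document
    body: JSON.stringify(body),
  };
}

// Classifies the response status the way the endpoint list describes it.
function classifyExecuteStatus(httpStatus: number): "ok" | "locked" | "failed" {
  if (httpStatus === 200) return "ok";
  if (httpStatus === 423) return "locked"; // another execution is in flight for the device
  return "failed"; // other 4xx/5xx
}

// Port 3000 is a placeholder; use whatever `clawperator serve --port` was given.
const prepared = buildExecuteRequest("http://127.0.0.1:3000/", {
  execution: { commandId: "cmd-123", expectedFormat: "android-ui-automator", actions: [] },
  deviceId: "emulator-5554",
});
console.log(prepared.url, classifyExecuteStatus(423));
```

Keeping request construction separate from transport makes the payload easy to inspect before anything reaches a device.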

Event Streaming (SSE)

The server provides a real-time event stream at GET /events. Callers should use a standard SSE client to subscribe.

Event: clawperator:result: Emitted when an execution reaches a terminal state (success or failure) and a deviceId is known.
- Data: {"deviceId": "...", "envelope": {...}}
Event: clawperator:execution: Emitted for every attempt to run an execution, including pre-resolution failures.
- Data: {"deviceId": "...", "input": {...}, "result": {...}}
Event: heartbeat: Upon connection, a {"code": "CONNECTED", ...} message is sent to verify the stream is active.
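For callers without an SSE client library, the following sketch parses raw stream text into events. It assumes the server uses the standard SSE wire format (event: and data: lines, frames separated by a blank line) and that the event names above appear verbatim in the event: field. The function name and sample payloads are illustrative.

```typescript
interface SseEvent {
  event: string; // "message" when the frame carries no event: line (SSE default)
  data: string;
}

// Parses complete SSE frames from a text buffer. Frames end with a blank line.
function parseSseFrames(buffer: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of buffer.split(/\r?\n\r?\n/)) {
    let eventName = "message";
    const dataLines: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        eventName = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        dataLines.push(line.slice(5).replace(/^ /, "")); // strip one optional leading space
      }
    }
    // Frames without data are not dispatched, matching SSE semantics.
    if (dataLines.length > 0) {
      events.push({ event: eventName, data: dataLines.join("\n") });
    }
  }
  return events;
}

// Sample stream text; the payload contents are placeholders.
const sample =
  'event: heartbeat\ndata: {"code":"CONNECTED"}\n\n' +
  'event: clawperator:result\ndata: {"deviceId":"emulator-5554","envelope":{"status":"success"}}\n\n';

for (const frame of parseSseFrames(sample)) {
  if (frame.event === "clawperator:result") {
    const payload = JSON.parse(frame.data);
    console.log(payload.deviceId, payload.envelope.status);
  }
}
```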

Concurrency and Locking

The server uses an in-memory single-flight lock per deviceId. If a second request arrives for the same device while an execution is in progress, the server returns HTTP 423 (Locked) immediately.
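The locking behavior can be pictured with a small in-memory structure. This is a sketch of the technique, not the server's actual code; the class and function names are hypothetical.

```typescript
// Minimal in-memory single-flight lock keyed by deviceId.
class SingleFlightLock {
  private readonly inFlight = new Set<string>();

  // Returns true if the lock was acquired, false if an execution is already in flight.
  tryAcquire(deviceId: string): boolean {
    if (this.inFlight.has(deviceId)) return false;
    this.inFlight.add(deviceId);
    return true;
  }

  // Must be called when the execution reaches a terminal state.
  release(deviceId: string): void {
    this.inFlight.delete(deviceId);
  }
}

// Simplified mapping of the lock outcome to an HTTP status.
function statusForRequest(lock: SingleFlightLock, deviceId: string): 200 | 423 {
  return lock.tryAcquire(deviceId) ? 200 : 423;
}

const deviceLock = new SingleFlightLock();
console.log(statusForRequest(deviceLock, "device-a")); // first request acquires the lock
console.log(statusForRequest(deviceLock, "device-a")); // overlapping request is rejected
console.log(statusForRequest(deviceLock, "device-b")); // other devices are unaffected
```

Because the check and the insert happen in one synchronous call, two overlapping requests in a single Node.js process cannot both acquire the lock.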

Determinism Doctrine

No Hidden Logic: Clawperator never retries a failed action or auto-falls back to a different strategy (e.g., from artifact to direct).
Pre-Flight Validation: Every execution is validated against the target device and receiver capabilities before any ADB call is made.
Canonical Result: Exactly one terminal envelope per commandId. If a timeout occurs, the CLI emits a RESULT_ENVELOPE_TIMEOUT error.

Error Taxonomy

LLM agents must use these codes to decide their next step.

Setup & Connectivity

ADB_NOT_FOUND: ADB is missing from PATH.
NO_DEVICES: No Android devices are connected via USB/Network.
MULTIPLE_DEVICES_DEVICE_ID_REQUIRED: More than one device exists; specify --device-id.
RECEIVER_NOT_INSTALLED: The target receiver package is not on the device.

Execution & State

EXECUTION_VALIDATION_FAILED: The execution JSON is malformed or invalid.
EXECUTION_ACTION_UNSUPPORTED: The requested action type is not supported by the runtime.
EXECUTION_CONFLICT_IN_FLIGHT: A command is already running on the target device.
RESULT_ENVELOPE_TIMEOUT: The command ran but no terminal envelope was received within the timeout.
RESULT_ENVELOPE_MALFORMED: Logcat emitted an invalid JSON envelope.

UI & Nodes

NODE_NOT_FOUND: The selector (matcher) failed to find the target UI element.
NODE_NOT_CLICKABLE: The target element was found but is not enabled/clickable.
SECURITY_BLOCK_DETECTED: A system-level security overlay (e.g., "Package Installer" or "Permission Dialog") is blocking interaction.
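One way an agent could turn these codes into a next step is a lookup table. The codes below are taken from the taxonomy above; the NextStep categories and the mapping itself are a hypothetical agent policy, shown only to illustrate code-driven branching.

```typescript
type ErrorCode =
  | "ADB_NOT_FOUND"
  | "NO_DEVICES"
  | "MULTIPLE_DEVICES_DEVICE_ID_REQUIRED"
  | "RECEIVER_NOT_INSTALLED"
  | "EXECUTION_VALIDATION_FAILED"
  | "EXECUTION_ACTION_UNSUPPORTED"
  | "EXECUTION_CONFLICT_IN_FLIGHT"
  | "RESULT_ENVELOPE_TIMEOUT"
  | "RESULT_ENVELOPE_MALFORMED"
  | "NODE_NOT_FOUND"
  | "NODE_NOT_CLICKABLE"
  | "SECURITY_BLOCK_DETECTED";

// Hypothetical agent-side categories; Clawperator does not define these.
type NextStep =
  | "fix_environment"
  | "fix_payload"
  | "wait_and_retry"
  | "inspect_logs"
  | "reobserve_ui"
  | "escalate_to_user";

// Assumed policy: one suggested next step per documented code.
const NEXT_STEP_BY_CODE: Record<ErrorCode, NextStep> = {
  ADB_NOT_FOUND: "fix_environment",
  NO_DEVICES: "fix_environment",
  MULTIPLE_DEVICES_DEVICE_ID_REQUIRED: "fix_payload",
  RECEIVER_NOT_INSTALLED: "fix_environment",
  EXECUTION_VALIDATION_FAILED: "fix_payload",
  EXECUTION_ACTION_UNSUPPORTED: "fix_payload",
  EXECUTION_CONFLICT_IN_FLIGHT: "wait_and_retry",
  RESULT_ENVELOPE_TIMEOUT: "inspect_logs",
  RESULT_ENVELOPE_MALFORMED: "inspect_logs",
  NODE_NOT_FOUND: "reobserve_ui",
  NODE_NOT_CLICKABLE: "reobserve_ui",
  SECURITY_BLOCK_DETECTED: "escalate_to_user",
};

// Unknown codes fall back to escalation rather than a guess.
function suggestNextStep(code: string): NextStep {
  return Object.prototype.hasOwnProperty.call(NEXT_STEP_BY_CODE, code)
    ? NEXT_STEP_BY_CODE[code as ErrorCode]
    : "escalate_to_user";
}

console.log(suggestNextStep("NODE_NOT_FOUND"));
```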

Detailed Step-Level Error Handling

While the top-level status indicates overall command success, individual stepResults can provide granular failure diagnostics. This is essential for agents to reason about partial completions.

Step Error Format

When a step fails but the runtime continues (or fails fast), the stepResults entry will include:

- success: false
- data.error: A stable machine-readable error code.
- data.message: A human-readable (and LLM-readable) explanation.

Example: `UNSUPPORTED_RUNTIME_CLOSE`

This error occurs when a close_app action is dispatched to the Android runtime. Because of sandbox restrictions, the runtime cannot reliably close other apps.

Desired Outcome: The agent should see this error and know that the 'Hand' (Node CLI) is responsible for pre-flight closure via ADB.

{
  "id": "step-1",
  "actionType": "close_app",
  "success": false,
  "data": {
    "application_id": "com.example.app",
    "error": "UNSUPPORTED_RUNTIME_CLOSE",
    "message": "Android runtime cannot reliably close apps. Use the Clawperator Node API or 'adb shell am force-stop' directly for this action."
  }
}
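A short sketch of how an agent might pull failed steps out of stepResults for reasoning about partial completion. The field names follow the example above; the helper name and the "UNKNOWN" fallback code are hypothetical.

```typescript
interface StepResult {
  id: string;
  actionType: string;
  success: boolean;
  data?: { error?: string; message?: string; [key: string]: unknown };
}

interface StepFailure {
  stepId: string;
  actionType: string;
  error: string;
  message: string;
}

// Collects one summary entry per failed step, preserving step order.
function collectStepFailures(stepResults: StepResult[]): StepFailure[] {
  return stepResults
    .filter((step) => !step.success)
    .map((step) => ({
      stepId: step.id,
      actionType: step.actionType,
      error: step.data?.error ?? "UNKNOWN", // hypothetical fallback when no code is present
      message: step.data?.message ?? "",
    }));
}

const sampleSteps: StepResult[] = [
  { id: "step-0", actionType: "open_app", success: true },
  {
    id: "step-1",
    actionType: "close_app",
    success: false,
    data: {
      application_id: "com.example.app",
      error: "UNSUPPORTED_RUNTIME_CLOSE",
      message: "Android runtime cannot reliably close apps.",
    },
  },
];

console.log(collectStepFailures(sampleSteps));
```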

Safety & Concurrency

In-Flight Semantics

A command is considered "in-flight" from the moment the ADB broadcast is sent until the [Clawperator-Result] is received or the timeoutMs is reached. If a command times out, the lock is held for an additional 2000ms "settle" window before allowing the next execution.
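The timing rule can be written as a small function. This sketch assumes timestamps in milliseconds and treats an envelope that arrives after the deadline as a timeout; the type and function names are hypothetical.

```typescript
const SETTLE_WINDOW_MS = 2000; // settle window from the paragraph above

interface InFlightCommand {
  sentAtMs: number; // when the ADB broadcast was sent
  timeoutMs: number;
  resultReceivedAtMs?: number; // when [Clawperator-Result] arrived, if it did
}

// Returns the time at which the device lock may be released.
function lockReleaseTimeMs(cmd: InFlightCommand): number {
  const deadline = cmd.sentAtMs + cmd.timeoutMs;
  if (cmd.resultReceivedAtMs !== undefined && cmd.resultReceivedAtMs <= deadline) {
    return cmd.resultReceivedAtMs; // envelope arrived in time: release immediately
  }
  return deadline + SETTLE_WINDOW_MS; // timed out: hold the lock through the settle window
}

console.log(lockReleaseTimeMs({ sentAtMs: 0, timeoutMs: 90000, resultReceivedAtMs: 5000 })); // 5000
console.log(lockReleaseTimeMs({ sentAtMs: 0, timeoutMs: 90000 })); // 92000
```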

PII Redaction Policy

By default, Clawperator returns full-fidelity UI text to the agent for maximum reasoning accuracy.

User Warning: Results will contain sensitive data (names, account digits, OTPs) if they are visible on the screen.
Agent Mitigation: Do not ship raw Clawperator results to long-term storage without user consent.

API-First, ADB-Capable

This runtime is intentionally API-first:

Agents should use Clawperator commands/APIs by default.
Clawperator should wrap common Android/adb operations behind stable, typed contracts.
Direct adb usage is a fallback path, not the baseline integration model.

Direct adb is still supported for:

unsupported/emerging edge cases,
low-level diagnostics,
temporary gaps before a stable Clawperator primitive exists.

When fallback adb is used, Clawperator should still encourage convergence back to first-class APIs by:

exposing equivalent primitives as they become common,
keeping result/error formats structured and machine-readable,
documenting fallback-to-API migration paths.

Skill Artifact Optionality and Failure Handling

Skill artifacts are optional, but fallback behavior is explicit:

If artifact compile succeeds, execute compiled execution.
If artifact compile fails, Clawperator returns a structured compile error and does not auto-fallback.
If runtime verification fails, Clawperator returns a structured execution failure and does not auto-retry with alternate strategy.
Agent chooses next step (retry, inspect UI, switch to direct actions, or abort).

Runtime must expose a mode on each execution:

artifact_compiled
direct

This keeps behavior deterministic and avoids hidden control-flow in the runtime.

Execution Unit Contract

Use one term everywhere: execution.

compile produces an execution.
execute runs an execution.

Execution schema aligns with Android AgentCommand constraints.

Execution input may come from:

skill artifact compile output, or
direct action list authored by agent/tooling.

Example execution:

{
  "commandId": "cmd-123",
  "taskId": "task-123",
  "source": "openclaw",
  "expectedFormat": "android-ui-automator",
  "timeoutMs": 90000,
  "actions": [
    { "id": "close", "type": "close_app", "params": { "applicationId": "com.example.app" } },
    { "id": "open", "type": "open_app", "params": { "applicationId": "com.example.app" } },
    { "id": "wait", "type": "sleep", "params": { "durationMs": 3000 } }
  ]
}

Device Selection Policy (v1)

deviceId?: string is supported on execute/observe.

Selection behavior:

If exactly one connected device exists and deviceId is omitted, use that device.
If more than one connected device exists, deviceId is required.
If the provided deviceId does not match a connected device in the device state, fail preflight.
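The selection rules above can be sketched as a pure function. It assumes "device state" refers to the adb state string "device". NO_DEVICES and MULTIPLE_DEVICES_DEVICE_ID_REQUIRED come from the error taxonomy; DEVICE_NOT_CONNECTED is a hypothetical placeholder, since this document does not name the code for that preflight failure.

```typescript
interface ConnectedDevice {
  id: string;
  state: string; // as reported by `adb devices`, e.g. "device", "offline", "unauthorized"
}

type Selection = { ok: true; deviceId: string } | { ok: false; code: string };

function selectDevice(devices: ConnectedDevice[], requestedId?: string): Selection {
  // Only devices in the "device" state are usable (assumed reading of the policy).
  const ready = devices.filter((d) => d.state === "device");

  if (requestedId !== undefined) {
    return ready.some((d) => d.id === requestedId)
      ? { ok: true, deviceId: requestedId }
      : { ok: false, code: "DEVICE_NOT_CONNECTED" }; // hypothetical code, not in the taxonomy
  }
  if (ready.length === 0) return { ok: false, code: "NO_DEVICES" };
  if (ready.length > 1) return { ok: false, code: "MULTIPLE_DEVICES_DEVICE_ID_REQUIRED" };
  return { ok: true, deviceId: ready[0].id };
}

console.log(selectDevice([{ id: "emulator-5554", state: "device" }]));
```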

Agentic Best-Effort Mode

Best-effort mode is a first-class execution path for unknown or drifting UIs.

Behavior goals:

Observe current UI (snapshot_ui).
Identify likely anchors (toolbar/tab/menu/button/search patterns).
Attempt constrained navigation/action.
Re-observe and verify progress.
Retry within safety bounds.

Best-effort does not imply unsafe freeform behavior; all attempts remain within validated runtime action limits and capability policy.

Important ownership split:

Clawperator provides primitives and structured observations.
The agent owns exploration policy/strategy.
Clawperator should not silently invent fallback control flow.

Cardinality drift handling:

Execution should tolerate mismatches between recipe assumptions and live UI (for example expected second device tile but only one exists).
Runtime returns structured ambiguity/partial outcomes rather than hard-failing every mismatch.
Agent decides whether to proceed with alternate target selection or stop.
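Since the agent owns the exploration loop, it also needs to track when to stop. The sketch below is a hypothetical agent-side budget tracker built on the best-effort bounds listed under Safety Bounds; the class, its methods, and the stop-reason strings are illustrative, not a Clawperator API.

```typescript
// Values copied from the best-effort bounds in the Safety Bounds section.
const MAX_BEST_EFFORT_STEPS = 30;
const MAX_BEST_EFFORT_RUNTIME_MS = 180000;
const MAX_CONSECUTIVE_FAILED_ATTEMPTS = 5;

class BestEffortBudget {
  private readonly startedAtMs: number;
  private steps = 0;
  private consecutiveFailures = 0;

  constructor(startedAtMs: number) {
    this.startedAtMs = startedAtMs;
  }

  // Call after each observe/act/verify cycle; `progressed` is the agent's own judgment.
  recordAttempt(progressed: boolean): void {
    this.steps += 1;
    this.consecutiveFailures = progressed ? 0 : this.consecutiveFailures + 1;
  }

  // Returns null while the loop may continue, otherwise the bound that was hit.
  stopReason(nowMs: number): string | null {
    if (this.steps >= MAX_BEST_EFFORT_STEPS) return "step_limit";
    if (nowMs - this.startedAtMs >= MAX_BEST_EFFORT_RUNTIME_MS) return "runtime_limit";
    if (this.consecutiveFailures >= MAX_CONSECUTIVE_FAILED_ATTEMPTS) return "too_many_failures";
    return null;
  }
}

const budget = new BestEffortBudget(Date.now());
budget.recordAttempt(false); // e.g. the expected anchor was not found after acting
console.log(budget.stopReason(Date.now()));
```

An agent would check stopReason before each new cycle and abort or escalate once it returns a non-null value.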

Result Transport Channel (v1 choice)

Chosen v1 mechanism:

logcat JSON envelope with strict prefix.

Required Android emission format (single line):

[Clawperator-Result] {"commandId":"...","taskId":"...","status":"success|failed","stepResults":[...],"error":null}

Current implementation note: Android emits canonical [Clawperator-Result] terminal envelopes for command completion.

Rules:

Exactly one terminal result envelope per commandId.
Envelope payload must be valid single-line JSON.
Clawperator parser filters by commandId and prefix.
Non-envelope logs are ignored for result semantics.

This removes ad-hoc scraping patterns and provides deterministic parsing until a stronger transport is added.

Additionally, intermediate observation envelopes may be emitted with prefix:

[Clawperator-Event] {json...}

This supports agent feedback loops during best-effort execution.
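The parsing rules for the terminal envelope can be sketched as follows. The prefix and the commandId filter come from the rules above. The logcat line layout in the sample, the outcome type, and the function name are assumptions. A caller could map "malformed" to RESULT_ENVELOPE_MALFORMED and, once timeoutMs has elapsed, "none" to RESULT_ENVELOPE_TIMEOUT.

```typescript
const RESULT_PREFIX = "[Clawperator-Result] ";

interface ResultEnvelope {
  commandId: string;
  taskId?: string;
  status: string;
  stepResults?: unknown[];
  error?: unknown;
}

type ParseOutcome =
  | { kind: "result"; envelope: ResultEnvelope }
  | { kind: "malformed" }
  | { kind: "none" };

// Scans log lines for the terminal envelope belonging to one commandId.
function findResultEnvelope(logLines: string[], commandId: string): ParseOutcome {
  let sawMalformed = false;
  for (const line of logLines) {
    const at = line.indexOf(RESULT_PREFIX);
    if (at === -1) continue; // non-envelope logs are ignored
    try {
      const parsed = JSON.parse(line.slice(at + RESULT_PREFIX.length));
      if (parsed && parsed.commandId === commandId) {
        return { kind: "result", envelope: parsed };
      }
    } catch {
      sawMalformed = true; // prefixed line whose payload is not valid JSON
    }
  }
  return sawMalformed ? { kind: "malformed" } : { kind: "none" };
}

// Hypothetical logcat lines; the timestamp/tag layout is only an example.
const logLines = [
  "03-01 12:00:00.000 I SomeOtherTag: unrelated message",
  '03-01 12:00:01.000 I Clawperator: [Clawperator-Result] {"commandId":"cmd-123","taskId":"task-123","status":"success","stepResults":[],"error":null}',
];

console.log(findResultEnvelope(logLines, "cmd-123").kind);
```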

Safety Bounds (hard constants)

Public limits (v1):

MAX_EXECUTION_ACTIONS = 50
MAX_EXECUTION_TIMEOUT_MS = 120000
MIN_EXECUTION_TIMEOUT_MS = 1000
MAX_PAYLOAD_BYTES = 64000
MAX_RETRY_ATTEMPTS_PER_STEP = 10
MAX_SNAPSHOT_LINES = 2000
MAX_SNAPSHOT_BYTES = 262144

Action policy:

deny by default: unsupported or unsafe action types are rejected
allow only runtime-supported actions in v1

Best-effort specific bounds:

MAX_BEST_EFFORT_STEPS = 30
MAX_BEST_EFFORT_RUNTIME_MS = 180000
MAX_CONSECUTIVE_FAILED_ATTEMPTS = 5

Supported action types (v1):

open_app
close_app
wait_for_node
click
scroll_and_click
read_text
snapshot_ui
sleep
type_text
doctor_ping
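A sketch of how pre-flight validation against these limits might look. The constants, the action list, and the two error codes are taken from this document. The function name, the problem shape, and the choice of which checks map to which code are assumptions; payload-size and snapshot limits are omitted for brevity.

```typescript
const MAX_EXECUTION_ACTIONS = 50;
const MAX_EXECUTION_TIMEOUT_MS = 120000;
const MIN_EXECUTION_TIMEOUT_MS = 1000;

const SUPPORTED_ACTION_TYPES = [
  "open_app", "close_app", "wait_for_node", "click", "scroll_and_click",
  "read_text", "snapshot_ui", "sleep", "type_text", "doctor_ping",
];

interface ExecutionAction {
  id: string;
  type: string;
  params?: Record<string, unknown>;
}

interface Execution {
  commandId: string;
  expectedFormat: string;
  timeoutMs: number;
  actions: ExecutionAction[];
}

interface ValidationProblem {
  code: "EXECUTION_VALIDATION_FAILED" | "EXECUTION_ACTION_UNSUPPORTED";
  detail: string;
}

// Returns every problem found; an empty array means the execution passed these checks.
function validateExecution(execution: Execution): ValidationProblem[] {
  const problems: ValidationProblem[] = [];
  if (execution.expectedFormat !== "android-ui-automator") {
    problems.push({ code: "EXECUTION_VALIDATION_FAILED", detail: "expectedFormat must be android-ui-automator" });
  }
  if (execution.timeoutMs < MIN_EXECUTION_TIMEOUT_MS || execution.timeoutMs > MAX_EXECUTION_TIMEOUT_MS) {
    problems.push({ code: "EXECUTION_VALIDATION_FAILED", detail: "timeoutMs out of range" });
  }
  if (execution.actions.length > MAX_EXECUTION_ACTIONS) {
    problems.push({ code: "EXECUTION_VALIDATION_FAILED", detail: "too many actions" });
  }
  for (const action of execution.actions) {
    if (SUPPORTED_ACTION_TYPES.indexOf(action.type) === -1) {
      problems.push({ code: "EXECUTION_ACTION_UNSUPPORTED", detail: "unsupported action type: " + action.type });
    }
  }
  return problems;
}

// "swipe" is not in the supported list, and 500 ms is below the minimum timeout.
console.log(validateExecution({
  commandId: "cmd-124",
  expectedFormat: "android-ui-automator",
  timeoutMs: 500,
  actions: [{ id: "a1", type: "swipe" }],
}));
```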

Doctor and Dependency Management

clawperator doctor checks:

adb installed and executable
adb server reachable
connected devices and states
target package presence and version compatibility
Android Developer Options and USB debugging (advisory)
end-to-end handshake via doctor_ping

clawperator doctor --fix capabilities (best effort):

restart adb server
run clawperator grant-device-permissions
print exact remediation when automatic fix is unavailable

See Clawperator Doctor for the full check list and JSON report shape.

Skill Integration Mechanism

Canonical source of skills:

clawperator-skills repository

Distribution model:

clawperator-skills CI generates skills-index.json on main.
clawperator skills install clones the skills repository into a local checkout on first setup.
clawperator skills update [--ref <ref>] refreshes the checkout and can pin to a specific ref when needed.
Local cache stores synced artifacts for deterministic offline execution.

Runtime should execute against cached/pinned skill content, not live network fetches during execution.

Skill compilation requirements are defined in:

docs/design/skill-design.md

When skill artifacts are missing/stale, runtime can still execute direct executions supplied by the agent.

Skill Implementation Language Strategy

To set a maintainable baseline for future skills:

Preferred language for new non-trivial skills: Node.js with TypeScript.
Bash is allowed only for thin wrappers and simple glue.
Python is a planned secondary path after Node contracts and tooling are stable.

Rationale:

Better testability, typing, and reuse for parsing-heavy and multimodal workflows.
Safer payload construction and lower shell-quoting risk than large Bash scripts.
Cleaner evolution toward SDK-backed skill execution.

Migration policy:

Do not mass-rewrite all existing Bash skills immediately.
For new high-value or high-complexity skills, prefer Node.js/TypeScript implementations.
Temporary Bash implementations (including the current Life360 flow) are acceptable only as stopgaps and must be queued for early migration once minimal Node skill SDK/runtime helpers are in place.

Agent-Friendly Command and Alias Layer

Because agents are primary customers, Clawperator should accept intuitive aliases that normalize to canonical actions.

Examples:

tap -> click
press -> click
long_press -> click with long-click params
wait_for -> wait_for_node
find -> wait_for_node
read -> read_text
snapshot -> snapshot_ui
sleep -> sleep
action: Primary entry point for single-step interactions.
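Alias normalization can be sketched as a table lookup applied before validation. The alias pairs are the ones listed above. The function name is hypothetical, and the longClick parameter name is an assumption, since this document only says "long-click params".

```typescript
// Alias -> canonical action type, as listed above. Canonical names pass through unchanged.
const ACTION_ALIASES: Record<string, string> = {
  tap: "click",
  press: "click",
  long_press: "click",
  wait_for: "wait_for_node",
  find: "wait_for_node",
  read: "read_text",
  snapshot: "snapshot_ui",
};

interface NormalizedAction {
  type: string;
  params: Record<string, unknown>;
}

function normalizeAction(type: string, params: Record<string, unknown> = {}): NormalizedAction {
  const canonical = Object.prototype.hasOwnProperty.call(ACTION_ALIASES, type)
    ? ACTION_ALIASES[type]
    : type;
  if (type === "long_press") {
    // `longClick` is a hypothetical parameter name used only for illustration.
    return { type: canonical, params: { ...params, longClick: true } };
  }
  return { type: canonical, params };
}

console.log(normalizeAction("tap", { text: "OK" }));
console.log(normalizeAction("long_press"));
```

Normalizing once at the entry point keeps the rest of the pipeline working with canonical action types only.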