Navigation Patterns for Agents
This page collects the practical operating patterns that help a cold-start agent succeed on unknown Android screens.
Use it for:
- scrolling through unfamiliar layouts
- handling overlays and dialogs safely
- dealing with OEM-specific Settings differences
- minimizing wasted round trips
Default loop for unknown apps
Use a tight observe-decide-act loop:
- snapshot_ui
- choose one action or one short action sequence
- execute
- snapshot_ui again
- reassess before continuing
This is slower than long optimistic macros, but it is the safest default when you have not yet learned the app's structure.
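Here is a minimal Python sketch of the loop above. It is not the tool's client API. snapshot_ui, decide, and execute are passed in as plain callables, which are hypothetical stand-ins for however your agent invokes the real tools. max_steps is an illustrative safety cap.

```python
from typing import Callable, Optional

# Hypothetical type aliases: a snapshot is raw XML text, an action is a small dict.
Snapshot = str
Action = dict


def observe_decide_act(
    snapshot_ui: Callable[[], Snapshot],
    decide: Callable[[Snapshot], Optional[Action]],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> int:
    """Take a fresh snapshot before every decision; return how many actions ran."""
    steps = 0
    snapshot = snapshot_ui()            # observe
    while steps < max_steps:
        action = decide(snapshot)       # decide on one action or one short sequence
        if action is None:              # goal reached, or nothing safe to do
            break
        execute(action)                 # act
        steps += 1
        snapshot = snapshot_ui()        # observe again, then reassess
    return steps
```

The function never executes twice without taking a new snapshot in between. That is the property that makes it safe on an app whose structure is still unknown.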
Prefer explicit containers for scrolling
When a screen has more than one scrollable node, auto-detection can pick an outer wrapper instead of the content list you really want.
Recommended pattern:
- take a snapshot
- identify the true content list by resource-id
- pass that node as params.container
This is especially important for:
- Settings screens
- nested recycler views
- drawers and tabbed layouts
- OEM-modified system apps
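The container pattern can be sketched as two small helpers. This is a minimal sketch that assumes the snapshot is uiautomator-style XML, where each node carries resource-id, class, and scrollable attributes. The helper names are hypothetical, and so is the {"container": {"resourceId": ...}} params shape. Check the tool's own schema for the real field layout.

```python
import xml.etree.ElementTree as ET


def scrollable_nodes(snapshot_xml: str) -> list:
    """Every node marked scrollable="true", with the attributes needed to tell them apart."""
    root = ET.fromstring(snapshot_xml)
    return [
        {"resource-id": n.get("resource-id", ""), "class": n.get("class", "")}
        for n in root.iter("node")
        if n.get("scrollable") == "true"
    ]


def container_params(snapshot_xml: str, content_id: str) -> dict:
    """Name the content list explicitly instead of relying on auto-detection."""
    candidates = [n["resource-id"] for n in scrollable_nodes(snapshot_xml)]
    if content_id not in candidates:
        raise LookupError(f"{content_id!r} is not scrollable here; candidates: {candidates}")
    # Hypothetical params shape: the container is given as a resourceId selector.
    return {"container": {"resourceId": content_id}}
```

Consider a nested layout, such as a ScrollView wrapping a RecyclerView. There, scrollable_nodes returns both nodes in document order, and the agent picks the inner list's id rather than the outer wrapper.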
Treat targeted scrolls as "reveal, then verify"
Even with TARGET_FOUND support, a scrolling action should not be your final source of truth about the whole screen state.
Good pattern:
- scroll_until with explicit container and target
- capture a fresh snapshot_ui
- confirm the target is visible and still the intended node
- click or read only after that confirmation when the workflow is sensitive
This matters because some Android layouts expose clipped descendants near the viewport edge in the raw XML. A node can therefore be present in the hierarchy without being fully visible on screen.
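The verification step can be sketched as a check against a fresh snapshot. The sketch assumes uiautomator-style bounds strings of the form [x1,y1][x2,y2]. It treats a target as safe to act on only if three things hold: the target exists inside the named container, it lies entirely within the container's bounds, and it has a non-empty area. The function names are hypothetical. How a given build reports bounds for clipped nodes is worth confirming on the device.

```python
import re
import xml.etree.ElementTree as ET

# uiautomator-style bounds: "[left,top][right,bottom]"
BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def parse_bounds(bounds: str) -> tuple:
    match = BOUNDS_RE.fullmatch(bounds)
    if match is None:
        raise ValueError(f"unexpected bounds format: {bounds!r}")
    return tuple(int(group) for group in match.groups())


def target_fully_visible(snapshot_xml: str, container_id: str, target_text: str) -> bool:
    """Check a fresh snapshot: the target must sit inside the container and within its bounds."""
    root = ET.fromstring(snapshot_xml)
    container = next(
        (n for n in root.iter("node") if n.get("resource-id") == container_id), None
    )
    if container is None:
        return False
    cx1, cy1, cx2, cy2 = parse_bounds(container.get("bounds", ""))
    for node in container.iter("node"):
        if node is container or node.get("text") != target_text:
            continue
        x1, y1, x2, y2 = parse_bounds(node.get("bounds", ""))
        inside = cx1 <= x1 and cy1 <= y1 and x2 <= cx2 and y2 <= cy2
        non_empty = x2 > x1 and y2 > y1
        # Reject nodes that spill past the container edge or collapse to zero area.
        return inside and non_empty
    return False
```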
Use snapshot metadata as a hint, not a verdict
Successful snapshots may include:
- foreground_package
- has_overlay
- overlay_package
- window_count
Interpret them like this:
- has_overlay: "true" means another meaningful accessibility window was detected; it does not guarantee the screen is blocked
- window_count > 1 alone is normal on some Android builds
When overlay metadata appears, the safest next move is usually another observation step rather than an immediate destructive action.
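The sketch below treats the metadata as hints. The field names come from the list above, and comparing has_overlay to the string "true" follows the example given there. The expected_package parameter and the returned step names are hypothetical.

```python
from typing import Optional


def next_step_from_metadata(meta: dict, expected_package: Optional[str] = None) -> str:
    """Turn snapshot metadata into a cautious next step. The values are hints, not verdicts."""
    if meta.get("has_overlay") == "true":
        # Another meaningful accessibility window was detected; it may or may not block
        # the screen, so observe again before anything destructive.
        return "observe_again"
    foreground = meta.get("foreground_package")
    if expected_package is not None and foreground is not None and foreground != expected_package:
        return "observe_again"          # we are not where we thought we were
    # window_count > 1 on its own is normal on some builds, so it is not checked here.
    return "proceed"
```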
Overlay and dialog handling
Unexpected dialogs are one of the biggest sources of agent confusion.
Recommended order of operations:
- confirm whether the dialog is the current foreground problem
- prefer an explicit dismiss action only if the button meaning is clear
- avoid reflexively sending press_key: back unless you are willing to lose current navigation context
- if the underlying content is still actionable and you know the correct scroll container, continue carefully with an explicit container selector
Good dialog selectors often use:
- textEquals
- textContains
- contentDescEquals
This is because many system dialogs have weak or missing resource-id values.
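The sketch below proposes a dismiss action only when the button's meaning is clear. It assumes uiautomator-style XML with text, content-desc, and clickable attributes. CLEAR_DISMISS_LABELS is a hypothetical English-only allow-list that you would adapt per locale and app. When zero or several candidates match, the function returns None, so the agent observes again instead of guessing.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical allow-list of labels whose meaning is unambiguous for dismissal.
CLEAR_DISMISS_LABELS = ("Cancel", "Not now", "No thanks", "Close")


def dismiss_selector(snapshot_xml: str) -> Optional[dict]:
    """Return a textEquals/contentDescEquals selector for one clear dismiss button, else None."""
    root = ET.fromstring(snapshot_xml)
    matches = []
    for node in root.iter("node"):
        if node.get("clickable") != "true":
            continue
        text, desc = node.get("text", ""), node.get("content-desc", "")
        if text in CLEAR_DISMISS_LABELS:
            matches.append({"textEquals": text})
        elif desc in CLEAR_DISMISS_LABELS:
            matches.append({"contentDescEquals": desc})
    # Exactly one clear candidate is safe to suggest; zero or several means look again.
    return matches[0] if len(matches) == 1 else None
```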
OEM variation strategy
Assume system apps differ across vendors.
Examples:
- stock Android may place Android version directly in About phone
- Samsung may place it inside Software information
Recommended pattern:
- identify the current package with foreground_package or snapshot XML
- branch on visible labels, not assumptions about one vendor layout
- keep selectors descriptive and local to the current screen
- preserve fallback paths when a known OEM split exists
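Here is a sketch of branching on visible labels for the Android-version example. It assumes uiautomator-style XML and English labels. The returned action dictionaries are a hypothetical shape, not the tool's action format.

```python
import xml.etree.ElementTree as ET


def visible_labels(snapshot_xml: str) -> set:
    """All non-empty text labels present in the snapshot."""
    root = ET.fromstring(snapshot_xml)
    return {n.get("text") for n in root.iter("node") if n.get("text")}


def next_step_for_android_version(snapshot_xml: str) -> dict:
    """On the About phone screen, branch on what is shown rather than on the vendor."""
    labels = visible_labels(snapshot_xml)
    if "Android version" in labels:
        return {"action": "read", "selector": {"textEquals": "Android version"}}
    if "Software information" in labels:
        # Known OEM split (e.g. Samsung): the version lives one level deeper.
        return {"action": "click", "selector": {"textEquals": "Software information"}}
    # Neither known path is visible: keep the fallback open and observe again.
    return {"action": "observe_again"}
```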
Stable selector order
When choosing a selector for navigation, prefer these fields in this order:
- resourceId
- contentDescEquals
- textEquals
- contentDescContains
- textContains
Use role as a semantic narrowing field, not the primary anchor, unless the screen gives you no stronger handle.
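The selector order can be expressed as a function that walks the exact-match fields from strongest to weakest. It uses role only to narrow a match that is not unique. Nodes are plain dicts keyed by snapshot attribute names, and the "role" key is a hypothetical attribute. The two contains variants are left out because they need a deliberately chosen substring rather than the node's full value.

```python
from typing import Optional

# Exact-match fields from strongest to weakest, paired with snapshot attribute names.
FIELDS = (
    ("resourceId", "resource-id"),
    ("contentDescEquals", "content-desc"),
    ("textEquals", "text"),
)


def pick_selector(nodes: list, target: dict, role: Optional[str] = None) -> dict:
    """Return the strongest exact selector that matches only target, narrowing by role if needed."""
    for field, attr in FIELDS:
        value = target.get(attr, "")
        if not value:
            continue                      # field absent or empty on this node
        hits = [n for n in nodes if n.get(attr) == value]
        if len(hits) == 1:
            return {field: value}
        if role is not None:
            # Role narrows an ambiguous match; it never anchors the selector by itself.
            narrowed = [n for n in hits if n.get("role") == role]
            if len(narrowed) == 1 and narrowed[0] is target:
                return {field: value, "role": role}
    raise LookupError("no unique exact selector; consider a *Contains field with a chosen substring")
```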
When to split across multiple executions
Split the workflow when:
- the next decision depends on newly revealed UI
- a dialog or overlay may appear
- the screen is known to re-layout heavily after clicks
- you are exploring a new app for the first time
Combine actions into one execution only when the path is already known and the steps are tightly coupled.
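The splitting rules can be sketched as a planner that closes an execution after any step whose outcome the agent must observe before deciding what comes next. Step, its flags, and plan_executions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    reveals_new_ui: bool = False    # the next decision depends on what this step shows
    may_open_dialog: bool = False   # a dialog or overlay could appear
    relayouts: bool = False         # the screen re-layouts heavily after this step


def plan_executions(steps: list, exploring_new_app: bool) -> list:
    """Group step names into executions, ending a batch after any step that needs re-observation."""
    if exploring_new_app:
        return [[s.name] for s in steps]   # first contact: one step per execution
    batches, current = [], []
    for step in steps:
        current.append(step.name)
        if step.reveals_new_ui or step.may_open_dialog or step.relayouts:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches
```

Only steps known to be tightly coupled end up in the same batch. Anything flagged as uncertain forces a fresh observation before the next execution.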
Quick recovery heuristics
If a step fails:
- NODE_NOT_FOUND: re-observe before changing selectors
- CONTAINER_NOT_FOUND: inspect the scrollable nodes and pass container explicitly
- SNAPSHOT_EXTRACTION_FAILED: retry snapshot briefly, then check compatibility and doctor output
- EXECUTION_CONFLICT_IN_FLIGHT: wait and serialize work per device
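The two timing-related heuristics, retrying briefly and serializing work per device, can be combined in a small wrapper. This is a sketch. call stands in for a hypothetical function that runs one tool execution against a device and returns a dict with an optional "error" code. The retry count and back-off are illustrative. Retrying EXECUTION_CONFLICT_IN_FLIGHT after a wait is an assumption drawn from the "wait" advice above. NODE_NOT_FOUND and CONTAINER_NOT_FOUND are returned immediately, because they call for re-observation rather than repetition.

```python
import threading
import time
from typing import Callable

# Hypothetical set of codes worth retrying after a short wait.
RETRYABLE = {"SNAPSHOT_EXTRACTION_FAILED", "EXECUTION_CONFLICT_IN_FLIGHT"}

_registry_lock = threading.Lock()
_device_locks: dict = {}


def device_lock(device: str):
    """Return the single lock that serializes all work on one device."""
    with _registry_lock:
        return _device_locks.setdefault(device, threading.Lock())


def run_with_recovery(
    device: str,
    call: Callable[[str], dict],
    attempts: int = 3,
    delay_s: float = 0.5,
) -> dict:
    """Run one execution per device at a time, retrying only transient errors."""
    result: dict = {}
    with device_lock(device):
        for attempt in range(attempts):
            result = call(device)
            if result.get("error") not in RETRYABLE:
                return result          # success, or an error that needs re-observation instead
            if attempt + 1 < attempts:
                time.sleep(delay_s * (attempt + 1))   # brief, growing pause
    return result
```

If the retries run out on SNAPSHOT_EXTRACTION_FAILED, the last result is returned unchanged. At that point the next move is to check compatibility and doctor output rather than to retry again.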