The instrument panel for AI work.
You've put agents into production. Your dashboards already say they ran. The harder questions — are they working, are they getting better or quietly degrading, who proposed what and who approved it, where should the next dollar go — your dashboards don't answer.
AgentFlow is built around two ideas, in this order:
- A pipeline orchestrator that doesn't touch your agent code. Your agents are yours. AgentFlow launches them in their own containers, integrates with the humans and external systems that approve their work, and gives you fleet-wide visibility into which pipelines and gates are healthy. This is what AgentFlow is today.
- A measurement layer (beads) that ties cost, telemetry, and outcome to each run. Today's beads carry attribution and acceptance; richer telemetry (cost, tokens, models) is in flight as a per-run AI-gateway integration. Section 2 separates what's shipped from what's coming.
Read Section 1 to understand the orchestration layer (works today). Read Section 2 to understand beads — today's attribution + tomorrow's per-run cost.
What AgentFlow runs for you
AgentFlow's first job is to be a pipeline orchestrator. You define a sequence of work; AgentFlow executes it as a Step Functions state machine, manages the human and external decision points along the way, and records what was decided.
The four primitives:
Gate kinds: llm_gate (Bedrock-backed) | external_gate (jira | github_label | webhook). Each gate produces a Decision (approved | rejected | hold).
An App contains Pipelines. Pipelines are Step Functions state machines whose nodes are Steps. Each Step launches into a Container, which is where your code runs. Pipelines end at (or contain) Gates. AgentFlow does not enter the container.
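One way to picture how these pieces nest is a small data model. This is only an illustrative sketch: the class and field names below (including `container_image` and the `gate_ids` helper) are assumptions made for this example, not AgentFlow's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Decision(str, Enum):
    """The three outcomes a gate can produce."""
    APPROVED = "approved"
    REJECTED = "rejected"
    HOLD = "hold"


@dataclass
class Gate:
    id: str
    kind: str                     # "llm_gate" or "external_gate"
    source: Optional[str] = None  # "jira", "github_label", or "webhook" for external gates


@dataclass
class Step:
    id: str
    kind: str                              # e.g. "workspace" or "lambda"
    container_image: Optional[str] = None  # hypothetical field, for illustration only


@dataclass
class Pipeline:
    name: str
    steps: List[Step] = field(default_factory=list)
    gates: List[Gate] = field(default_factory=list)


@dataclass
class App:
    app_id: str
    pipelines: List[Pipeline] = field(default_factory=list)


def gate_ids(app: App) -> List[str]:
    """Collect every gate id across all of an app's pipelines."""
    return [gate.id for pipeline in app.pipelines for gate in pipeline.gates]


app = App(
    app_id="cost-opt",
    pipelines=[
        Pipeline(
            name="cp1",
            steps=[Step(id="analyze", kind="workspace")],
            gates=[Gate(id="production-approval", kind="external_gate", source="jira")],
        )
    ],
)
```

Nothing in the model reaches inside a Step: the container's contents are opaque to the orchestrator.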
What AgentFlow does NOT do
AgentFlow does not interfere with how you build your apps or your agents.
- No SDK to import inside your agent code
- No framework conventions inside the container
- No context to propagate, no headers to thread
- Your agents can be LangGraph, CrewAI, raw OpenAI SDK, custom Python, anything
- Step kinds match natural deployment shapes: function/skill for Python that runs in our pooled runners; workspace for whatever Docker image you ship to Fargate; lambda for an ARN you hand us; external_runner for runtimes outside our launch path that report back
AgentFlow is the launcher and the recorder. It launches the container, waits for the step to finish, routes the gate to the right human or system, records the decision, then advances the state machine. Your code stays yours.
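The launch, wait, route, record, advance cycle can be sketched as a plain loop. This is a simplified stand-in, not the real implementation (which runs as a Step Functions state machine): `StepSpec`, `RunRecord`, and `run_pipeline` are hypothetical names, and stopping on `rejected` or parking on `hold` is an assumed policy for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class StepSpec:
    id: str
    run: Callable[[], str]                       # stands in for "launch the container and wait"
    gate: Optional[Callable[[str], str]] = None  # stands in for routing to a human or system


@dataclass
class RunRecord:
    completed: List[str] = field(default_factory=list)
    decisions: List[Dict[str, str]] = field(default_factory=list)
    status: str = "running"


def run_pipeline(steps: List[StepSpec]) -> RunRecord:
    """Run steps in order, recording each gate decision before advancing."""
    record = RunRecord()
    for step in steps:
        output = step.run()               # the step body is opaque to the orchestrator
        record.completed.append(step.id)
        if step.gate is None:
            continue
        decision = step.gate(output)      # route the output to whoever decides
        record.decisions.append({"step": step.id, "decision": decision})
        if decision == "rejected":
            record.status = "rejected"    # assumed policy: stop on rejection
            return record
        if decision == "hold":
            record.status = "parked"      # assumed policy: park until someone resumes
            return record
    record.status = "succeeded"
    return record
```

The only things the loop knows about a step are its id, its output, and the decision made about it.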
How decisions get made
Pipelines incorporate gates — points where a decision happens (approve, reject, hold). Two kinds today:
llm_gate — LLM-as-decider
A Bedrock-backed Lambda evaluates the step's output against a decision_schema. Useful for deterministic rule-checks expressed in natural language ("does this PR contain a breaking change?"). Decision lands in DecisionV2.
```yaml
- id: cp1-rerun-decider
  kind: llm_gate
  on_event: cost-opt.cp1.proposed
  scope: ticket
  model: anthropic.claude-3-haiku-20240307-v1:0
  decision_schema: { ref: cost_opt.schemas.RerunDecision }
  decided_by: cost-opt-cp1-rerun-decider:v1
  on_rerun: { ... }
  on_park: { transition_to: Awaiting-CP1 }
```
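To make "evaluates the output against a decision_schema" concrete, here is a rough sketch of turning a model reply into a decision record. The JSON reply shape, the `parse_llm_decision` helper, and the fall-back to `hold` on a malformed reply are all assumptions for illustration; the actual schema lives behind the `decision_schema` ref and is not shown here.

```python
import json
from typing import Any, Dict

ALLOWED_DECISIONS = {"approved", "rejected", "hold"}


def parse_llm_decision(raw_reply: str, decided_by: str) -> Dict[str, Any]:
    """Validate a model reply; anything that does not fit becomes a 'hold'."""
    try:
        payload = json.loads(raw_reply)
    except json.JSONDecodeError:
        payload = None
    if not isinstance(payload, dict):
        payload = {}

    decision = payload.get("decision")
    if not isinstance(decision, str) or decision not in ALLOWED_DECISIONS:
        # Assumed policy: never guess; park the ticket for a human instead.
        return {
            "decision": "hold",
            "decided_by": decided_by,
            "reason": "unparseable model reply",
        }
    return {
        "decision": decision,
        "decided_by": decided_by,
        "reason": str(payload.get("reason", "")),
    }
```

Defaulting to `hold` keeps a confused model from silently approving or rejecting work.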
external_gate — human or external system as decider
Three sources:
```yaml
# Jira: a state transition in a Jira issue resolves the gate
- id: production-approval
  kind: external_gate
  source: jira
  config:
    actor_field: assignee
    state_to_decision:
      Approved: approved
      Rejected: rejected
  timeout_seconds: 86400
  on_approved: { ... }
  on_rejected: { ... }

# github_label: applying a label resolves the gate
- id: pr-review
  kind: external_gate
  source: github_label
  config:
    actor_field: actor
    label_to_decision:
      lgtm: approved
      blocked: rejected

# webhook: any system that POSTs to AgentFlow's resume endpoint
- id: custom-approval
  kind: external_gate
  source: webhook
  config: { ... }
```
The gate's underlying mechanism is a Step Functions task token — the pipeline pauses at the gate state until the source system reports a decision via the configured resolver. The decision lands in DecisionV2 regardless of source. The pipeline cares about the decision, not where the human was when they made it.
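The task-token handoff can be sketched as a small resolver. This is an assumption-laden illustration: `resolve_gate` and `FakeSfnClient` are hypothetical names, and the output payload shape is invented. The one real API assumed is the boto3 Step Functions client's `send_task_success(taskToken=..., output=...)`; whether AgentFlow reports rejections the same way is not stated in the source.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def resolve_gate(
    sfn_client: Any,
    task_token: str,
    source_value: str,
    mapping: Dict[str, str],
    actor: str,
) -> Optional[str]:
    """Map a Jira state or GitHub label to a decision and resume the paused state."""
    decision = mapping.get(source_value)
    if decision is None:
        return None  # not a decisive state or label; the pipeline stays paused
    sfn_client.send_task_success(
        taskToken=task_token,
        output=json.dumps({"decision": decision, "actor": actor}),
    )
    return decision


class FakeSfnClient:
    """In-memory stand-in so the sketch runs without AWS credentials."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, str]]] = []

    def send_task_success(self, taskToken: str, output: str) -> None:
        self.calls.append((taskToken, json.loads(output)))


fake = FakeSfnClient()
state_to_decision = {"Approved": "approved", "Rejected": "rejected"}
resolve_gate(fake, "tok-1", "Approved", state_to_decision, actor="alice")
```

In a real deployment you would pass `boto3.client("stepfunctions")` in place of the fake client.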
The Inbox — a cross-app view of pending decisions
For human-resolved gates that don't have an external system attached, AgentFlow exposes an Inbox: a console view that aggregates tickets across all apps that are sitting in Awaiting-* or Pending-* stages. It is backed by the list_pending_decisions MCP tool, which queries each app's data layer for stage-pending tickets.
The Inbox is a view, not a gate kind. Operators use it to see "what needs me, across every product, right now."
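The aggregation behind that view can be approximated as a prefix filter over each app's tickets. This is a minimal sketch: the ticket dictionaries, the `pending_across_apps` helper, and the sort order are assumptions for illustration, not the actual behavior of list_pending_decisions.

```python
from typing import Dict, Iterable, List

PENDING_PREFIXES = ("Awaiting-", "Pending-")


def pending_across_apps(
    tickets_by_app: Dict[str, Iterable[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Collect tickets in Awaiting-* or Pending-* stages from every app."""
    inbox = []
    for app_id, tickets in tickets_by_app.items():
        for ticket in tickets:
            if ticket["stage"].startswith(PENDING_PREFIXES):
                inbox.append(
                    {
                        "app_id": app_id,
                        "ticket_id": ticket["ticket_id"],
                        "stage": ticket["stage"],
                    }
                )
    # Assumed ordering: stable and predictable for display.
    return sorted(inbox, key=lambda row: (row["app_id"], row["ticket_id"]))


inbox = pending_across_apps(
    {
        "cost-opt": [
            {"ticket_id": "T-2", "stage": "Awaiting-CP1"},
            {"ticket_id": "T-1", "stage": "Done"},
        ],
        "release": [{"ticket_id": "R-7", "stage": "Pending-Review"}],
    }
)
```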
How to use it
You can run AgentFlow as just an orchestrator. No bead enrichment needed. Five steps:
Register an app
MCP tool: registry.register_app (required: display_name; optional: app_id, description, admins, bu, product, tags). Creates a draft AppV2 row. The caller becomes the immutable creator (created_by); admins manage downstream.
Self-deploy your app's CFN stack
Each app deploys its own Step Functions, Lambda functions, IAM roles, etc. into its own AWS account. Standard SAM / CDK / whatever you use.
Finalize registration
MCP tool: registry.register_app_resources. It passes the identifiers of the deployed resources back to AgentFlow (account_id, region, data_bucket, event_bus_arn, mcp_cross_account_role_arn). Status flips to registered.
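If you script the two registration calls, it can help to check arguments before sending them. The sketch below only builds and validates argument dictionaries; the helper names are hypothetical, and treating every listed resource field as required is an assumption, since the source lists the fields without marking them.

```python
from typing import Any, Dict

REGISTER_APP_REQUIRED = {"display_name"}
REGISTER_APP_OPTIONAL = {"app_id", "description", "admins", "bu", "product", "tags"}
# Assumption: all five resource fields are required.
REGISTER_RESOURCES_FIELDS = {
    "account_id",
    "region",
    "data_bucket",
    "event_bus_arn",
    "mcp_cross_account_role_arn",
}


def build_register_app_args(**kwargs: Any) -> Dict[str, Any]:
    """Validate arguments for registry.register_app."""
    unknown = set(kwargs) - REGISTER_APP_REQUIRED - REGISTER_APP_OPTIONAL
    missing = REGISTER_APP_REQUIRED - set(kwargs)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return dict(kwargs)


def build_register_resources_args(**kwargs: Any) -> Dict[str, Any]:
    """Validate arguments for registry.register_app_resources."""
    missing = REGISTER_RESOURCES_FIELDS - set(kwargs)
    if missing:
        raise ValueError(f"missing resource fields: {sorted(missing)}")
    return dict(kwargs)


app_args = build_register_app_args(display_name="Cost Optimizer", tags=["finops"])
```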
Author pipelines as YAML
File location is your choice; canonical pattern is apps/<app_id>/pipelines/<name>.pipeline.yaml. The YAML is parsed by framework/backend/dsl/parser.py; required top-level fields are name and steps. Each pipeline becomes a Step Functions state machine on your next deploy and gets registered via registry.register_pipeline.
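A pre-commit check for the two required top-level fields might look like the sketch below. It is not the logic in framework/backend/dsl/parser.py, which is not shown here: `validate_pipeline` is a hypothetical helper that works on an already-parsed mapping, and the per-step `id` check is an extra assumption drawn from the gate examples.

```python
from typing import Any, Dict, List

REQUIRED_TOP_LEVEL = ("name", "steps")


def validate_pipeline(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the document looks usable."""
    errors = [
        f"missing required field: {key}"
        for key in REQUIRED_TOP_LEVEL
        if key not in doc
    ]
    steps = doc.get("steps")
    if steps is None:
        return errors
    if not isinstance(steps, list):
        errors.append("steps must be a list")
        return errors
    for index, step in enumerate(steps):
        # Assumption: every step is a mapping that carries an id.
        if not isinstance(step, dict) or "id" not in step:
            errors.append(f"step {index} is missing an id")
    return errors


problems = validate_pipeline(
    {"name": "cp1", "steps": [{"id": "analyze", "kind": "workspace"}]}
)
```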
Run, resolve gates, watch
Trigger pipelines via triggers: (cron or EventBridge), via MCP execute_pipeline, or via webhook. Resolve external gates wherever your humans are. Watch the Inbox + run history in the console.
That's the orchestrator surface. No beads required. If you stop reading here, you already have the core reason to use AgentFlow.
Closing
Today, AgentFlow is a pipeline orchestrator that integrates with your gates and records attribution against a bead axis. That alone replaces a lot of duct tape and gives you fleet-wide visibility into which agents are healthy.
In flight, the bead axis grows into per-run cost + LLM telemetry through a per-run AI gateway virtual key, with no instrumentation required inside the agent. Three tracked issues (319z, ls6a, aqs3) carry the work; this page will be updated as each ships.
The contract is stable in either world. Today's beads carry identity + attribution + acceptance; tomorrow's beads carry the same plus telemetry; the same bead_id and the same MCP tools read both. Adopt the orchestrator now; the measurement layer ships behind it without breaking anything you've already integrated.
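The "same contract, more fields later" idea can be illustrated with an optional telemetry block. Everything except `bead_id` is a hypothetical name chosen for this sketch (`attributed_to`, `accepted`, `Telemetry`, `summarize`); the real bead schema is not defined on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Telemetry:
    """In-flight fields; absent on today's beads."""
    cost_usd: float
    tokens: int
    model: str


@dataclass(frozen=True)
class Bead:
    bead_id: str
    attributed_to: str
    accepted: Optional[bool]
    telemetry: Optional[Telemetry] = None  # optional, so older readers keep working


def summarize(bead: Bead) -> str:
    """One reader handles beads with and without telemetry."""
    base = f"{bead.bead_id}: {bead.attributed_to}, accepted={bead.accepted}"
    if bead.telemetry is None:
        return base
    return f"{base}, cost=${bead.telemetry.cost_usd:.2f}"


today = Bead(bead_id="b-1", attributed_to="agent-a", accepted=True)
later = Bead(
    bead_id="b-1",
    attributed_to="agent-a",
    accepted=True,
    telemetry=Telemetry(cost_usd=0.5, tokens=1200, model="example-model"),
)
```

Code written against `today` keeps working against `later`, because the new data only adds an optional field.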