Your own backend

BackendRouterProvider turns your server into the model. The SDK posts each provider round trip to one endpoint you own; your server calls whatever LLM it wants and streams events back. Tools still execute in the app — your server never touches app state.

Use it when you already run a backend and want full control over the model call. If you would rather not build the server half, AgentKit Cloud ships it.

Point the SDK at your server

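// `runtime` and `userToken` come from your app: the SDK runtime you already
// configured and your own auth token for this user.
let agent = try runtime.makeAgent(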
    provider: .backendRouter(
        endpoint: URL(string: "https://api.myapp.com/llm")!,
        headers: ["Authorization": "Bearer \(userToken)"]
    ),
    role: AgentRole(staticPersona: "You are a precise video-editing assistant.")
)

headers ride every request — put your own auth there. Requests go out over an ephemeral session: no persisted cookies, credentials, or cache.

What your server receives

One JSON POST per provider round trip, carrying everything the model needs:

the system prompt (the agent's persona),
the full message history — user text, assistant text, tool calls, tool results, images,
the active tool definitions (id, description, JSON-schema parameters),
a tool-choice directive when the app forces or disables tool use,
sampling settings when configured.

Your server owns the model call: forward the turn to any provider, map its stream onto the events below, repeat per round trip. When the model calls a tool, the session executes it locally and the next POST carries the result in the message history.

What your server streams back

Respond with newline-delimited JSON: one event object per line, typed by a type field.

{"type":"text.delta","delta":"I'll trim that clip."}
{"type":"tool.call","id":"tc_1","name":"timeline.trim_clip","arguments":{"clip_id":"abc","end":5.0}}
{"type":"done"}
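
A server written in Swift could produce those lines roughly as follows. This is a minimal sketch, not SDK code: ndjsonLine is a hypothetical helper, and only the field names (type, delta, id, name, arguments) come from the format described here.

```swift
import Foundation

// Hypothetical helper (not part of the SDK): serializes one event as a single
// NDJSON line. JSONSerialization emits compact JSON and escapes newlines inside
// strings, so each event stays on exactly one line.
func ndjsonLine(_ event: [String: Any]) throws -> String {
    let data = try JSONSerialization.data(withJSONObject: event, options: [.sortedKeys])
    return String(decoding: data, as: UTF8.self) + "\n"
}

// The three events from the sample above, built as plain dictionaries.
let events: [[String: Any]] = [
    ["type": "text.delta", "delta": "I'll trim that clip."],
    ["type": "tool.call", "id": "tc_1", "name": "timeline.trim_clip",
     "arguments": ["clip_id": "abc", "end": 5.0] as [String: Any]],
    ["type": "done"],
]

// In a real server you would flush each line as soon as it is ready;
// joining them here just shows the complete response body.
let body = try events.map(ndjsonLine).joined()
print(body, terminator: "")
```

Key order inside each object does not matter for JSON; .sortedKeys is only there to make the output deterministic.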

Event	Carries	Meaning
`text.delta`	`delta`	a chunk of assistant text
`tool.partial`	`id`, `args_delta`, optional `name`	streamed fragments of a tool call's arguments — emit them if your upstream streams them
`tool.call`	`id`, `name`, `arguments`	a complete tool call; the session executes it locally
`usage`	`input_tokens`, `output_tokens`, `model`, `provider`, `estimated_cost_usd`	token usage for the round trip
`error`	`code`, `message`	the round trip failed; surfaces in the app as a typed error
`done`	—	the round trip is complete
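
To see how tool.partial relates to tool.call, the sketch below collects fragments per call id and parses them once the call is complete. It assumes args_delta is a string holding consecutive pieces of the arguments JSON, which this page does not spell out; ToolArgumentBuffer is a hypothetical type, not an SDK one.

```swift
import Foundation

// Hypothetical accumulator: concatenates args_delta fragments per tool-call id.
// Assumption: fragments are consecutive pieces of one JSON object.
struct ToolArgumentBuffer {
    private var fragments: [String: String] = [:]

    // Append one streamed fragment for the given call id.
    mutating func append(id: String, argsDelta: String) {
        fragments[id, default: ""] += argsDelta
    }

    // Remove the buffered text for `id` and parse it as a JSON object.
    // Returns nil if nothing was buffered or the text is not a JSON object.
    mutating func finish(id: String) -> [String: Any]? {
        guard let text = fragments.removeValue(forKey: id),
              let object = try? JSONSerialization.jsonObject(with: Data(text.utf8))
        else { return nil }
        return object as? [String: Any]
    }
}

var partials = ToolArgumentBuffer()
partials.append(id: "tc_1", argsDelta: "{\"clip_id\":\"a")
partials.append(id: "tc_1", argsDelta: "bc\",\"end\":5.0}")
print(partials.finish(id: "tc_1") ?? [:])
```

A buffer like this is handy in server tests: the arguments you send in tool.call should equal whatever the preceding fragments for the same id add up to.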

The rules the SDK holds your stream to:

End every stream with done. A response that ends without it fails the turn with BackendRouterError.streamTruncated — the SDK cannot tell a finished stream from a dead connection. Exactly one done per stream; anything after it is ignored.
The event vocabulary is closed. An unknown type fails the turn — the two sides disagree about the format, and failing loud beats dropping events.
error ends the turn. Codes the SDK recognizes map to typed BackendRouterError cases with retry guidance; any other code surfaces with the code intact. See "When it fails".
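
These rules are easy to check in your server's own tests before the SDK ever sees a response. The checker below is a sketch that mirrors them; it is not the SDK's parser. StreamOutcome, knownTypes, and checkStream are hypothetical names, and the handling of blank or malformed lines is an assumption this page does not cover.

```swift
import Foundation

// Hypothetical result type for the check; the SDK's real errors are
// BackendRouterError cases, which this sketch does not reproduce.
enum StreamOutcome: Equatable {
    case completed(eventCount: Int)
    case failed(reason: String)
}

// The closed event vocabulary from the table above.
let knownTypes: Set<String> = [
    "text.delta", "tool.partial", "tool.call", "usage", "error", "done",
]

func checkStream(_ ndjson: String) -> StreamOutcome {
    var count = 0
    // Assumption: blank lines are skipped rather than treated as errors.
    for line in ndjson.split(separator: "\n") where !line.allSatisfy(\.isWhitespace) {
        guard let object = try? JSONSerialization.jsonObject(with: Data(line.utf8)),
              let event = object as? [String: Any],
              let type = event["type"] as? String
        else { return .failed(reason: "malformed line") }

        // An unknown type fails the turn.
        guard knownTypes.contains(type) else {
            return .failed(reason: "unknown type: \(type)")
        }
        count += 1

        // error ends the turn.
        if type == "error" {
            return .failed(reason: "error: \(event["code"] as? String ?? "unknown")")
        }
        // done completes the round trip; anything after it is ignored.
        if type == "done" { return .completed(eventCount: count) }
    }
    // The stream ended without done.
    return .failed(reason: "stream truncated")
}

let sample = """
{"type":"text.delta","delta":"I'll trim that clip."}
{"type":"done"}
"""
print(checkStream(sample))
```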

Own the system prompt

The provider declares backendManagedSystemPrompt, which changes where live state travels. Live context and per-turn directives are merged into the first user message under a [CONTEXT] marker (the user's own text follows under [USER]) instead of being appended to the system prompt. The static persona still arrives in the request's system-prompt field — use it, extend it, or replace it server-side.

This keeps live app state intact even when your server substitutes its own system prompt.
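
If your server needs the live context and the user's text separately, for logging or for building its own prompt, it can split the first user message on the markers. This is a sketch under assumptions: the message begins with [CONTEXT] and the first [USER] after it starts the user's text. The exact layout around the markers is not specified here, and SplitMessage and splitFirstUserMessage are hypothetical names.

```swift
import Foundation

// Hypothetical container for the two halves of the first user message.
struct SplitMessage: Equatable {
    let context: String?   // nil when no [CONTEXT] marker leads the message
    let userText: String
}

func splitFirstUserMessage(_ message: String) -> SplitMessage {
    // .anchored only matches at the very start, so a user who merely types
    // "[CONTEXT]" mid-sentence is not misread as carrying injected context.
    guard let contextMarker = message.range(of: "[CONTEXT]", options: .anchored),
          let userMarker = message.range(
              of: "[USER]",
              range: contextMarker.upperBound..<message.endIndex)
    else {
        return SplitMessage(context: nil, userText: message)
    }

    let context = message[contextMarker.upperBound..<userMarker.lowerBound]
        .trimmingCharacters(in: .whitespacesAndNewlines)
    let userText = message[userMarker.upperBound...]
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return SplitMessage(context: context, userText: userText)
}

let parts = splitFirstUserMessage("[CONTEXT]\nselected_clip: abc\n[USER]\nTrim it to five seconds.")
print(parts.context ?? "(none)")
print(parts.userText)
```

One limitation of this approach: if the context payload itself contains the literal text [USER], the split lands too early.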

Declare what your backend honors

The SDK cannot know what your server actually implements, so the defaults are conservative: no vision, no tool choice, no structured output. Each is a real opt-in that rides the wire format, so when your backend honors one, declare it by passing your own capabilities.

let provider = BackendRouterProvider(
    endpoint: URL(string: "https://api.myapp.com/llm")!,
    capabilities: ProviderCapabilities(
        executionModel: .appDriven,
        toolDiscovery: .dynamicPerRequest,
        supportsStreaming: true,
        supportsToolCalling: true,
        supportsVision: false,
        supportsStructuredOutput: true,   // your server constrains output to the requested schema
        supportsSamplingConfig: true,
        supportsParallelToolCalls: true,
        modelSelection: .full,
        managedConversation: false,
        requiresNetworking: true,
        backendManagedSystemPrompt: true,
        supportsToolChoice: true    // your server honors the tool-choice directive
    )
)

A backend that declares supportsToolChoice receives the tool-choice directive and is trusted to enforce it. Against a backend that doesn't declare it, a send that forces or disables tool use fails with a typed error instead of being silently ignored (see "Steer tool use").

When you declare supportsStructuredOutput, a generate() call sends the JSON schema on the wire under a structuredOutput field, and the SDK validates the reply against it in-process. Map the strict hint to your upstream's constrained-decoding mode if it has one. The exact envelope shape is in the SDK's contract notes (docs/contract/freeform-structured-output.md). The cloud-profile path is separate: it carries structured output under a response_format field constrained to a portable cross-provider dialect, documented in docs/contract/cloud-structured-output.md. Against a backend that doesn't declare supportsStructuredOutput, generate() fails with the typed error StructuredOutputError.unsupported before any request is sent (see "Structured output").

Redirects fail closed

A redirect whose target differs from the original origin in scheme, host, or port is never followed: the request fails with BackendRouterError.crossOriginRedirectBlocked and nothing — no headers, no body — reaches the foreign origin. Same-origin redirects are followed with the full header set re-attached. Keep your endpoint's redirects within one origin.
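
To audit your own redirect targets against that rule, a same-origin comparison can be sketched as below. It compares scheme, host, and port as described above. Treating a missing port as the scheme's default (443 for https, 80 for http) is an assumption about how the SDK compares ports, and effectivePort and isSameOrigin are hypothetical helpers.

```swift
import Foundation

// Assumption: an absent port means the scheme's default port.
func effectivePort(_ url: URL) -> Int? {
    if let port = url.port { return port }
    switch url.scheme?.lowercased() {
    case "https": return 443
    case "http": return 80
    default: return nil
    }
}

// Two URLs share an origin when scheme, host, and port all match.
// Scheme and host are compared case-insensitively.
func isSameOrigin(_ a: URL, _ b: URL) -> Bool {
    guard let schemeA = a.scheme?.lowercased(), let schemeB = b.scheme?.lowercased(),
          let hostA = a.host?.lowercased(), let hostB = b.host?.lowercased()
    else { return false }
    return schemeA == schemeB && hostA == hostB && effectivePort(a) == effectivePort(b)
}

let endpoint = URL(string: "https://api.myapp.com/llm")!
let sameOriginTarget = URL(string: "https://api.myapp.com/v2/llm")!
let foreignTarget = URL(string: "https://cdn.myapp.com/llm")!
print(isSameOrigin(endpoint, sameOriginTarget))
print(isSameOrigin(endpoint, foreignTarget))
```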