Your own backend
BackendRouterProvider turns your server into the model. The SDK posts each
provider round trip to one endpoint you own; your server calls whatever LLM
it wants and streams events back. Tools still execute in the app — your
server never touches app state.
Use it when you already run a backend and want full control over the model call. If you would rather not build the server half, AgentKit Cloud ships it.
Point the SDK at your server
    let agent = try runtime.makeAgent(
        provider: .backendRouter(
            endpoint: URL(string: "https://api.myapp.com/llm")!,
            headers: ["Authorization": "Bearer \(userToken)"]
        ),
        role: AgentRole(staticPersona: "You are a precise video-editing assistant.")
    )
The headers ride every request, so put your own auth there. Requests go out over
an ephemeral session: no persisted cookies, credentials, or cache.
What your server receives
One JSON POST per provider round trip, carrying everything the model needs:
- the system prompt (the agent's persona),
- the full message history — user text, assistant text, tool calls, tool results, images,
- the active tool definitions (id, description, JSON-schema parameters),
- a tool-choice directive when the app forces or disables tool use, and sampling settings when configured.
Your server owns the model call: forward the turn to any provider, map its stream onto the events below, repeat per round trip. When the model calls a tool, the session executes it locally and the next POST carries the result in the message history.
What your server streams back
Respond with newline-delimited JSON: one event object per line, typed by a
type field.
    {"type":"text.delta","delta":"I'll trim that clip."}
    {"type":"tool.call","id":"tc_1","name":"timeline.trim_clip","arguments":{"clip_id":"abc","end":5.0}}
    {"type":"done"}
| Event | Carries | Meaning |
|---|---|---|
| text.delta | delta | a chunk of assistant text |
| tool.partial | id, args_delta, optional name | streamed fragments of a tool call's arguments; emit them if your upstream streams them |
| tool.call | id, name, arguments | a complete tool call; the session executes it locally |
| usage | input_tokens, output_tokens, model, provider, estimated_cost_usd | token usage for the round trip |
| error | code, message | the round trip failed; surfaces in the app as a typed error |
| done | (none) | the round trip is complete |
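If your server happens to be written in Swift, the events in the table above could be serialized along these lines. This is a minimal sketch, not part of the SDK: the StreamEvent enum and its ndjsonLine() helper are hypothetical names, and only the type values and field names are taken from the table.

```swift
import Foundation

// Hypothetical server-side model of the six event types listed in the table.
enum StreamEvent {
    case textDelta(String)
    case toolPartial(id: String, argsDelta: String, name: String?)
    case toolCall(id: String, name: String, arguments: [String: Any])
    case usage(inputTokens: Int, outputTokens: Int, model: String,
               provider: String, estimatedCostUSD: Double)
    case error(code: String, message: String)
    case done

    // JSON fields for this event; keys follow the "Carries" column.
    var payload: [String: Any] {
        switch self {
        case .textDelta(let delta):
            return ["type": "text.delta", "delta": delta]
        case .toolPartial(let id, let argsDelta, let name):
            var fields: [String: Any] = ["type": "tool.partial", "id": id, "args_delta": argsDelta]
            if let name = name { fields["name"] = name }  // name is optional
            return fields
        case .toolCall(let id, let name, let arguments):
            return ["type": "tool.call", "id": id, "name": name, "arguments": arguments]
        case .usage(let input, let output, let model, let provider, let cost):
            return ["type": "usage", "input_tokens": input, "output_tokens": output,
                    "model": model, "provider": provider, "estimated_cost_usd": cost]
        case .error(let code, let message):
            return ["type": "error", "code": code, "message": message]
        case .done:
            return ["type": "done"]
        }
    }

    // One compact JSON object followed by a newline: one NDJSON line.
    // Newlines inside string values are escaped by the serializer, so the
    // object never spans more than one line.
    func ndjsonLine() throws -> String {
        let data = try JSONSerialization.data(withJSONObject: payload, options: [.sortedKeys])
        return String(decoding: data, as: UTF8.self) + "\n"
    }
}
```

Write each line to the response as soon as the upstream model produces it, and finish with the line for .done. The sorted keys only make the output deterministic; nothing above says the SDK depends on key order.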
The rules the SDK holds your stream to:
- End every stream with done. A response that ends without it fails the turn with BackendRouterError.streamTruncated: the SDK cannot tell a finished stream from a dead connection. Exactly one done per stream; anything after it is ignored.
- The event vocabulary is closed. An unknown type fails the turn: the two sides disagree about the format, and failing loud beats dropping events.
- error ends the turn. Codes the SDK recognizes map to typed BackendRouterError cases with retry guidance; any other code surfaces with the code intact. See when it fails.
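To check a server's output against these rules before pointing the SDK at it, a small test-side checker can help. The sketch below is an illustration only: StreamOutcome and checkStream are hypothetical names, the real SDK reports BackendRouterError cases instead, and treating an unparseable line like an unknown type is an assumption.

```swift
import Foundation

// Hypothetical result type for a test harness; not the SDK's error type.
enum StreamOutcome: Equatable {
    case completed(text: String)  // a done event was seen
    case truncated                // the stream ended without done
    case unknownEvent(String)     // the closed vocabulary was violated
    case failed(code: String)     // an error event ended the turn
}

// The closed event vocabulary from the table.
let knownTypes: Set<String> = [
    "text.delta", "tool.partial", "tool.call", "usage", "error", "done",
]

func checkStream(_ body: String) -> StreamOutcome {
    var text = ""
    // split drops empty pieces, so blank lines between events are skipped.
    for line in body.split(whereSeparator: { $0.isNewline }) {
        guard let object = try? JSONSerialization.jsonObject(with: Data(line.utf8)),
              let event = object as? [String: Any],
              let type = event["type"] as? String else {
            // Assumption: a line that is not a JSON object with a string
            // "type" is handled like an unknown event.
            return .unknownEvent("<unparseable line>")
        }
        guard knownTypes.contains(type) else { return .unknownEvent(type) }
        switch type {
        case "text.delta":
            text += (event["delta"] as? String) ?? ""
        case "error":
            return .failed(code: (event["code"] as? String) ?? "unknown")
        case "done":
            return .completed(text: text)  // anything after done is ignored
        default:
            break  // tool.partial, tool.call, usage: not inspected in this sketch
        }
    }
    return .truncated  // ran out of lines before seeing done
}
```

Feeding it a recorded response body from your endpoint catches a missing done or a misspelled type before it shows up as a failed turn in the app.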
Own the system prompt
The provider declares backendManagedSystemPrompt, which changes where live
state travels. Live context and per-turn directives
are merged into the first user message under a [CONTEXT] marker (the
user's own text follows under [USER]) instead of being appended to the
system prompt. The static persona still arrives in the request's
system-prompt field — use it, extend it, or replace it server-side.
This keeps live app state intact even when your server substitutes its own system prompt.
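If your server needs the two parts separately, for example to log the user's text without the live context, it could split the first user message on the markers. The sketch below is a guess at a workable approach, not the SDK's own parser: splitFirstUserMessage is a hypothetical helper, and it assumes [CONTEXT] appears once before [USER] with each marker opening its section. The exact layout around the markers is not specified here, so check it against a captured request first.

```swift
import Foundation

/// Splits a first user message into its live-context part and the user's own text.
/// Assumption: "[CONTEXT]" precedes "[USER]" and each marker opens its section.
func splitFirstUserMessage(_ message: String) -> (context: String?, user: String) {
    guard let contextMarker = message.range(of: "[CONTEXT]"),
          let userMarker = message.range(
              of: "[USER]",
              range: contextMarker.upperBound..<message.endIndex
          ) else {
        // No markers found: treat the whole message as the user's text.
        return (nil, message)
    }
    let context = message[contextMarker.upperBound..<userMarker.lowerBound]
        .trimmingCharacters(in: .whitespacesAndNewlines)
    let user = message[userMarker.upperBound...]
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return (context, user)
}
```

The split happens at the first [USER] after the context marker, so context that itself contains the literal text [USER] would be cut short; this sketch does not handle that case.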
Declare what your backend honors
The SDK cannot know what your server actually implements, so the defaults are conservative: no vision, no tool choice, no structured output. Each is a real opt-in that rides the wire format, so when your backend honors one, declare it by passing your own capabilities.
    let provider = BackendRouterProvider(
        endpoint: URL(string: "https://api.myapp.com/llm")!,
        capabilities: ProviderCapabilities(
            executionModel: .appDriven,
            toolDiscovery: .dynamicPerRequest,
            supportsStreaming: true,
            supportsToolCalling: true,
            supportsVision: false,
            supportsStructuredOutput: true,  // your server constrains output to the requested schema
            supportsSamplingConfig: true,
            supportsParallelToolCalls: true,
            modelSelection: .full,
            managedConversation: false,
            requiresNetworking: true,
            backendManagedSystemPrompt: true,
            supportsToolChoice: true  // your server honors the tool-choice directive
        )
    )
A backend that declares supportsToolChoice receives the tool-choice
directive and is trusted to enforce it; against one that doesn't, a send
that forces or disables tool use fails typed instead of being silently
ignored — see steer tool use.
When you declare supportsStructuredOutput, a generate() call sends the
JSON schema on the wire under a structuredOutput field and the SDK
validates the reply against it in-process; map the strict hint to your
upstream's constrained-decoding mode if it has one. The exact envelope shape
is in the SDK's contract notes (docs/contract/freeform-structured-output.md).
Against a backend that doesn't declare it, generate() fails typed with
StructuredOutputError.unsupported before any request — see
structured output.
Redirects fail closed
A redirect whose target differs from the original origin in scheme, host,
or port is never followed: the request fails with
BackendRouterError.crossOriginRedirectBlocked and nothing — no headers, no
body — reaches the foreign origin. Same-origin redirects are followed with
the full header set re-attached. Keep your endpoint's redirects within one
origin.
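To check in advance whether a redirect your infrastructure issues would stay within one origin, you can compare the scheme, host, and port yourself. This is a minimal sketch of that comparison, not the SDK's implementation: isSameOrigin and effectivePort are hypothetical helpers, and treating a missing port as the scheme's default (443 for https, 80 for http) is an assumption about normalization.

```swift
import Foundation

// Explicit port if present, otherwise the scheme's default.
// Assumption: the SDK may normalize ports differently.
func effectivePort(of url: URL) -> Int? {
    if let port = url.port { return port }
    switch url.scheme?.lowercased() {
    case "https": return 443
    case "http": return 80
    default: return nil
    }
}

// Two URLs share an origin when scheme, host, and port all match.
// Paths and query strings do not matter.
func isSameOrigin(_ a: URL, _ b: URL) -> Bool {
    return a.scheme?.lowercased() == b.scheme?.lowercased()
        && a.host?.lowercased() == b.host?.lowercased()
        && effectivePort(of: a) == effectivePort(of: b)
}
```

Under these assumptions, a redirect from https://api.myapp.com/llm to https://api.myapp.com/v2/llm stays in-origin, while one to http://api.myapp.com/llm, https://cdn.myapp.com/llm, or https://api.myapp.com:8443/llm does not.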