Bring your own model

Any backend that can produce text can drive an AgentKit session: a local llama.cpp server, a research endpoint, a company-internal gateway. Conform to AgentProvider and everything else — the tool loop, guards, undo, context, limits — comes from the session.

Two requirements carry the contract:

  • capabilities — a ProviderCapabilities value declaring what your backend can actually do. Every flag changes session behavior, so declare honestly.
  • stream(_ request:) — takes a CompletionRequest, returns an AsyncThrowingStream<StreamEvent, Error>.

Two more have defaults: validateTools(_:) (returns no warnings) and inFlightTracker (nil). Most providers implement just the first two.

A minimal provider

The smallest conformance — an echo model with no tool calling:

struct EchoProvider: AgentProvider {
    let capabilities = ProviderCapabilities(
        executionModel: .appDriven,
        toolDiscovery: .dynamicPerRequest,
        supportsStreaming: true,
        supportsToolCalling: false,
        supportsVision: false,
        supportsStructuredOutput: false,
        supportsSamplingConfig: false,
        supportsParallelToolCalls: false,
        modelSelection: .none,
        managedConversation: false,
        requiresNetworking: false
    )

    func stream(_ request: CompletionRequest) -> AsyncThrowingStream<StreamEvent, Error> {
        // Collect the text items of the most recent user message; non-text items are skipped.
        var userText = ""
        if let lastUser = request.messages.last(where: { $0.role == .user }) {
            userText = lastUser.content.compactMap { item -> String? in
                if case .text(let text) = item { return text }
                return nil
            }.joined(separator: "\n")
        }

        return AsyncThrowingStream { continuation in
            // Emit the whole reply as one text chunk, mark the turn as done, then close the stream.
            continuation.yield(.textDelta("You said: \(userText)"))
            continuation.yield(.done)
            continuation.finish()
        }
    }
}

Providers must be Sendable — the session calls them across concurrency domains. A struct of immutable state, like this one, conforms automatically.
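
A provider that does carry mutable state needs a little more care. The sketch below uses only Foundation and is not AgentKit API: GatewayConfig, RequestCounter, and requireSendable are hypothetical names chosen for illustration. It shows an all-let struct conforming to Sendable with no extra work, and one possible way (a lock-guarded final class) to keep mutable state safe to share.

```swift
import Foundation

// Hypothetical provider configuration: every stored property is an immutable
// Sendable value, so the struct conforms to Sendable without extra work.
struct GatewayConfig: Sendable {
    let endpoint: String
    let model: String
}

// Hypothetical mutable state. A class is not Sendable by default, so access is
// serialized with a lock and the conformance is declared @unchecked.
final class RequestCounter: @unchecked Sendable {
    private let lock = NSLock()
    private var count = 0

    // Returns the new count after incrementing.
    func increment() -> Int {
        lock.lock()
        defer { lock.unlock() }
        count += 1
        return count
    }
}

// Compile-time check: this function only accepts Sendable values.
func requireSendable<T: Sendable>(_ value: T) -> T { value }

let config = requireSendable(GatewayConfig(endpoint: "http://localhost:8080", model: "local"))
let counter = requireSendable(RequestCounter())
```

An actor is the other common choice for mutable state; it trades the manual lock for async access at every call site.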

Wire it up with a spec; the closure builds your provider when the agent is created:

let agent = try runtime.makeAgent(
    provider: AgentProviderSpec { EchoProvider() },
    role: AgentRole(staticPersona: "Echo everything.")
)
try await agent.send("hello")
// agent.currentText == "You said: hello"

When your provider needs to know the agent's tools at construction (the on-device Apple provider does — it executes them itself), use the registry-aware form. The closure receives the agent's scoped registry — only the domains in the agent's scope, with executors resolvable per tool:

let spec = AgentProviderSpec(registryAware: { registry in
    MyProvider(tools: registry.allTools())
})

Declare capabilities honestly

ProviderCapabilities is not metadata — every flag changes how the session behaves. The load-bearing ones:

Flag What the session does with it
executionModel .appDriven — the session runs the tool loop: your provider emits tool-call requests and receives the results in the next request's messages. .providerDriven — your provider executes tools itself and reports lifecycle events.
toolDiscovery .dynamicPerRequest — the current active tool list rides every request; discovery starts from the built-in agentkit.* meta-tools. .eagerSessionTools — every scoped domain is active from the first request. .sessionRebuild(maxRebuilds:) — eager tools, re-primed mid-conversation when new domains activate.
supportsToolCalling false — the session routes tools through text: request.tools arrives empty, the system prompt carries the tool instructions, and tool calls are parsed back out of the model's reply.
requiresNetworking true — a configured pre-transmit filter runs on every outbound request before your provider sees it. See control what leaves the device.
managedConversation true — the session sends only messages your backend has not seen yet, for backends that hold conversation state server-side.
backendManagedSystemPrompt true — per-turn directives and context summaries ride a preamble inside the user message instead of growing the system prompt, for backends that own the prompt server-side.
supportsToolChoice false — a send with .required or .none fails typed before any request is built. The default is false: fail closed. See steer tool use.
supportsStructuredOutput false — generate() fails typed before your provider is called. See get data, not prose.
maxTools / contextWindow tool activation is capped at maxTools; history is compacted against contextWindow before every request.
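
To make two of these rows concrete, here is a hypothetical sketch of the kind of pre-flight checks the table describes. It is not AgentKit's implementation: Capabilities, ToolChoice, PreflightError, checkToolChoice, and cappedTools are stand-in names invented for illustration, and keeping the first N tools is only an assumption about how a maxTools cap could be applied.

```swift
// Stand-in for the two flags this sketch reads; not the real ProviderCapabilities.
struct Capabilities {
    var supportsToolChoice = false   // default mirrors the table: fail closed
    var maxTools: Int? = nil
}

// Stand-in for the tool-choice values a send can carry.
enum ToolChoice { case auto, required, none }

enum PreflightError: Error, Equatable {
    case toolChoiceUnsupported
}

// Reject a non-auto tool choice before any request is built.
func checkToolChoice(_ choice: ToolChoice, against caps: Capabilities) throws {
    if choice != .auto && !caps.supportsToolChoice {
        throw PreflightError.toolChoiceUnsupported
    }
}

// Cap the active tool list at maxTools (assumed here: keep the first N).
func cappedTools(_ names: [String], against caps: Capabilities) -> [String] {
    guard let limit = caps.maxTools else { return names }
    return Array(names.prefix(limit))
}
```

The point of the sketch is the ordering: an honest false costs the caller a typed error up front, while a dishonest true would let the request reach a backend that cannot honor it.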

What a request carries

CompletionRequest is everything one provider round trip needs:

Field Contents
systemPrompt persona plus per-turn directives and context (unless backendManagedSystemPrompt)
messages conversation history, already compacted against contextWindow
tools the active tool definitions — may include the built-in agentkit.* meta-tools; empty when supportsToolCalling is false
sampling optional temperature / topP / maxTokens — map what your backend supports, ignore the rest (and declare supportsSamplingConfig accordingly)
structuredOutput when set, constrain output to the schema and emit the JSON as text — arrives only if you declare supportsStructuredOutput; otherwise generate() fails typed and this is never populated
maxToolCallsPerTurn provider-driven only — the cap on tool executions inside your turn
toolChoice .auto / .required / .none; non-auto arrives only if you advertise supportsToolChoice
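
The sampling row asks you to map what your backend supports and ignore the rest. A minimal sketch of that mapping follows, under stated assumptions: Sampling is a stand-in for the request's sampling value (its real shape may differ), and BackendBody describes a hypothetical backend that honors temperature and a token limit but has no top-p control.

```swift
// Stand-in for the optional sampling fields on a request; not the real AgentKit type.
struct Sampling {
    var temperature: Double? = nil
    var topP: Double? = nil
    var maxTokens: Int? = nil
}

// Request body of a hypothetical backend that understands temperature and a
// token limit only.
struct BackendBody: Encodable {
    var prompt: String
    var temperature: Double?
    var maxOutputTokens: Int?
}

// Copy over the fields the backend supports; leave everything else out.
func makeBody(prompt: String, sampling: Sampling?) -> BackendBody {
    BackendBody(
        prompt: prompt,
        temperature: sampling?.temperature,    // supported: passed through
        maxOutputTokens: sampling?.maxTokens   // supported: renamed for the backend
        // topP is intentionally dropped: this backend cannot honor it
    )
}
```

Because every field is optional, a request with no sampling configuration produces a body with only the prompt set, and the backend's own defaults apply.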

What a provider emits

The event vocabulary splits by execution model. App-driven providers emit:

  • .textDelta(String) — a chunk of assistant text
  • .toolCallComplete(ToolCall) — a tool the session should execute; the result arrives in your next request's messages
  • .toolCallPartial(id:name:argsDelta:) — optional incremental arguments; the session acts on toolCallComplete
  • .usage(UsageReport) — token accounting, surfaced as lastUsageReport
  • .done — the turn's final event
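
As a rough illustration of the app-driven vocabulary, the sketch below models two round trips: one that requests a tool and one that answers after the result has come back. ToolCall and StreamEvent here are reduced stand-ins for the AgentKit types, the clock.now tool and the toolResult parameter are hypothetical, and ending each round trip's stream with .done is an assumption rather than a documented rule.

```swift
// Stand-ins for the AgentKit types named above, reduced to what the sketch needs.
struct ToolCall: Equatable {
    let id: String
    let name: String
    let argumentsJSON: String
}

enum StreamEvent: Equatable {
    case textDelta(String)
    case toolCallComplete(ToolCall)
    case done
}

// Decide what one round trip emits. `toolResult` is the result of an earlier
// call, if the session has already run it and fed it back.
func events(toolResult: String?) -> [StreamEvent] {
    if let result = toolResult {
        // Later round trip: the tool has run, so answer in text chunks.
        return [.textDelta("It is "), .textDelta(result), .done]
    }
    // First round trip: ask the session to run a (hypothetical) clock tool.
    let call = ToolCall(id: "call-1", name: "clock.now", argumentsJSON: "{}")
    return [.toolCallComplete(call), .done]
}

// Wrap the events in the stream shape a provider returns.
func stream(toolResult: String?) -> AsyncThrowingStream<StreamEvent, Error> {
    AsyncThrowingStream { continuation in
        for event in events(toolResult: toolResult) {
            continuation.yield(event)
        }
        continuation.finish()
    }
}
```

Keeping the decision in a plain function and the stream as a thin wrapper makes the event sequence easy to unit test without driving an async consumer.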

Provider-driven providers execute tools themselves and report the lifecycle: .toolCallStarted(ToolCall), then exactly one terminal event per call — .toolCallCompleted, .toolCallDenied, .toolCallFailed, .toolCallCancelled, or .toolCallConflict. Finishing the stream by throwing fails the turn; the session rolls it back.

The provider-driven event contract

The session builds durable conversation history from the lifecycle events a provider-driven provider emits, so it enforces their shape: every toolCallStarted reaches exactly one terminal event, call ids are unique within the turn, and everything lands before the stream ends. A violating turn fails with the typed AgentSessionError.providerEventContractViolation and persists no assistant or tool history beyond the already-appended user message — a broken provider is never laundered into valid-looking history.

If you are building a provider-driven provider, exercise it against an AgentSession with these traces. Two must succeed and persist: a valid single call and valid parallel calls. Four must fail typed: a terminal event with no preceding toolCallStarted, a duplicate toolCallStarted id, a duplicate terminal event, and a toolCallStarted with no terminal event.
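
Before wiring those traces into a real session, it can help to model the contract on its own. The checker below is a standalone sketch, not the session's enforcement code: LifecycleEvent collapses the five terminal events into a single hypothetical .terminal case, and ContractViolation and firstViolation are names invented for illustration. It does not replace the AgentSession tests, which also cover persistence.

```swift
// Stand-in lifecycle events: the five terminal cases are collapsed into one
// `.terminal` case because the checker only cares that a call ended.
enum LifecycleEvent {
    case started(id: String)
    case terminal(id: String)
}

enum ContractViolation: Error, Equatable {
    case terminalWithoutStarted(String)
    case duplicateStarted(String)
    case duplicateTerminal(String)
    case startedWithoutTerminal(String)
}

// Returns nil for a valid trace, or the first violation found.
func firstViolation(in trace: [LifecycleEvent]) -> ContractViolation? {
    var open: Set<String> = []     // started, no terminal event yet
    var closed: Set<String> = []   // started and terminated

    for event in trace {
        switch event {
        case .started(let id):
            // Call ids must be unique within the turn.
            if open.contains(id) || closed.contains(id) { return .duplicateStarted(id) }
            open.insert(id)
        case .terminal(let id):
            if closed.contains(id) { return .duplicateTerminal(id) }
            guard open.remove(id) != nil else { return .terminalWithoutStarted(id) }
            closed.insert(id)
        }
    }
    // Anything still open when the stream ends never reached a terminal event.
    if let dangling = open.sorted().first { return .startedWithoutTerminal(dangling) }
    return nil
}
```

Each of the six traces listed above maps to one call of firstViolation: the two valid traces return nil and the four broken ones return a distinct case.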

Validate tools

validateTools(_:) runs before every send. Return [ToolSchemaWarning] for non-fatal issues — they surface on agent.lastSchemaWarnings and the turn proceeds. Throw for schemas your backend cannot represent — the send fails as a schema validation error before the user message is recorded.
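
The warn-or-throw split can be sketched as follows. Everything here is a stand-in: ToolDefinition, SchemaWarning, and SchemaError are hypothetical types (the real ToolSchemaWarning and tool definition may carry different fields), the function is written as a free function rather than a protocol method, and the naming rule is an invented example of something a backend might be unable to represent.

```swift
// Stand-ins for the AgentKit types; the real ones may carry different fields.
struct ToolDefinition {
    let name: String
    let description: String
}

struct SchemaWarning: Equatable {
    let toolName: String
    let message: String
}

enum SchemaError: Error, Equatable {
    case unrepresentableName(String)
}

// Hypothetical backend rules: names must be non-empty and use only ASCII
// letters, digits, "_" or "."; a missing description is tolerated but flagged.
func validateTools(_ tools: [ToolDefinition]) throws -> [SchemaWarning] {
    var warnings: [SchemaWarning] = []
    for tool in tools {
        let usesAllowedCharacters = tool.name.allSatisfy { character in
            character.isASCII
                && (character.isLetter || character.isNumber || character == "_" || character == ".")
        }
        // Fatal: the backend cannot represent this tool at all, so throw.
        guard !tool.name.isEmpty, usesAllowedCharacters else {
            throw SchemaError.unrepresentableName(tool.name)
        }
        // Non-fatal: report it and let the turn proceed.
        if tool.description.isEmpty {
            warnings.append(SchemaWarning(toolName: tool.name, message: "missing description"))
        }
    }
    return warnings
}
```

The design choice to notice is that warnings accumulate across all tools while the first fatal problem stops validation immediately.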

Next