On-device with Apple

.appleFoundationModels() runs your agent on the system language model that ships with Apple Intelligence. No API key, no network — inference happens on the device, and conversation content stays local. The same AgentSession API applies: you change the provider line, not your domains, guards, or UI.

let agent = try runtime.makeAgent(
    provider: .appleFoundationModels(),
    role: AgentRole(staticPersona: "You are a concise assistant.")
)

The model is small (~3B parameters) and the framework enforces real limits. This page is the full tour: what works, where the boundaries sit, and what to design around. For the cross-provider comparison, see the capability matrix.

Check availability first

The provider needs macOS 26, iOS 26, or visionOS 26, and a device with Apple Intelligence enabled and the model downloaded. Two gates, in order:

#available(macOS 26.0, iOS 26.0, visionOS 26.0, *) at the construction site — the symbols do not exist below that.
AppleFoundationModelsProvider.isAvailable at runtime — eligible device, Apple Intelligence on, model ready.

let provider: AgentProviderSpec
if #available(macOS 26.0, iOS 26.0, visionOS 26.0, *),
   AppleFoundationModelsProvider.isAvailable {
    provider = .appleFoundationModels()
} else {
    provider = .anthropic(apiKey: key)   // or any other fallback
}

When you need to tell the user why the model is missing, switch on the reason-bearing form:

switch AppleFoundationModelsProvider.availability {
case .available:
    break
case .unavailable(.appleIntelligenceNotEnabled):
    // point the user at Settings
    break
case .unavailable(.modelNotReady):
    // still downloading; retry later
    break
case .unavailable:
    // not eligible on this device
    break
}

Fall back automatically

The availability dance above is a pattern you can hand-write, or hand to the SDK. .appleFoundationModels(fallingBackTo:fallbackIdentity:) runs the same runtime isAvailable check at construction: it selects the on-device model when the model is ready, or your fallback provider when Apple Intelligence is off, the model is still downloading, or the device is not eligible.

let agent = try runtime.makeAgent(
    provider: .appleFoundationModels(
        fallingBackTo: .anthropic(apiKey: key),
        fallbackIdentity: ProviderIdentity(id: "anthropic", displayName: "Claude")
    ),
    role: AgentRole(staticPersona: "You are a concise assistant.")
)

This replaces the runtime isAvailable branch, not the compile-time gate. The factory is still a macOS 26 / iOS 26 / visionOS 26 symbol, so a target that also supports older systems wraps the call in #available and supplies a provider for the older path. A target whose deployment floor is already 26 calls it unguarded, as shown above.

The choice is made once, at construction, and is fixed for the life of the session. That stickiness is deliberate. A session reads the provider's execution model when it starts and locks one of two turn loops: on-device runs tools inside Apple's session, while cloud streams them back through the app-driven loop, and the two are not interchangeable mid-session. Resolving the selection before the session starts keeps the right loop in place for every turn. Only the selected side is built, so a cloud fallback you never reach constructs no client and costs nothing.

Know which one ran

The selection changes where conversation content goes, so it is worth surfacing. Pass onSelection to learn the outcome at construction, and read activeIdentity on the resulting provider:

let agent = try runtime.makeAgent(
    provider: .appleFoundationModels(
        fallingBackTo: .anthropic(apiKey: key),
        fallbackIdentity: ProviderIdentity(id: "anthropic", displayName: "Claude"),
        onSelection: { identity, reason in
            // reason: .primaryAvailable (on-device) or .primaryUnavailable (fell back)
            analytics.log("inference provider", identity.id)
        }
    ),
    role: AgentRole(staticPersona: "You are a concise assistant.")
)

Privacy follows from the selection. When the fallback is a cloud provider, selecting it moves inference off the device and onto the network. The fallback is opt-in by construction, because you pass it explicitly, and the outcome is reported through onSelection and activeIdentity, so a host can reflect it in the UI or gate it behind consent. See control what leaves the device.
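One way to gate a cloud fallback behind consent is to record the selection outcome and check it before the first send. The sketch below is self-contained and uses hypothetical stand-in types (SelectionReason, InferenceRoute, maySend); only the two reason names come from the text above, and it assumes the configured fallback is a networked provider.

```swift
/// Stand-in for the reason reported to onSelection. Hypothetical mirror, not the SDK type.
enum SelectionReason {
    case primaryAvailable    // the on-device model was selected
    case primaryUnavailable  // the fallback was selected
}

/// Hypothetical record a host might keep after construction.
struct InferenceRoute {
    let providerID: String
    let reason: SelectionReason

    /// Assumption: the fallback is a cloud provider, so falling back means content leaves the device.
    var leavesDevice: Bool { reason == .primaryUnavailable }
}

/// Allows a send when inference stays local, or when the user has agreed to cloud inference.
func maySend(on route: InferenceRoute, userConsentedToCloud: Bool) -> Bool {
    !route.leavesDevice || userConsentedToCloud
}

let local = InferenceRoute(providerID: "apple", reason: .primaryAvailable)
let cloud = InferenceRoute(providerID: "anthropic", reason: .primaryUnavailable)
```

A host could build the route inside its onSelection closure and consult maySend before the first turn. The provider ID "apple" is a placeholder.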

What this handles, and what it does not

This is availability-based selection, resolved once at construction. It covers the cases where the on-device model is missing or gated by the OS, Apple Intelligence is off, or the model is not yet downloaded. A transient failure from the model after selection, such as a .generation blip, surfaces as a typed AppleModelError. It does not silently switch providers mid-session, which would mean swapping turn loops underneath an in-flight conversation. Handle those typed cases with a fallback arm, as the failure table below describes. Automatic per-turn fallback on transient on-device errors is tracked for a later release.

Tools run inside Apple's session

This provider uses the provider-driven execution model: the model invokes your tool executors directly inside Apple's own session instead of streaming tool requests back through the app-driven loop. From your code's point of view, the observable surface stays the same, with one change in where guards are configured:

Tool activity still streams through the same observable surface — activeToolCalls fills and drains, currentText accumulates.
Guards, confirmation, and undo run around every call — but they belong at the provider seam, not the session. Pass them to .appleFoundationModels(guardPipeline:confirmationHandler:undoTransaction:); tools execute inside the provider, so that is where the checks live. Passing guards or a confirmationHandler to makeAgent instead throws at construction rather than being silently ignored.
Tool calls and results still land in conversation history, so a conversation started on-device resumes against a cloud provider with the tool exchanges intact. That history then leaves the device — scope sensitive tools accordingly.

Budget the window

The on-device context window is 4,096 tokens, and everything shares it: the persona, every declared tool schema, conversation history, and the reply itself. The session compacts history against the window automatically, but a prompt-heavy configuration can still overflow — that surfaces as the typed AppleModelError.contextWindowExceeded, never as silent truncation.

Three habits keep on-device turns inside budget:

Terse tool results. A few hundred bytes per result. A dozen 1.5 KB results can overflow the window before the per-turn tool cap binds.
A stable persona. The provider keeps Apple's native session alive across turns when the system prompt and tool set do not change. A persona that varies per turn (dynamic directives, always-refreshing context sources) forces a full re-prime every turn — correct output, full prefill cost each time. Prefer .cached context-source policies.
Few tools. See the next section.
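The budget arithmetic behind these habits can be sketched in a few lines. This is a rough sizing aid, not the model's tokenizer: it assumes about 4 bytes of text per token, and the names OnDeviceWindow, estimatedTokens, and WindowBudget are hypothetical. Only the 4,096-token window and the 1.5 KB figure come from the text above.

```swift
/// Constants for the sketch. The window size is from the text; the bytes-per-token ratio is an assumption.
enum OnDeviceWindow {
    static let tokens = 4_096
    static let assumedBytesPerToken = 4
}

/// Estimates tokens for a string from its UTF-8 length, rounding up.
func estimatedTokens(_ text: String) -> Int {
    let bytes = text.utf8.count
    return (bytes + OnDeviceWindow.assumedBytesPerToken - 1) / OnDeviceWindow.assumedBytesPerToken
}

/// Hypothetical summary of everything that shares the window.
struct WindowBudget {
    var personaTokens: Int
    var schemaTokens: Int
    var historyTokens: Int
    var replyReserve: Int

    var total: Int { personaTokens + schemaTokens + historyTokens + replyReserve }
    var fits: Bool { total <= OnDeviceWindow.tokens }
}

// A dozen 1.5 KB tool results in history, with nothing else counted.
let bulkyResult = String(repeating: "x", count: 1_536)
let heavy = WindowBudget(
    personaTokens: 0,
    schemaTokens: 0,
    historyTokens: 12 * estimatedTokens(bulkyResult),
    replyReserve: 0
)

// The same dozen results at 300 bytes each, plus assumed persona, schema, and reply allowances.
let terseResult = String(repeating: "x", count: 300)
let lean = WindowBudget(
    personaTokens: 50,
    schemaTokens: 1_500,
    historyTokens: 12 * estimatedTokens(terseResult),
    replyReserve: 512
)
```

Under these assumptions the bulky configuration needs about 4,608 tokens for tool results alone, while the terse one totals about 2,962 tokens and leaves headroom.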

Schema weight is the real tool ceiling

maxTools advertises 20, but that is a ceiling on count. What actually binds is how many tokens your schemas cost, because every declared tool's name, description, and full parameter schema is loaded into the session up front:

Tool set	Count	On the 4,096-token window
Small schemas (1–2 string properties, one-line descriptions)	12	fits
Realistic schemas (3–6 properties, a nested object, an enum)	14	fits
Wide schemas with long descriptions	15	overflows (~8,400 tokens)

A production-shaped tool costs roughly 150–300 tokens; a documentation-heavy tool with nested objects and enums can cost over 500. Budget the tool set against the window, and prefer staged discovery over declaring everything eagerly.
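A minimal sketch of budgeting a tool set against the window, assuming you already have a per-tool token estimate. ToolCost and planEagerTools are hypothetical names, the 560-token figure is an assumed per-tool cost chosen to match the roughly 8,400-token total in the table, and the 2,048-token schema budget is an arbitrary split of the window for illustration.

```swift
/// Hypothetical description of one tool's schema cost. Not an SDK type.
struct ToolCost {
    let name: String
    let schemaTokens: Int
}

/// Walks the tools in priority order and declares each one that still fits the schema budget.
/// Tools that would exceed the budget are returned separately as candidates for staged discovery.
func planEagerTools(_ tools: [ToolCost], schemaBudget: Int) -> (eager: [ToolCost], deferred: [ToolCost]) {
    var eager: [ToolCost] = []
    var deferred: [ToolCost] = []
    var spent = 0
    for tool in tools {
        if spent + tool.schemaTokens <= schemaBudget {
            eager.append(tool)
            spent += tool.schemaTokens
        } else {
            deferred.append(tool)
        }
    }
    return (eager, deferred)
}

// Fifteen wide tools at an assumed 560 tokens each.
let wideTools = (1...15).map { ToolCost(name: "tool\($0)", schemaTokens: 560) }
let totalWideTokens = wideTools.reduce(0) { $0 + $1.schemaTokens }

// Reserve half of the 4,096-token window for schemas (an assumed split).
let plan = planEagerTools(wideTools, schemaBudget: 2_048)
```

With these numbers the full set costs 8,400 tokens, and only three tools fit the 2,048-token schema budget. The other twelve would need to be discovered later.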

Two schema shapes are rejected before any model work with a typed AppleProviderError.unsupportedStructuredSchema naming the tool and the offending path: non-object argument roots, and dynamic keys (additionalProperties: true) anywhere in the tree. Cloud providers serialize those shapes fine; on-device they would silently lose data, so the provider refuses loudly instead.
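To catch these two shapes in your own tests before they reach the provider, a pre-flight check along the following lines may help. It is a sketch over a hypothetical SchemaNode stand-in, not the SDK's schema type, and it does not reproduce the provider's actual validation or error format.

```swift
/// Minimal stand-in for a JSON Schema node. Hypothetical, not an SDK type.
indirect enum SchemaNode {
    case object(properties: [String: SchemaNode], additionalProperties: Bool)
    case array(items: SchemaNode)
    case string
    case number
    case boolean
}

/// Returns one entry per rejected shape: a non-object root, and dynamic keys anywhere in the tree.
func unsupportedPaths(in root: SchemaNode) -> [String] {
    var problems: [String] = []

    var rootIsObject = false
    if case .object = root { rootIsObject = true }
    if !rootIsObject { problems.append("$ (non-object root)") }

    // Depth-first walk; property keys are sorted so the output order is stable.
    func walk(_ node: SchemaNode, path: String) {
        switch node {
        case let .object(properties, additionalProperties):
            if additionalProperties { problems.append("\(path) (dynamic keys)") }
            for key in properties.keys.sorted() {
                if let child = properties[key] {
                    walk(child, path: "\(path).\(key)")
                }
            }
        case let .array(items):
            walk(items, path: "\(path)[]")
        case .string, .number, .boolean:
            break
        }
    }
    walk(root, path: "$")
    return problems
}

let accepted = SchemaNode.object(properties: ["city": .string], additionalProperties: false)
let dynamicKeys = SchemaNode.object(
    properties: ["filters": .object(properties: [:], additionalProperties: true)],
    additionalProperties: false
)
let listRoot = SchemaNode.array(items: .string)
```

Here accepted yields no problems, dynamicKeys reports the path $.filters, and listRoot is flagged for its non-object root.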

Images degrade to text

On macOS 26, iOS 26, and visionOS 26 the on-device model takes no image input: supportsVision is false, and images you pass to send(_:images:) degrade to short text descriptors. The turn still works; the model sees a description instead of pixels. When the app is built against the OS 27 SDK and runs on OS 27, the provider asks the model what it supports at construction: when the model reports vision, supportsVision is true and image bytes attach natively to the prompt. See send images.

Structured output is native

generate(from:schema:) and its typed overloads run through Apple's guided generation: the model is constrained to your schema while it decodes, not validated after the fact. Structured turns run without tools, and they see real history — after an on-device tool turn, the model can recall actual tool results into the structured answer. See get data, not prose.

Tool choice needs OS 27

.required and .none map to the framework's native tool-calling modes on OS 27 and later. Below that, the provider reports no tool-choice support, and a send with any tool choice other than .auto fails with a typed error before anything reaches the model. See steer tool use.

Reasoning budget needs OS 27

reasoningEffort: on send and generate asks the model to think harder before answering. The cross-provider values map onto the framework's native reasoning levels: .low → light, .medium → moderate, .high → deep.

try await agent.send("Plan the refactor end to end.", reasoningEffort: .high)

On OS 27 and later the budget rides every request of the turn, for the on-device model and Private Cloud Compute alike. Below that, or on any other provider, the capability reports false and a turn carrying an effort fails with a typed error before anything reaches the model. This holds on both paths: send and generate each throw AgentSessionError.reasoningEffortUnsupported (note for generate callers: that is a session error, not a StructuredOutputError). The SDK never silently drops a budget you asked for, so check capabilities.supportsReasoningEffort before passing one. Leaving it unset (the default) changes nothing on any OS.
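The capability check can be reduced to a small decision made by the host before each turn. The sketch below uses hypothetical stand-ins (ReasoningEffort, nativeLevel, effortToSend) and plain strings for the native level names given above. It shows one possible host policy, omitting the effort when the capability is false, which is a choice the host makes explicitly and not something the SDK does for you.

```swift
/// Stand-in for the cross-provider effort values. Hypothetical mirror, not the SDK type.
enum ReasoningEffort {
    case low, medium, high
}

/// The mapping described in the text, with the native levels shown as plain strings.
func nativeLevel(for effort: ReasoningEffort) -> String {
    switch effort {
    case .low: return "light"
    case .medium: return "moderate"
    case .high: return "deep"
    }
}

/// Host-side policy: pass the effort only when the provider reports support, otherwise leave it unset.
func effortToSend(_ requested: ReasoningEffort?, supportsReasoningEffort: Bool) -> ReasoningEffort? {
    supportsReasoningEffort ? requested : nil
}

let onSupportedOS = effortToSend(.high, supportsReasoningEffort: true)
let onOlderOS = effortToSend(.high, supportsReasoningEffort: false)
```

A host that would rather tell the user than quietly lower the effort can branch on the same flag.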

Private Cloud Compute

On OS 27+ the same provider class can run against Apple's Private Cloud Compute model instead of the on-device one. Be clear about what changes: this configuration is networked — conversation content leaves the device on every turn — and requests count against an Apple-managed quota.

if #available(macOS 27.0, iOS 27.0, visionOS 27.0, *) {
    let agent = try await runtime.makeAgentAsync(
        provider: .applePrivateCloudCompute(),
        role: AgentRole(staticPersona: "You are a concise assistant.")
    )
}

What to know:

The factory is async and fails fast. It fetches the service's context window at construction; if the service cannot be reached it throws AppleModelError.networkFailure or .serviceUnavailable instead of returning a provider that would fail every turn — catch and fall back to the on-device construction. Check AppleFoundationModelsProvider.privateCloudComputeAvailability before paying the round trip.
The window is whatever the service reports at construction, not 4,096.
Quota surfaces typed. Exhaustion and rate limiting both arrive as AppleModelError.quotaExhausted(message:resetDate:) — back off, and schedule the retry from resetDate when present.
The privacy gate engages. A Private Cloud Compute provider reports requiresNetworking, so a configured pre-transmit filter runs on every outbound request, exactly as it does for cloud providers. See control what leaves the device.
Construct through the async spec. AgentProviderSpec.applePrivateCloudCompute() is async-only, so it builds via runtime.makeAgentAsync(...) (try await). This hands the runtime's scoped registry to the PCC factory, so tools you registered on the runtime are visible to the agent. Building it on the synchronous makeAgent path throws AgentSessionError.invalidConfiguration.
Or hand the factory executors directly. Outside the runtime, call privateCloudCompute(registry:) or privateCloudCompute(executors:); the provider executes tools itself and needs them at construction. The model is fixed per instance, so build one provider per model.
Pair a deep reasoning budget with it. More reasoning capability, plus a larger window, is what this tier is for: Apple's own samples drive Private Cloud Compute at the deepest level. reasoningEffort: .high on send or generate maps straight onto it; see reasoning budget needs OS 27.
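Scheduling a retry from resetDate, as the quota item above suggests, might look like the following. The function name retryDelay and the backoff constants (a 2-second base and a 300-second cap) are assumptions for illustration. Only the idea of preferring resetDate when present comes from the text.

```swift
import Foundation

/// Seconds to wait before retrying after a quota error.
/// Uses resetDate when the error carried one; otherwise falls back to capped exponential backoff.
func retryDelay(resetDate: Date?, attempt: Int, now: Date = Date()) -> TimeInterval {
    if let resetDate {
        // A reset date in the past means the quota should already be available again.
        return max(0, resetDate.timeIntervalSince(now))
    }
    let base: TimeInterval = 2      // assumed starting delay
    let cap: TimeInterval = 300     // assumed ceiling
    let exponent = Double(max(0, min(attempt, 16)))
    return min(cap, base * pow(2, exponent))
}

let now = Date(timeIntervalSince1970: 1_000)
let untilReset = retryDelay(resetDate: Date(timeIntervalSince1970: 1_090), attempt: 0, now: now)
let alreadyReset = retryDelay(resetDate: Date(timeIntervalSince1970: 900), attempt: 0, now: now)
let thirdBackoff = retryDelay(resetDate: nil, attempt: 3, now: now)
let cappedBackoff = retryDelay(resetDate: nil, attempt: 10, now: now)
```

Passing now explicitly keeps the function deterministic in tests. Production code can rely on the default.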

When the model fails

All model failures surface as typed AppleModelError cases:

Case	Meaning	What to do
`.refusal`	the model declined (safety / policy)	rephrase; surface the explanation
`.guardrailViolation`	input or output tripped a content guardrail	show the user; do not blind-retry
`.contextWindowExceeded`	the prompt did not fit the window	trim tools, persona, or tool results
`.unsupportedLanguageOrLocale`	content in a language the model does not handle	tell the user
`.generation`	any other generation failure	retryable — the model occasionally fails transiently
`.inconsistentSnapshotStream`	the provider received an inconsistent stream of cumulative updates from the OS model	treat as a transient model failure — retry the turn
`.networkFailure`	Private Cloud Compute unreachable	retry when connectivity returns
`.quotaExhausted`	Private Cloud Compute quota or rate limit	back off until `resetDate`
`.serviceUnavailable`	the Private Cloud Compute service is down	fall back to on-device

The on-device model occasionally refuses or fails for reasons your code does not control — treat .refusal and .generation as conditions to handle, not bugs to fix. A failed turn rolls back completely: no assistant message persists, tool effects revert, and any partial streamed text should be discarded. Keep a fallback arm when switching over the enum.
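The table's "what to do" column can be collapsed into a single decision function. The sketch below uses a local mirror of the case names without associated values. ModelFailure, Recovery, and recovery(for:) are hypothetical, and the SDK's real AppleModelError may carry payloads such as resetDate.

```swift
/// Local mirror of the case names in the table above. Hypothetical, not the SDK enum.
enum ModelFailure {
    case refusal, guardrailViolation, contextWindowExceeded, unsupportedLanguageOrLocale
    case generation, inconsistentSnapshotStream
    case networkFailure, quotaExhausted, serviceUnavailable
}

/// Hypothetical set of host responses, grouped from the table's "what to do" column.
enum Recovery: Equatable {
    case retryTurn         // transient model failure
    case showUser          // explain to the user; do not blind-retry
    case trimPrompt        // shrink tools, persona, or tool results
    case waitThenRetry     // connectivity or quota
    case fallBackOnDevice  // Private Cloud Compute is down
}

func recovery(for failure: ModelFailure) -> Recovery {
    // This local enum is closed, so the switch is exhaustive. When switching over the
    // SDK's own enum, add a default arm as the text above recommends.
    switch failure {
    case .generation, .inconsistentSnapshotStream:
        return .retryTurn
    case .refusal, .guardrailViolation, .unsupportedLanguageOrLocale:
        return .showUser
    case .contextWindowExceeded:
        return .trimPrompt
    case .networkFailure, .quotaExhausted:
        return .waitThenRetry
    case .serviceUnavailable:
        return .fallBackOnDevice
    }
}
```

Grouping the cases this way keeps the retry logic in one place, so UI code only has to handle five outcomes.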

Configuration errors surface as AppleProviderError, thrown before any model work:

Case	Meaning	What to do
`.missingExecutors`	the agent declares a tool the provider has no executor for	hand the provider every executor its tools need
`.noUserMessage`	a turn reached the model with no user content	send user text — an empty turn has nothing to answer
`.unsupportedStructuredSchema`	the schema cannot be represented as guided generation	reshape it — object root, no dynamic keys
`.invalidToolCallBudget`	a negative tool-call budget	fix the limit — zero means no tool calls; negative is an error
`.undecodableImageAttachment`	a data-backed image could not be decoded for native attachment	check the bytes and media type before sending

Choose a provider — when on-device is the right call.
Capability matrix — every provider, every flag.
Control what leaves the device — the egress hook Private Cloud Compute engages.