On-device with Apple

.appleFoundationModels() runs your agent on the system language model that ships with Apple Intelligence. No API key, no network — inference happens on the device, and conversation content stays local. The same AgentSession API applies: you change the provider line, not your domains, guards, or UI.

let agent = try runtime.makeAgent(
    provider: .appleFoundationModels(),
    role: AgentRole(staticPersona: "You are a concise assistant.")
)

The model is small (~3B parameters) and the framework enforces real limits. This page is the full tour: what works, where the boundaries sit, and what to design around. For the cross-provider comparison, see the capability matrix.

Check availability first

The provider needs macOS 26, iOS 26, or visionOS 26, and a device with Apple Intelligence enabled and the model downloaded. Two gates, in order:

  • #available(macOS 26.0, iOS 26.0, visionOS 26.0, *) at the construction site; the symbols do not exist below that.
  • AppleFoundationModelsProvider.isAvailable at runtime: eligible device, Apple Intelligence on, model ready.

let provider: AgentProviderSpec
if #available(macOS 26.0, iOS 26.0, visionOS 26.0, *),
   AppleFoundationModelsProvider.isAvailable {
    provider = .appleFoundationModels()
} else {
    provider = .anthropic(apiKey: key)   // or any other fallback
}

When you need to tell the user why the model is missing, switch on the reason-bearing form:

switch AppleFoundationModelsProvider.availability {
case .available:
    break
case .unavailable(.appleIntelligenceNotEnabled):
    // point the user at Settings
    break
case .unavailable(.modelNotReady):
    // still downloading; retry later
    break
case .unavailable:
    // not eligible on this device
    break
}

Tools run inside Apple's session

This provider is provider-driven: the model invokes your tool executors directly inside Apple's own session instead of streaming tool requests back through the app-driven loop. From your code's point of view, nothing moves:

  • Tool activity still streams through the same observable surface — activeToolCalls fills and drains, currentText accumulates.
  • Guards, confirmation, and undo run around every call — but they belong at the provider seam, not the session. Pass them to .appleFoundationModels(guardPipeline:confirmationHandler:undoTransaction:); tools execute inside the provider, so that is where the checks live. Passing guards or a confirmationHandler to makeAgent instead throws at construction rather than being silently ignored.
  • Tool calls and results still land in conversation history, so a conversation started on-device resumes against a cloud provider with the tool exchanges intact. That history then leaves the device — scope sensitive tools accordingly.

Budget the window

The on-device context window is 4,096 tokens, and everything shares it: the persona, every declared tool schema, conversation history, and the reply itself. The session compacts history against the window automatically, but a prompt-heavy configuration can still overflow — that surfaces as the typed AppleModelError.contextWindowExceeded, never as silent truncation.

Three habits keep on-device turns inside budget:

  • Terse tool results. A few hundred bytes per result. A dozen 1.5 KB results can overflow the window before the per-turn tool cap binds.
  • A stable persona. The provider keeps Apple's native session alive across turns when the system prompt and tool set do not change. A persona that varies per turn (dynamic directives, always-refreshing context sources) forces a full re-prime every turn — correct output, full prefill cost each time. Prefer .cached context-source policies.
  • Few tools. See the next section.
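The "terse tool results" habit can be enforced mechanically before a result goes back to the model. The following is a minimal sketch, not part of the framework: clipToolResult is a hypothetical helper, and its 300-byte default is only one reading of "a few hundred bytes". It trims on character boundaries so a multi-byte character is never split.

```swift
/// Clips `text` to at most `maxBytes` of UTF-8 without splitting a character.
/// Hypothetical helper; the 300-byte default is an illustrative assumption.
func clipToolResult(_ text: String, maxBytes: Int = 300) -> String {
    // Short results pass through untouched.
    guard text.utf8.count > maxBytes else { return text }

    var clipped = ""
    var usedBytes = 0
    for character in text {
        let size = String(character).utf8.count
        // Stop before the character that would cross the byte limit.
        if usedBytes + size > maxBytes { break }
        clipped.append(character)
        usedBytes += size
    }
    return clipped
}

let summary = clipToolResult(String(repeating: "row,", count: 500))
print(summary.utf8.count)  // 300
```

A tool executor would call something like this on its own output just before returning, so the size limit lives next to the tool rather than in the session.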

Schema weight is the real tool ceiling

maxTools advertises 20, but that is a ceiling on count. What actually binds is how many tokens your schemas cost, because every declared tool's name, description, and full parameter schema is loaded into the session up front:

| Tool set | Count | On the 4,096-token window |
|---|---|---|
| Small schemas (1–2 string properties, one-line descriptions) | 12 | fits |
| Realistic schemas (3–6 properties, a nested object, an enum) | 14 | fits |
| Wide schemas with long descriptions | 15 | overflows (~8,400 tokens) |

A production-shaped tool costs roughly 150–300 tokens; a documentation-heavy tool with nested objects and enums can cost over 500. Budget the tool set against the window, and prefer staged discovery over declaring everything eagerly.
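The arithmetic behind that budget can be written down as a small check. This is a sketch under stated assumptions: ToolCost and WindowBudget are hypothetical types, the per-tool token counts are caller-supplied estimates taken from the ranges above (the framework's tokenizer is not modeled), and the 100- and 400-token reservations are illustrative.

```swift
/// Caller-supplied estimate of what one declared tool costs in tokens.
/// Hypothetical type, not part of the framework.
struct ToolCost {
    let name: String
    let estimatedTokens: Int
}

/// Splits a fixed context window between tool schemas and everything else.
struct WindowBudget {
    let windowTokens: Int
    let reservedForPersona: Int
    let reservedForHistoryAndReply: Int

    /// Tokens left for tool schemas after the reservations.
    var availableForTools: Int {
        windowTokens - reservedForPersona - reservedForHistoryAndReply
    }

    /// Negative means the declared tool set would not fit.
    func remaining(after tools: [ToolCost]) -> Int {
        availableForTools - tools.reduce(0) { $0 + $1.estimatedTokens }
    }
}

// Illustrative reservations: 100 tokens of persona, 400 for history and reply.
let budget = WindowBudget(windowTokens: 4_096,
                          reservedForPersona: 100,
                          reservedForHistoryAndReply: 400)

// 14 realistic tools at an assumed 250 tokens each.
let realistic = (1...14).map { ToolCost(name: "tool\($0)", estimatedTokens: 250) }
// 15 wide tools at 560 tokens each, i.e. about 8,400 tokens in total.
let wide = (1...15).map { ToolCost(name: "wide\($0)", estimatedTokens: 560) }

print(budget.remaining(after: realistic))  // 96
print(budget.remaining(after: wide))       // -4804
```

Running a check like this in a unit test makes a schema change that pushes the tool set over budget fail early, instead of surfacing later as a context-window error.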

Two schema shapes are rejected before any model work with a typed AppleProviderError.unsupportedStructuredSchema naming the tool and the offending path: non-object argument roots, and dynamic keys (additionalProperties: true) anywhere in the tree. Cloud providers serialize those shapes fine; on-device they would silently lose data, so the provider refuses loudly instead.
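Both rejected shapes can also be caught in your own tests before a schema reaches the provider. The sketch below is an assumption-laden stand-in, not the provider's validator: SchemaProblem, firstSchemaProblem, and the "$.a.b" path format are hypothetical, schemas are modeled as plain [String: Any] dictionaries, and only the literal additionalProperties: true form is detected.

```swift
/// The two shapes described above. Hypothetical type, not the library's error.
enum SchemaProblem: Equatable {
    case nonObjectRoot
    case dynamicKeys(path: String)
}

/// Returns the first problem found, or nil when neither shape is present.
func firstSchemaProblem(in schema: [String: Any]) -> SchemaProblem? {
    // The argument root must be an object.
    guard (schema["type"] as? String) == "object" else { return .nonObjectRoot }
    return firstDynamicKeys(in: schema, path: "$")
}

/// Depth-first walk through `properties` and `items`, in sorted key order.
func firstDynamicKeys(in node: [String: Any], path: String) -> SchemaProblem? {
    if (node["additionalProperties"] as? Bool) == true {
        return .dynamicKeys(path: path)
    }
    if let properties = node["properties"] as? [String: Any] {
        for key in properties.keys.sorted() {
            guard let child = properties[key] as? [String: Any] else { continue }
            if let problem = firstDynamicKeys(in: child, path: "\(path).\(key)") {
                return problem
            }
        }
    }
    if let items = node["items"] as? [String: Any] {
        return firstDynamicKeys(in: items, path: "\(path)[]")
    }
    return nil
}

// A nested object that allows arbitrary keys.
let tags: [String: Any] = ["type": "object", "additionalProperties": true]
let properties: [String: Any] = ["tags": tags]
let schema: [String: Any] = ["type": "object", "properties": properties]

if let problem = firstSchemaProblem(in: schema) {
    print(problem)
}
```

The usual reshape for a dynamic-keys object is an array of key/value entries with fixed property names, which guided generation can represent.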

Images degrade to text

On macOS 26, iOS 26, and visionOS 26 the on-device model takes no image input: supportsVision is false, and images you pass to send(_:images:) degrade to short text descriptors — the turn still works, the model sees a description instead of pixels. Built against the OS 27 SDK and running on OS 27, the provider asks the model what it supports at construction: when the model reports vision, supportsVision is true and image bytes attach natively to the prompt. See send images.

Structured output is native

generate(from:schema:) and its typed overloads run through Apple's guided generation: the model is constrained to your schema while it decodes, not validated after the fact. Structured turns run without tools, and they see real history — after an on-device tool turn, the model can recall actual tool results into the structured answer. See get data, not prose.

Tool choice needs OS 27

.required and .none map to the framework's native tool-calling modes on OS 27 and later. Below that, the provider reports no tool-choice support and a non-.auto send fails typed before anything reaches the model. See steer tool use.

Private Cloud Compute

On OS 27+ the same provider class can run against Apple's Private Cloud Compute model instead of the on-device one. Be clear about what changes: this configuration is networked — conversation content leaves the device on every turn — and requests count against an Apple-managed quota.

if #available(macOS 27.0, iOS 27.0, visionOS 27.0, *) {
    let provider = try await AppleFoundationModelsProvider.privateCloudCompute()
    let agent = try runtime.makeAgent(
        provider: AgentProviderSpec { provider },
        role: AgentRole(staticPersona: "You are a concise assistant.")
    )
}

What to know:

  • The factory is async and fails fast. It fetches the service's context window at construction; if the service cannot be reached it throws AppleModelError.networkFailure or .serviceUnavailable instead of returning a provider that would fail every turn — catch and fall back to the on-device construction. Check AppleFoundationModelsProvider.privateCloudComputeAvailability before paying the round trip.
  • The window is whatever the service reports at construction, not 4,096.
  • Quota surfaces typed. Exhaustion and rate limiting both arrive as AppleModelError.quotaExhausted(message:resetDate:) — back off, and schedule the retry from resetDate when present.
  • The privacy gate engages. A Private Cloud Compute provider reports requiresNetworking, so a configured pre-transmit filter runs on every outbound request, exactly as it does for cloud providers. See control what leaves the device.
  • Tool-bearing agents hand the factory their executors — use privateCloudCompute(registry:) or privateCloudCompute(executors:) — because the provider executes tools itself and needs them at construction. The model is fixed per instance: build one provider per model.
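The "schedule the retry from resetDate when present" advice reduces to a small delay calculation. This sketch assumes you have already extracted the optional reset date from the quota error; retryDelay, the exponential fallback, and its 2-second base and 300-second cap are hypothetical choices, not framework behavior.

```swift
import Foundation

/// Seconds to wait before retrying a quota-limited request.
/// Uses the reset date when one is given; otherwise falls back to capped
/// exponential backoff (an assumed policy, not something the framework defines).
func retryDelay(resetDate: Date?,
                attempt: Int,
                now: Date = Date(),
                baseDelay: TimeInterval = 2,
                maxDelay: TimeInterval = 300) -> TimeInterval {
    if let resetDate = resetDate {
        // A reset date in the past means the retry can happen immediately.
        return max(0, resetDate.timeIntervalSince(now))
    }
    let exponential = baseDelay * pow(2, Double(max(0, attempt)))
    return min(maxDelay, exponential)
}

let now = Date(timeIntervalSince1970: 1_000)
let reset = Date(timeIntervalSince1970: 1_090)
print(retryDelay(resetDate: reset, attempt: 0, now: now))  // 90.0
print(retryDelay(resetDate: nil, attempt: 3, now: now))    // 16.0
```

Passing now explicitly keeps the function deterministic in tests; production callers can rely on the default.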

When the model fails

All model failures surface as typed AppleModelError cases:

| Case | Meaning | What to do |
|---|---|---|
| .refusal | the model declined (safety / policy) | rephrase; surface the explanation |
| .guardrailViolation | input or output tripped a content guardrail | show the user; do not blind-retry |
| .contextWindowExceeded | the prompt did not fit the window | trim tools, persona, or tool results |
| .unsupportedLanguageOrLocale | content in a language the model does not handle | tell the user |
| .generation | any other generation failure | retryable; the model occasionally fails transiently |
| .inconsistentSnapshotStream | the provider received an inconsistent stream of cumulative updates from the OS model | treat as a transient model failure; retry the turn |
| .networkFailure | Private Cloud Compute unreachable | retry when connectivity returns |
| .quotaExhausted | Private Cloud Compute quota or rate limit | back off until resetDate |
| .serviceUnavailable | the Private Cloud Compute service is down | fall back to on-device |

The on-device model occasionally refuses or fails for reasons your code does not control — treat .refusal and .generation as conditions to handle, not bugs to fix. A failed turn rolls back completely: no assistant message persists, tool effects revert, and any partial streamed text should be discarded. Keep a fallback arm when switching over the enum.
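One way to keep that handling in a single place is to map each failure to a recovery action. The sketch below uses a stand-in enum that mirrors the case names in the table above; it is not the library's AppleModelError, its associated values are simplified, and RecoveryAction is a hypothetical type. The @unknown default arm plays the role of the fallback arm.

```swift
import Foundation

/// Stand-in mirroring the documented failure cases; NOT the library's type.
enum ModelFailure: Error {
    case refusal, guardrailViolation, contextWindowExceeded, unsupportedLanguageOrLocale
    case generation, inconsistentSnapshotStream
    case networkFailure, serviceUnavailable
    case quotaExhausted(resetDate: Date?)
}

/// Hypothetical app-side decision for a failed turn.
enum RecoveryAction: Equatable {
    case retryTurn
    case retryWhenOnline
    case retryAfter(Date?)
    case fallBackToOnDevice
    case trimPrompt
    case showToUser
}

func recoveryAction(for failure: ModelFailure) -> RecoveryAction {
    switch failure {
    case .generation, .inconsistentSnapshotStream:
        return .retryTurn                 // transient model failures
    case .networkFailure:
        return .retryWhenOnline
    case .quotaExhausted(let resetDate):
        return .retryAfter(resetDate)     // back off until the reset date, if any
    case .serviceUnavailable:
        return .fallBackToOnDevice
    case .contextWindowExceeded:
        return .trimPrompt                // fewer tools, shorter persona or results
    case .refusal, .guardrailViolation, .unsupportedLanguageOrLocale:
        return .showToUser                // do not blind-retry
    @unknown default:
        return .showToUser                // fallback arm for cases added later
    }
}

print(recoveryAction(for: .generation))  // retryTurn
```

Whatever the action, the caller should discard any partial streamed text first, since the failed turn has already been rolled back.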

Configuration errors surface as AppleProviderError, thrown before any model work:

| Case | Meaning | What to do |
|---|---|---|
| .missingExecutors | the agent declares a tool the provider has no executor for | hand the provider every executor its tools need |
| .noUserMessage | a turn reached the model with no user content | send user text; an empty turn has nothing to answer |
| .unsupportedStructuredSchema | the schema cannot be represented as guided generation | reshape it: object root, no dynamic keys |
| .invalidToolCallBudget | a negative tool-call budget | fix the limit; zero means no tool calls, negative is an error |
| .undecodableImageAttachment | a data-backed image could not be decoded for native attachment | check the bytes and media type before sending |
