On-device with Apple
.appleFoundationModels() runs your agent on the system language model that
ships with Apple Intelligence. No API key, no network — inference happens on
the device, and conversation content stays local. The same AgentSession API
applies: you change the provider line, not your domains, guards, or UI.
```swift
let agent = try runtime.makeAgent(
    provider: .appleFoundationModels(),
    role: AgentRole(staticPersona: "You are a concise assistant.")
)
```
The model is small (~3B parameters) and the framework enforces real limits. This page is the full tour: what works, where the boundaries sit, and what to design around. For the cross-provider comparison, see the capability matrix.
Check availability first
The provider needs macOS 26, iOS 26, or visionOS 26, and a device with Apple Intelligence enabled and the model downloaded. Two gates, in order:
1. #available(macOS 26.0, iOS 26.0, visionOS 26.0, *) at the construction site, because the symbols do not exist on earlier OS versions.
2. AppleFoundationModelsProvider.isAvailable at runtime: eligible device, Apple Intelligence on, model ready.
```swift
let provider: AgentProviderSpec
if #available(macOS 26.0, iOS 26.0, visionOS 26.0, *),
   AppleFoundationModelsProvider.isAvailable {
    provider = .appleFoundationModels()
} else {
    provider = .anthropic(apiKey: key) // or any other fallback
}
```
When you need to tell the user why the model is missing, switch on the reason-bearing form:
```swift
switch AppleFoundationModelsProvider.availability {
case .available:
    break
case .unavailable(.appleIntelligenceNotEnabled):
    // Point the user at Settings.
    break
case .unavailable(.modelNotReady):
    // Still downloading; retry later.
    break
case .unavailable:
    // Not eligible on this device.
    break
}
```
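Keeping the reason-to-action mapping in one small function makes it easy to test. Below is a minimal sketch. It mirrors the availability shape above with local stand-in types (ModelAvailability, UnavailableReason, and AvailabilityAction are hypothetical, not the provider's types), so it compiles without the SDK. In an app you would switch on AppleFoundationModelsProvider.availability directly.

```swift
// Hypothetical stand-ins mirroring the shape of
// AppleFoundationModelsProvider.availability; not the real SDK types.
enum UnavailableReason {
    case deviceNotEligible
    case appleIntelligenceNotEnabled
    case modelNotReady
}

enum ModelAvailability {
    case available
    case unavailable(UnavailableReason)
}

// What the UI should do for each state.
enum AvailabilityAction: Equatable {
    case useOnDevice
    case openSettings
    case retryLater
    case useFallbackProvider
}

func action(for availability: ModelAvailability) -> AvailabilityAction {
    switch availability {
    case .available:
        return .useOnDevice
    case .unavailable(.appleIntelligenceNotEnabled):
        // The user can fix this in Settings.
        return .openSettings
    case .unavailable(.modelNotReady):
        // The model is still downloading; check again later.
        return .retryLater
    case .unavailable:
        // Not eligible, or any reason this sketch does not single out.
        return .useFallbackProvider
    }
}
```

The final .unavailable arm is the fallback arm. It keeps the switch correct if the set of reasons grows.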
Tools run inside Apple's session
Tool execution on this provider is provider-driven. The model invokes your tool executors directly inside Apple's own session, instead of streaming tool requests back through the app-driven loop. From your code's point of view, little changes:
- Tool activity still streams through the same observable surface: activeToolCalls fills and drains, and currentText accumulates.
- Guards, confirmation, and undo run around every call, but they belong at the provider seam, not the session. Pass them to .appleFoundationModels(guardPipeline:confirmationHandler:undoTransaction:). Tools execute inside the provider, so that is where the checks live. Passing guards or a confirmationHandler to makeAgent instead throws at construction rather than being silently ignored.
- Tool calls and results still land in conversation history, so a conversation started on-device resumes against a cloud provider with the tool exchanges intact. That history then leaves the device, so scope sensitive tools accordingly.
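To make "guards, confirmation, and undo around every call" concrete, here is a framework-free sketch of that wrapping order. ToolCall, ToolGuard, UndoLog, and runGuarded are hypothetical names for illustration only. The real guard pipeline's types, and possibly its exact ordering, differ.

```swift
// Hypothetical illustration of the per-call wrapping; none of these types
// belong to the real provider API.
struct ToolCall {
    let name: String
    let arguments: [String: String]
}

enum ToolGateError: Error, Equatable {
    case blockedByGuard(String)
    case declinedByUser
}

// A guard returns a reason to block the call, or nil to allow it.
typealias ToolGuard = (ToolCall) -> String?

final class UndoLog {
    private(set) var entries: [String] = []
    func record(_ entry: String) { entries.append(entry) }
    // Reverts in reverse order, the way a transaction rollback would.
    func rollBack() -> [String] {
        let reverted = Array(entries.reversed())
        entries.removeAll()
        return reverted
    }
}

func runGuarded(
    _ call: ToolCall,
    guards: [ToolGuard],
    confirm: (ToolCall) -> Bool,
    undo: UndoLog,
    execute: (ToolCall) -> String
) throws -> String {
    // 1. Every guard sees the call before anything runs.
    for check in guards {
        if let reason = check(call) { throw ToolGateError.blockedByGuard(reason) }
    }
    // 2. Ask for confirmation (a real handler might prompt the user).
    guard confirm(call) else { throw ToolGateError.declinedByUser }
    // 3. Execute, then record the effect so a failed turn can revert it.
    let result = execute(call)
    undo.record(call.name)
    return result
}
```

A blocked or declined call never reaches execute, so it leaves nothing in the undo log.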
Budget the window
The on-device context window is 4,096 tokens, and everything shares it: the
persona, every declared tool schema, conversation history, and the reply
itself. The session compacts history against the window automatically, but a
prompt-heavy configuration can still overflow — that surfaces as the typed
AppleModelError.contextWindowExceeded, never as silent truncation.
Three habits keep on-device turns inside budget:
- Terse tool results. Keep each result to a few hundred bytes. A dozen 1.5 KB results can overflow the window before the per-turn tool cap binds.
- A stable persona. The provider keeps Apple's native session alive across turns when the system prompt and tool set do not change. A persona that varies per turn (dynamic directives, always-refreshing context sources) forces a full re-prime every turn. The output is still correct, but you pay the full prefill cost each time. Prefer .cached context-source policies.
- Few tools. See the next section.
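You can sanity-check a configuration before shipping it by adding up everything that shares the window. The sketch below assumes a crude heuristic of about four characters per token. That ratio is an assumption, not the on-device tokenizer's real behavior. WindowBudget and its replyReserve field are hypothetical. Use the result as an early warning, not a guarantee.

```swift
// Rough budget check for the 4,096-token on-device window.
// Assumption: ~4 characters per token. The real tokenizer differs,
// so treat every number here as an estimate.
func estimatedTokens(_ text: String) -> Int {
    (text.count + 3) / 4  // ceil(count / 4)
}

struct WindowBudget {
    static let window = 4_096

    var persona: String
    var toolSchemas: [String]   // serialized schema text per declared tool
    var history: [String]       // prior messages and tool results
    var replyReserve: Int       // tokens kept free for the model's answer

    var used: Int {
        estimatedTokens(persona)
            + toolSchemas.reduce(0) { $0 + estimatedTokens($1) }
            + history.reduce(0) { $0 + estimatedTokens($1) }
            + replyReserve
    }

    var remaining: Int { WindowBudget.window - used }
    var fits: Bool { remaining >= 0 }
}
```

Under this heuristic, twelve 1,500-character tool results alone come to about 4,500 tokens, more than the whole window. That matches the warning about terse results above.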
Schema weight is the real tool ceiling
maxTools advertises 20, but that is a ceiling on count. What actually binds
is how many tokens your schemas cost, because every declared tool's name,
description, and full parameter schema is loaded into the session up front:
| Tool set | Count | On the 4,096-token window |
|---|---|---|
| Small schemas (1–2 string properties, one-line descriptions) | 12 | fits |
| Realistic schemas (3–6 properties, a nested object, an enum) | 14 | fits |
| Wide schemas with long descriptions | 15 | overflows (~8,400 tokens) |
A production-shaped tool costs roughly 150–300 tokens; a documentation-heavy tool with nested objects and enums can cost over 500. Budget the tool set against the window, and prefer staged discovery over declaring everything eagerly.
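One way to budget the tool set is to rank tools and declare only what fits, deferring the rest to a later discovery stage. This is a minimal greedy sketch. ToolSpec, its priority field, and the per-tool token estimates are hypothetical, and the page does not prescribe this particular strategy.

```swift
// Hypothetical greedy selection: declare the highest-priority tools whose
// schema cost fits the budget, and defer the rest to a later stage.
struct ToolSpec {
    let name: String
    let schemaTokens: Int  // estimated cost of name + description + schema
    let priority: Int      // higher means more important this turn
}

func selectTools(_ tools: [ToolSpec], schemaBudget: Int, maxTools: Int = 20)
    -> (declared: [ToolSpec], deferred: [ToolSpec])
{
    var declared: [ToolSpec] = []
    var deferred: [ToolSpec] = []
    var spent = 0
    // Highest priority first; among equals, cheapest schema first.
    let ranked = tools.sorted {
        $0.priority != $1.priority ? $0.priority > $1.priority : $0.schemaTokens < $1.schemaTokens
    }
    for tool in ranked {
        if declared.count < maxTools && spent + tool.schemaTokens <= schemaBudget {
            declared.append(tool)
            spent += tool.schemaTokens
        } else {
            deferred.append(tool)
        }
    }
    return (declared, deferred)
}
```

Greedy packing is simple and predictable. It can skip an expensive mid-priority tool in favor of a cheaper lower-priority one, which is usually the right trade in a 4,096-token window.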
Two schema shapes are rejected before any model work with a typed
AppleProviderError.unsupportedStructuredSchema naming the tool and the
offending path: non-object argument roots, and dynamic keys
(additionalProperties: true) anywhere in the tree. Cloud providers
serialize those shapes fine; on-device they would silently lose data, so the
provider refuses loudly instead.
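You can catch these two shapes in your own tests, before the provider rejects them, by walking the schema tree. The sketch below uses a hypothetical, simplified Schema enum (not the library's schema type). It reports the first offending path under the two rules above: a non-object root, or additionalProperties: true anywhere.

```swift
// Simplified stand-in for a JSON-Schema-like tree; not the library's type.
indirect enum Schema {
    case string, number, boolean
    case array(Schema)
    case object(properties: [(String, Schema)], additionalProperties: Bool)
}

enum SchemaProblem: Equatable {
    case nonObjectRoot
    case dynamicKeys(path: String)
}

// Returns nil when the root is an object and no node allows dynamic keys.
func onDeviceProblem(in root: Schema) -> SchemaProblem? {
    guard case .object = root else { return .nonObjectRoot }
    return dynamicKeysProblem(root, path: "$")
}

func dynamicKeysProblem(_ node: Schema, path: String) -> SchemaProblem? {
    switch node {
    case .string, .number, .boolean:
        return nil
    case .array(let element):
        return dynamicKeysProblem(element, path: path + "[]")
    case .object(let properties, let additionalProperties):
        if additionalProperties { return .dynamicKeys(path: path) }
        for (name, child) in properties {
            if let problem = dynamicKeysProblem(child, path: path + "." + name) {
                return problem
            }
        }
        return nil
    }
}
```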
Images degrade to text
On macOS 26, iOS 26, and visionOS 26 the on-device model takes no image
input: supportsVision is false, and images you pass to send(_:images:)
degrade to short text descriptors — the turn still works, the model sees a
description instead of pixels. Built against the OS 27 SDK and running on
OS 27, the provider asks the model what it supports at construction: when the
model reports vision, supportsVision is true and image bytes attach
natively to the prompt. See send images.
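The branch the provider takes can be modeled in a few lines, which helps when reasoning about behavior across OS versions. A minimal sketch follows. ImageInput, PromptPart, and the descriptor wording are hypothetical. The provider's actual descriptor text is not documented on this page.

```swift
// Hypothetical model of the vision fallback. The descriptor wording is an
// assumption for illustration; the provider produces its own text.
struct ImageInput {
    let filename: String
    let mediaType: String
    let byteCount: Int
}

enum PromptPart: Equatable {
    case text(String)
    case imageBytes(filename: String)
}

func promptParts(text: String, images: [ImageInput], supportsVision: Bool) -> [PromptPart] {
    var parts: [PromptPart] = [.text(text)]
    for image in images {
        if supportsVision {
            // Model reports vision: attach the image natively.
            parts.append(.imageBytes(filename: image.filename))
        } else {
            // No vision: the turn still works, but the model sees a description.
            parts.append(.text("[image: \(image.filename), \(image.mediaType), \(image.byteCount) bytes]"))
        }
    }
    return parts
}
```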
Structured output is native
generate(from:schema:) and its typed overloads run through Apple's guided
generation: the model is constrained to your schema while it decodes, not
validated after the fact. Structured turns run without tools, and they see
real history — after an on-device tool turn, the model can recall actual tool
results into the structured answer. See
get data, not prose.
Tool choice needs OS 27
.required and .none map to the framework's native tool-calling modes on
OS 27 and later. Below that, the provider reports no tool-choice support,
and any non-.auto send fails with a typed error before anything reaches the
model. See steer tool use.
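The gate described above reduces to one capability check before the send. Here is a minimal sketch with hypothetical types. ToolChoice, ToolChoiceError, and ProviderCapabilities are stand-ins, not the library's declarations.

```swift
// Hypothetical mirror of the tool-choice gate: a non-.auto choice fails with
// a typed error before any model work when the provider lacks support.
enum ToolChoice { case auto, required, none }

enum ToolChoiceError: Error, Equatable {
    case unsupported(ToolChoice)
}

struct ProviderCapabilities {
    let supportsToolChoice: Bool
}

func checkToolChoice(_ choice: ToolChoice, on caps: ProviderCapabilities) throws {
    // .auto is always allowed; it is the model's default behavior.
    if choice != .auto && !caps.supportsToolChoice {
        throw ToolChoiceError.unsupported(choice)
    }
}
```

Failing before the model runs means an OS 26 device never spends a turn on a request it cannot honor.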
Private Cloud Compute
On OS 27+ the same provider class can run against Apple's Private Cloud Compute model instead of the on-device one. Be clear about what changes: this configuration is networked — conversation content leaves the device on every turn — and requests count against an Apple-managed quota.
```swift
if #available(macOS 27.0, iOS 27.0, visionOS 27.0, *) {
    let provider = try await AppleFoundationModelsProvider.privateCloudCompute()
    let agent = try runtime.makeAgent(
        provider: AgentProviderSpec { provider },
        role: AgentRole(staticPersona: "You are a concise assistant.")
    )
}
```
What to know:
- The factory is async and fails fast. It fetches the service's context window at construction. If the service cannot be reached, it throws AppleModelError.networkFailure or .serviceUnavailable instead of returning a provider that would fail every turn. Catch these and fall back to the on-device construction. Check AppleFoundationModelsProvider.privateCloudComputeAvailability before paying the round trip.
- The window is whatever the service reports at construction, not 4,096.
- Quota surfaces typed. Exhaustion and rate limiting both arrive as AppleModelError.quotaExhausted(message:resetDate:). Back off, and schedule the retry from resetDate when present.
- The privacy gate engages. A Private Cloud Compute provider reports requiresNetworking, so a configured pre-transmit filter runs on every outbound request, exactly as it does for cloud providers. See control what leaves the device.
- Tool-bearing agents hand the factory their executors. Use privateCloudCompute(registry:) or privateCloudCompute(executors:), because the provider executes tools itself and needs them at construction. The model is fixed per instance, so build one provider per model.
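The "catch and fall back" step can be isolated in a small async helper. The sketch below is hypothetical throughout. PCCError and BuiltProvider stand in for AppleModelError and the real provider, and buildPCC stands in for a call such as AppleFoundationModelsProvider.privateCloudCompute(). Only the two unreachable-service errors fall back. Anything else, such as quota exhaustion, propagates so the caller can handle it.

```swift
import Foundation

// Hypothetical stand-ins; the case names mirror this page, the types do not.
enum PCCError: Error {
    case networkFailure
    case serviceUnavailable
    case quotaExhausted(message: String, resetDate: Date?)
}

enum BuiltProvider: Equatable {
    case privateCloudCompute(contextWindow: Int)
    case onDevice
}

func makeProvider(
    pccAvailable: Bool,
    buildPCC: () async throws -> BuiltProvider
) async throws -> BuiltProvider {
    // Skip the network round trip when availability already says no.
    guard pccAvailable else { return .onDevice }
    do {
        return try await buildPCC()
    } catch PCCError.networkFailure, PCCError.serviceUnavailable {
        // The factory failed fast; use the local model instead.
        return .onDevice
    }
    // Any other error propagates to the caller.
}
```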
When the model fails
All model failures surface as typed AppleModelError cases:
| Case | Meaning | What to do |
|---|---|---|
| .refusal | the model declined (safety / policy) | rephrase; surface the explanation |
| .guardrailViolation | input or output tripped a content guardrail | show the user; do not blind-retry |
| .contextWindowExceeded | the prompt did not fit the window | trim tools, persona, or tool results |
| .unsupportedLanguageOrLocale | content in a language the model does not handle | tell the user |
| .generation | any other generation failure | retryable; the model occasionally fails transiently |
| .inconsistentSnapshotStream | the provider received an inconsistent stream of cumulative updates from the OS model | treat as a transient model failure and retry the turn |
| .networkFailure | Private Cloud Compute unreachable | retry when connectivity returns |
| .quotaExhausted | Private Cloud Compute quota or rate limit | back off until resetDate |
| .serviceUnavailable | the Private Cloud Compute service is down | fall back to on-device |
The on-device model occasionally refuses or fails for reasons your code does
not control — treat .refusal and .generation as conditions to handle, not
bugs to fix. A failed turn rolls back completely: no assistant message
persists, tool effects revert, and any partial streamed text should be
discarded. Keep a fallback arm when switching over the enum.
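The table above maps naturally onto a single recovery policy function. Here is a minimal sketch. ModelFailure and Recovery are hypothetical stand-ins for AppleModelError and your app's own actions, and the retry delays (1, 30, and 60 seconds) are arbitrary placeholder values. Because the stand-in enum is local, the switch is exhaustive. When switching over the real enum, add the fallback arm as advised above.

```swift
import Foundation

// Stand-in mirroring the AppleModelError cases in the table; not the real type.
enum ModelFailure: Error {
    case refusal(explanation: String)
    case guardrailViolation
    case contextWindowExceeded
    case unsupportedLanguageOrLocale
    case generation
    case inconsistentSnapshotStream
    case networkFailure
    case quotaExhausted(message: String, resetDate: Date?)
    case serviceUnavailable
}

enum Recovery: Equatable {
    case retry(after: TimeInterval)
    case showToUser(String)
    case trimPrompt
    case fallBackOnDevice
}

func recovery(for failure: ModelFailure, now: Date = Date()) -> Recovery {
    switch failure {
    case .refusal(let explanation):
        return .showToUser(explanation)
    case .guardrailViolation:
        return .showToUser("This request tripped a content guardrail.")
    case .unsupportedLanguageOrLocale:
        return .showToUser("This language is not supported on this device.")
    case .contextWindowExceeded:
        return .trimPrompt
    case .generation, .inconsistentSnapshotStream:
        // Transient: the turn already rolled back, so retry it whole.
        return .retry(after: 1)
    case .networkFailure:
        return .retry(after: 30)
    case .quotaExhausted(_, let resetDate):
        // Schedule from resetDate when present; otherwise use a default backoff.
        let wait = resetDate.map { max(0, $0.timeIntervalSince(now)) } ?? 60
        return .retry(after: wait)
    case .serviceUnavailable:
        return .fallBackOnDevice
    }
}
```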
Configuration errors surface as AppleProviderError, thrown before any
model work:
| Case | Meaning | What to do |
|---|---|---|
| .missingExecutors | the agent declares a tool the provider has no executor for | hand the provider every executor its tools need |
| .noUserMessage | a turn reached the model with no user content | send user text; an empty turn has nothing to answer |
| .unsupportedStructuredSchema | the schema cannot be represented as guided generation | reshape it: object root, no dynamic keys |
| .invalidToolCallBudget | a negative tool-call budget | fix the limit; zero means no tool calls, negative is an error |
| .undecodableImageAttachment | a data-backed image could not be decoded for native attachment | check the bytes and media type before sending |
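Because these errors are thrown before any model work, you can reproduce the same checks in your own unit tests. A minimal pre-flight sketch follows. ConfigProblem and preflight are hypothetical names that mirror three of the AppleProviderError cases above. They are not part of the library.

```swift
import Foundation

// Hypothetical pre-flight mirroring some AppleProviderError cases, so a
// misconfiguration surfaces in your own tests. Names are illustrative only.
enum ConfigProblem: Error, Equatable {
    case missingExecutors([String])
    case noUserMessage
    case invalidToolCallBudget(Int)
}

func preflight(
    declaredTools: [String],
    executorNames: Set<String>,
    userText: String,
    toolCallBudget: Int
) throws {
    // Every declared tool needs an executor the provider can call.
    let missing = declaredTools.filter { !executorNames.contains($0) }
    if !missing.isEmpty { throw ConfigProblem.missingExecutors(missing) }
    // Zero means "no tool calls"; only negative budgets are invalid.
    if toolCallBudget < 0 { throw ConfigProblem.invalidToolCallBudget(toolCallBudget) }
    // A whitespace-only turn has nothing to answer.
    if userText.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty {
        throw ConfigProblem.noUserMessage
    }
}
```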
Next
- Choose a provider — when on-device is the right call.
- Capability matrix — every provider, every flag.
- Control what leaves the device — the egress hook Private Cloud Compute engages.