Send images
Attach images to a turn and the model sees them alongside your text — on providers with vision, as real pixels; everywhere else, as an honest text stand-in that keeps the rest of the prompt working.
Attach images to a message
send(_:images:) takes an array of ImageRef values. An ImageRef has two forms:
raw bytes with a media type, or a file URL.
// Form 1: raw bytes plus an explicit media type.
let frame = try Data(contentsOf: exportedFrameURL)
try await agent.send(
    "What's wrong with this frame?",
    images: [.data(frame, mimeType: "image/png")]
)

// Form 2: file URLs; the media type is derived from each file's extension.
try await agent.send(
    "Compare these two frames.",
    images: [.fileURL(beforeURL), .fileURL(afterURL)]
)
The turn appends one user message with a pinned content order: your text first, then the images in argument order.
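To make the pinned order concrete, here is a minimal sketch of how such a message's content could be assembled. `ImagePayload`, `ContentPart`, and `userMessageContent` are hypothetical names invented for illustration; AgentKit's real message types are not shown on this page and may look different.

```swift
import Foundation

// Hypothetical stand-ins for illustration only; not AgentKit's actual types.
struct ImagePayload: Equatable {
    let bytes: Data
    let mimeType: String
}

enum ContentPart: Equatable {
    case text(String)
    case image(ImagePayload)
}

/// Builds the content of one user message: the text part first,
/// then one image part per attachment, in the order they were passed.
func userMessageContent(text: String, images: [ImagePayload]) -> [ContentPart] {
    var parts: [ContentPart] = [.text(text)]
    for image in images {
        parts.append(.image(image))
    }
    return parts
}

let before = ImagePayload(bytes: Data([0x01]), mimeType: "image/png")
let after = ImagePayload(bytes: Data([0x02]), mimeType: "image/png")
let parts = userMessageContent(text: "Compare these two frames.", images: [before, after])
print(parts.count)
```

Because the order is fixed, a prompt can refer to "the first image" and "the second image" and rely on that matching the argument order.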
File URLs are read exactly once, at the send boundary — the conversation stores bytes, so providers never touch your disk and a resumed conversation carries real content. If a file can't be read, or no media type can be derived from its extension, the call throws before anything is appended or sent — the conversation is untouched and no request leaves the device:
do {
    try await agent.send("Check this.", images: [.fileURL(missingURL)])
} catch AgentSessionError.unreadableImageAttachment(let url, let detail) {
    print("could not attach \(url.lastPathComponent): \(detail)")
}
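The "resolve everything first, then append" behavior described above can be sketched as follows. This is an illustration under assumptions, not AgentKit's implementation: `Attachment`, `ResolvedImage`, `AttachmentError`, `Conversation`, and the small extension table are all hypothetical.

```swift
import Foundation

// Hypothetical stand-ins; the names and shapes are illustrative, not AgentKit's API.
enum Attachment {
    case data(Data, mimeType: String)
    case fileURL(URL)
}

struct ResolvedImage: Equatable {
    let bytes: Data
    let mimeType: String
}

enum AttachmentError: Error {
    case unreadable(URL, detail: String)
}

// Assumption: a deliberately small lookup table keyed by lowercase file extension.
let mimeTypesByExtension = [
    "png": "image/png", "jpg": "image/jpeg", "jpeg": "image/jpeg",
    "gif": "image/gif", "webp": "image/webp",
]

/// Resolves every attachment to bytes plus a media type.
/// Throws on the first failure, so the caller never sees a partial result.
func resolve(_ attachments: [Attachment]) throws -> [ResolvedImage] {
    var resolved: [ResolvedImage] = []
    for attachment in attachments {
        switch attachment {
        case .data(let bytes, mimeType: let mimeType):
            resolved.append(ResolvedImage(bytes: bytes, mimeType: mimeType))
        case .fileURL(let url):
            guard let mimeType = mimeTypesByExtension[url.pathExtension.lowercased()] else {
                throw AttachmentError.unreadable(url, detail: "no media type for extension")
            }
            do {
                // The file is read once, here; only the bytes are kept afterwards.
                resolved.append(ResolvedImage(bytes: try Data(contentsOf: url), mimeType: mimeType))
            } catch {
                throw AttachmentError.unreadable(url, detail: error.localizedDescription)
            }
        }
    }
    return resolved
}

final class Conversation {
    private(set) var messages: [(text: String, images: [ResolvedImage])] = []

    func appendUserMessage(_ text: String, attachments: [Attachment]) throws {
        // Resolution runs before any mutation, so a throw leaves `messages` untouched.
        let images = try resolve(attachments)
        messages.append((text: text, images: images))
    }
}
```

The design choice worth noting is the ordering inside `appendUserMessage`: every fallible step happens before the first mutation, which is what lets a failed call leave the conversation exactly as it was.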
Return images from tools
A tool result can carry .image content alongside text and JSON:
return .success(ToolResultPayload(content: [
    .text("Rendered frame at 12.5s"),
    .image(.data(frameData, mimeType: "image/png")),
]))
The bytes are preserved losslessly in conversation history and on the
AgentKit Cloud wire. One caveat: the direct Anthropic, OpenAI, and
Gemini adapters flatten tool-result content to text, so an image inside a
tool result reaches a direct provider as an [image] placeholder. When the
model must see the pixels on a direct provider, attach them to a user message
with send(_:images:) instead.
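To illustrate what flattening means for the model, the sketch below reduces mixed tool-result content to text. It is a simplified model, not the adapters' code: `ToolContent` and `flattenToText` are hypothetical, JSON content is left out, and joining parts with a newline is an assumption. Only the `[image]` placeholder comes from the description above.

```swift
import Foundation

// Hypothetical model of tool-result content; not AgentKit's actual types.
enum ToolContent {
    case text(String)
    case image(bytes: Data, mimeType: String)
}

/// Flattens content for a text-only tool-result channel.
/// Text passes through; each image is replaced by the "[image]" placeholder,
/// so its bytes never reach the model on this path.
func flattenToText(_ content: [ToolContent]) -> String {
    var lines: [String] = []
    for part in content {
        switch part {
        case .text(let string):
            lines.append(string)
        case .image:
            lines.append("[image]")
        }
    }
    // Assumption: parts are joined with newlines; the real separator is not documented here.
    return lines.joined(separator: "\n")
}

let flattened = flattenToText([
    .text("Rendered frame at 12.5s"),
    .image(bytes: Data([0x89, 0x50, 0x4E, 0x47]), mimeType: "image/png"),
])
print(flattened)
```

In this model the caption survives but the pixels do not, which is why the user-message route is the one to use when a direct provider has to see the image.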
What each provider sees
| Path | What the model receives |
|---|---|
| Anthropic (direct) | native base64 image blocks |
| OpenAI (direct) | data-URL image parts |
| Gemini (direct) | inline image data |
| AgentKit Cloud — Anthropic-routed tier | image blocks pass through byte-identical |
| AgentKit Cloud — OpenAI-routed tier | user-message images convert losslessly to data-URL parts; images in assistant history become [Image: <mime>] text |
| AgentKit Cloud — Gemini-routed tier | images convert to inline data, bytes intact |
| Apple on-device on OS 26 | no vision — images become compact text descriptors; the rest of the prompt works unchanged |
| Apple on-device on OS 27+ (app built with the OS 27 SDK) | data-backed images attach natively; vision support reflects what the configured model reports |
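The OpenAI rows mention data-URL image parts. As a rough illustration of why that conversion can be lossless, the sketch below encodes bytes into a standard base64 data URL and decodes them back. The helper names are hypothetical, and the exact payload shape the adapters emit is not shown on this page.

```swift
import Foundation

/// Encodes image bytes as a data URL of the form data:<mime>;base64,<payload>.
func dataURL(for bytes: Data, mimeType: String) -> String {
    "data:\(mimeType);base64,\(bytes.base64EncodedString())"
}

/// Recovers the bytes from a base64 data URL, or returns nil if it is malformed.
func bytes(fromDataURL url: String) -> Data? {
    guard url.hasPrefix("data:"), let comma = url.firstIndex(of: ",") else { return nil }
    return Data(base64Encoded: String(url[url.index(after: comma)...]))
}

let pngSignature = Data([0x89, 0x50, 0x4E, 0x47])
let encoded = dataURL(for: pngSignature, mimeType: "image/png")
let decoded = bytes(fromDataURL: encoded)
print(encoded)
print(decoded == pngSignature)
```

Base64 is a reversible encoding, so the bytes that come out match the bytes that went in; only the transport representation changes.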
Two edge cases worth knowing:
- a .fileURL image that enters history without going through send() (seeded conversation history) degrades to a name-only text descriptor on every wire — providers never read the filesystem
- an image on an assistant message is malformed history; direct providers drop it with a logged warning rather than failing the turn
Next
- On-device with Apple — what the on-device model can and cannot see.
- Error reference — every typed error, including unreadableImageAttachment.