Send images

Attach images to a turn and the model sees them alongside your text — on providers with vision, as real pixels; everywhere else, as an honest text stand-in that keeps the rest of the prompt working.

Attach images to a message

send(_:images:) takes an array of ImageRef values. An ImageRef has two forms: raw bytes with a media type (.data), or a file URL (.fileURL).

// Raw bytes with an explicit media type.
let frame = try Data(contentsOf: exportedFrameURL)
try await agent.send(
    "What's wrong with this frame?",
    images: [.data(frame, mimeType: "image/png")]
)

// File URLs; the media type is derived from each file's extension.
try await agent.send(
    "Compare these two frames.",
    images: [.fileURL(beforeURL), .fileURL(afterURL)]
)

The turn appends one user message with a pinned content order: your text first, then the images in argument order.
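As a rough mental model of that ordering, the sketch below builds the content list for one user message. ContentPart and userMessageContent are hypothetical stand-ins invented for illustration; they are not the library's types, and the real message representation may differ.

```swift
import Foundation

// Hypothetical stand-in for a message content part (not the library's type).
enum ContentPart: Equatable {
    case text(String)
    case image(Data, mimeType: String)
}

// Builds one user message's content: the text first, then each image
// in the order it was passed.
func userMessageContent(
    text: String,
    images: [(data: Data, mimeType: String)]
) -> [ContentPart] {
    var parts: [ContentPart] = [.text(text)]
    for image in images {
        parts.append(.image(image.data, mimeType: image.mimeType))
    }
    return parts
}
```

Because the order is fixed, a prompt can refer to "the first image" or "the second image" and mean the first or second element of the images array.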

File URLs are read exactly once, at the send boundary. The conversation stores the bytes, so providers never touch your disk and a resumed conversation carries real content. If a file can't be read, or no media type can be derived from its extension, the call throws before anything is appended or sent, so the conversation is untouched and no request leaves the device. Catch the error to report the failure:

do {
    try await agent.send("Check this.", images: [.fileURL(missingURL)])
} catch AgentSessionError.unreadableImageAttachment(let url, let detail) {
    print("could not attach \(url.lastPathComponent): \(detail)")
}
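If you would rather validate attachments yourself, for example to show a file picker error before the user types a prompt, a preflight along the following lines may help. This is a minimal sketch under stated assumptions: AttachmentPreflightError, knownImageTypes, mediaType(forExtension:), and loadAttachment(at:) are all hypothetical names, and the extension table is illustrative rather than the library's actual mapping.

```swift
import Foundation

// Hypothetical error type for this sketch (not part of the library).
enum AttachmentPreflightError: Error, Equatable {
    case unknownMediaType(String)
    case unreadable(String)
}

// Illustrative extension-to-media-type table; the library's real mapping
// may cover more or fewer formats.
let knownImageTypes: [String: String] = [
    "png": "image/png",
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
    "gif": "image/gif",
    "webp": "image/webp",
]

// Case-insensitive lookup of a media type from a file extension.
func mediaType(forExtension ext: String) -> String? {
    knownImageTypes[ext.lowercased()]
}

// Reads the file once and pairs the bytes with a media type, throwing
// before any other work if either step fails.
func loadAttachment(at url: URL) throws -> (bytes: Data, mimeType: String) {
    guard let mime = mediaType(forExtension: url.pathExtension) else {
        throw AttachmentPreflightError.unknownMediaType(url.pathExtension)
    }
    do {
        return (try Data(contentsOf: url), mime)
    } catch {
        throw AttachmentPreflightError.unreadable(url.lastPathComponent)
    }
}
```

The resulting bytes and media type can then be passed as .data(bytes, mimeType:), which avoids a second read of the file at send time.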

Return images from tools

A tool result can carry .image content alongside text and JSON:

return .success(ToolResultPayload(content: [
    .text("Rendered frame at 12.5s"),
    .image(.data(frameData, mimeType: "image/png")),
]))

The bytes are preserved losslessly in conversation history and on the AgentKit Cloud wire. One caveat: the direct Anthropic, OpenAI, and Gemini adapters flatten tool-result content to text, so an image inside a tool result reaches a direct provider as an [image] placeholder. When the model must see the pixels on a direct provider, attach them to a user message with send(_:images:) instead.
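To make the flattening concrete, here is a small sketch of what a direct adapter's text conversion could look like. ToolContent and flattenForDirectProvider are hypothetical names, and joining parts with a newline is an assumption; only the [image] placeholder comes from the behavior described above.

```swift
import Foundation

// Hypothetical stand-in for tool-result content (not the library's type).
enum ToolContent {
    case text(String)
    case image(Data, mimeType: String)
}

// Flattens tool-result content to plain text. Images are replaced by the
// "[image]" placeholder; the newline separator is an assumption.
func flattenForDirectProvider(_ content: [ToolContent]) -> String {
    content.map { part -> String in
        switch part {
        case .text(let string):
            return string
        case .image:
            return "[image]"
        }
    }.joined(separator: "\n")
}
```

Under these assumptions, the tool result from the example above would reach a direct provider as the caption line followed by [image], with no pixel data attached.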

What each provider sees

| Path | What the model receives |
| --- | --- |
| Anthropic (direct) | native base64 image blocks |
| OpenAI (direct) | data-URL image parts |
| Gemini (direct) | inline image data |
| AgentKit Cloud (Anthropic-routed tier) | image blocks pass through byte-identical |
| AgentKit Cloud (OpenAI-routed tier) | user-message images convert losslessly to data-URL parts; images in assistant history become [Image: <mime>] text |
| AgentKit Cloud (Gemini-routed tier) | images convert to inline data, bytes intact |
| Apple on-device on OS 26 | no vision; images become compact text descriptors, and the rest of the prompt works unchanged |
| Apple on-device on OS 27+ (app built with the OS 27 SDK) | data-backed images attach natively; vision support reflects what the configured model reports |
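For readers unfamiliar with the data-URL form mentioned for the OpenAI paths, the sketch below shows the general encoding: the media type, a base64 marker, and the base64-encoded bytes. dataURL(for:mimeType:) is a hypothetical helper written for illustration; the adapters' internal code is not shown in this article and may differ.

```swift
import Foundation

// Builds a data URL (RFC 2397 style) from raw bytes and a media type.
// Hypothetical helper for illustration only.
func dataURL(for bytes: Data, mimeType: String) -> String {
    "data:\(mimeType);base64,\(bytes.base64EncodedString())"
}
```

Base64 is reversible, so this form carries the original bytes without loss, at the cost of a payload roughly one third larger than the raw image.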

Two edge cases worth knowing:

  • a .fileURL image that enters history without going through send() (seeded conversation history) degrades to a name-only text descriptor on every wire — providers never read the filesystem
  • an image on an assistant message is malformed history; direct providers drop it with a logged warning rather than failing the turn

Next