---
name: coreml
description: "Integrate Core ML models in iOS apps for on-device machine learning inference. Covers model loading (.mlmodel, .mlpackage, .mlmodelc), predictions with auto-generated classes and MLFeatureProvider, compute unit configuration (CPU, GPU, Neural Engine), MLTensor, VNCoreMLRequest, MLComputePlan, multi-model pipelines, and deployment strategies. Use when loading Core ML models, making predictions, configuring compute units, or profiling model performance."
---

# Core ML Swift Integration

Load, configure, and run Core ML models in iOS apps. This skill covers the
Swift side: model loading, prediction, MLTensor, profiling, and deployment.
Target iOS 26+ with Swift 6.3, backward-compatible to iOS 14 unless noted.

> **Scope boundary:** Python-side model conversion, optimization (quantization,
> palettization, pruning), and framework selection live in the `apple-on-device-ai`
> skill. This skill owns Swift integration only.

See [references/coreml-swift-integration.md](references/coreml-swift-integration.md) for complete code patterns including
actor-based caching, batch inference, image preprocessing, and testing.

## Contents

- [Loading Models](#loading-models)
- [Model Configuration](#model-configuration)
- [Making Predictions](#making-predictions)
- [MLTensor (iOS 18+)](#mltensor-ios-18)
- [Working with MLMultiArray](#working-with-mlmultiarray)
- [Image Preprocessing](#image-preprocessing)
- [Multi-Model Pipelines](#multi-model-pipelines)
- [Vision Integration](#vision-integration)
- [Performance Profiling](#performance-profiling)
- [Model Deployment](#model-deployment)
- [Memory Management](#memory-management)
- [Common Mistakes](#common-mistakes)
- [Review Checklist](#review-checklist)
- [References](#references)

## Loading Models

### Auto-Generated Classes

When you add a `.mlmodel` or `.mlpackage` to an app target, Xcode generates a Swift
class with typed input/output. Use this whenever possible.

```swift
import CoreML

let config = MLModelConfiguration()
config.computeUnits = .all

let model = try MyImageClassifier(configuration: config)
```

### Manual Loading

Load from a URL when the model is downloaded at runtime or stored outside the
bundle.

```swift
let modelURL = Bundle.main.url(
    forResource: "MyModel", withExtension: "mlmodelc"
)!
let model = try MLModel(contentsOf: modelURL, configuration: config)
```

### Async Loading (iOS 15+)

Load models without blocking the main thread. Prefer this for large models.

```swift
let model = try await MLModel.load(
    contentsOf: modelURL,
    configuration: config
)
```

### Compile at Runtime (iOS 16+)

Compile a `.mlpackage` or `.mlmodel` to `.mlmodelc` on device. Useful for
models downloaded from a server. Do this once per model version, not on every
launch.

```swift
let compiledURL = try await MLModel.compileModel(at: packageURL)
let model = try await MLModel.load(contentsOf: compiledURL, configuration: config)
```

Cache the compiled model -- recompiling on every launch is a bug. Core ML writes
the compiled output to a temporary location, so move `compiledURL` to a persistent
location (e.g., Application Support). When reviewing runtime-loaded models, call
out both facts together: the async `MLModel.compileModel(at:)` requires iOS 16+,
and compiled models must be cached so the app does not recompile on every launch.
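
The compile-once-then-cache flow described above could be sketched roughly as follows. The helper names, the `CompiledModels` folder, and the `name`/`version` cache key are illustrative assumptions, not part of Core ML; only `MLModel.compileModel(at:)`, `MLModel.load(contentsOf:configuration:)`, and the `FileManager` calls are framework API.

```swift
import CoreML
import Foundation

/// Builds the persistent cache location for a compiled model.
/// `name` and `version` are hypothetical identifiers chosen by the app.
func compiledModelCacheURL(name: String, version: String, root: URL) -> URL {
    root
        .appendingPathComponent("CompiledModels", isDirectory: true)
        .appendingPathComponent("\(name)-\(version).mlmodelc", isDirectory: true)
}

/// Compiles the downloaded package only when no cached copy exists for this
/// version, then loads the cached `.mlmodelc`.
@available(iOS 16.0, macOS 13.0, tvOS 16.0, watchOS 9.0, *)
func loadCachedModel(
    from packageURL: URL,
    name: String,
    version: String,
    configuration: MLModelConfiguration
) async throws -> MLModel {
    let fileManager = FileManager.default
    let supportDirectory = try fileManager.url(
        for: .applicationSupportDirectory, in: .userDomainMask,
        appropriateFor: nil, create: true
    )
    let destination = compiledModelCacheURL(name: name, version: version, root: supportDirectory)

    if !fileManager.fileExists(atPath: destination.path) {
        // Core ML writes the compiled model to a temporary location.
        let temporaryURL = try await MLModel.compileModel(at: packageURL)
        try fileManager.createDirectory(
            at: destination.deletingLastPathComponent(),
            withIntermediateDirectories: true
        )
        try fileManager.moveItem(at: temporaryURL, to: destination)
    }
    return try await MLModel.load(contentsOf: destination, configuration: configuration)
}
```

Keying the cache path by version means a new model version compiles once into its own folder; removing stale versions is left to the app.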

## Model Configuration

`MLModelConfiguration` controls compute units, GPU access, and model parameters.

### Compute Units Decision Table

| Value | Uses | When to Choose |
|---|---|---|
| `.all` | CPU + GPU + Neural Engine | Default. Let the system decide. |
| `.cpuOnly` | CPU | Deterministic tests, CPU-only fallbacks, or constrained work after profiling shows accelerator policy, contention, thermal state, or energy budget is the limiting factor. |
| `.cpuAndGPU` | CPU + GPU | Need GPU but model has ops unsupported by ANE. |
| `.cpuAndNeuralEngine` (iOS 16+) | CPU + Neural Engine | Best energy efficiency for compatible models. |

```swift
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

// Alternative: CPU-only fallback for constrained work, chosen after profiling
// and policy review. Uncomment to replace the value set above.
// config.computeUnits = .cpuOnly
```
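
Because `.cpuAndNeuralEngine` is iOS 16+, an app that still supports earlier releases needs an availability check before selecting it. A minimal sketch, assuming a hypothetical `makeConfiguration(preferNeuralEngine:)` helper:

```swift
import CoreML

/// Returns a configuration that prefers CPU + Neural Engine where the OS
/// supports it and otherwise falls back to `.all`.
/// The function name and parameter are illustrative, not Core ML API.
func makeConfiguration(preferNeuralEngine: Bool) -> MLModelConfiguration {
    let config = MLModelConfiguration()
    if preferNeuralEngine, #available(iOS 16.0, macOS 13.0, tvOS 16.0, watchOS 9.0, *) {
        config.computeUnits = .cpuAndNeuralEngine
    } else {
        // Default: let the system pick among CPU, GPU, and Neural Engine.
        config.computeUnits = .all
    }
    return config
}
```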

### Configuration Properties

```swift
let config = MLModelConfiguration()
config.computeUnits = .all
config.allowLowPrecisionAccumulationOnGPU = true // faster, slight precision loss
```

## Making Predictions

### With Auto-Generated Classes

The generated class provides typed input/output structs.

```swift
let model = try MyImageClassifier(configuration: config)
let input = MyImageClassifierInput(image: pixelBuffer)
let output = try model.prediction(input: input)
print(output.classLabel)        // "golden_retriever"
print(output.classLabelProbs)   // ["golden_retriever": 0.95, ...]
```

### With MLDictionaryFeatureProvider

Use when inputs are dynamic or not known at compile time.

```swift
let inputFeatures = try MLDictionaryFeatureProvider(dictionary: [
    "image": MLFeatureValue(pixelBuffer: pixelBuffer),
    "confidence_threshold": MLFeatureValue(double: 0.5),
])
let output = try model.prediction(from: inputFeatures)
let label = output.featureValue(for: "classLabel")?.stringValue
```

### Prediction Inside Async Workflows

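The `prediction(from:)` and generated `prediction(input:)` calls used in this
skill are synchronous. In async pipelines, keep model loading async, then run
prediction from an actor or non-main task without adding `await` to those calls.
iOS 17+ also adds an async `prediction(from:options:)` overload for code that
needs to await the result.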

```swift
let output = try model.prediction(from: inputFeatures)
```
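
One way to keep the synchronous call off the main thread is to own the model inside an actor. The sketch below assumes a model with a single 1-D float input; the `ModelRunner` type, the `makeMultiArray` helper, and the feature names `"input"` and `"classLabel"` are hypothetical and must be adapted to the real model.

```swift
import CoreML

/// Copies plain Swift floats into a 1-D MLMultiArray.
func makeMultiArray(_ values: [Float]) throws -> MLMultiArray {
    let array = try MLMultiArray(shape: [NSNumber(value: values.count)], dataType: .float32)
    for (index, value) in values.enumerated() {
        array[index] = NSNumber(value: value)
    }
    return array
}

/// Owns one model instance; all predictions are serialized on the actor.
actor ModelRunner {
    private let model: MLModel

    init(modelURL: URL, computeUnits: MLComputeUnits = .all) async throws {
        let configuration = MLModelConfiguration()
        configuration.computeUnits = computeUnits
        model = try await MLModel.load(contentsOf: modelURL, configuration: configuration)
    }

    /// Synchronous inside the actor, so the prediction call itself needs no `await`.
    /// "input" and "classLabel" are assumed feature names.
    func classify(_ values: [Float]) throws -> String? {
        let input = try MLDictionaryFeatureProvider(dictionary: [
            "input": MLFeatureValue(multiArray: try makeMultiArray(values))
        ])
        let output = try model.prediction(from: input)
        return output.featureValue(for: "classLabel")?.stringValue
    }
}
```

Callers hop onto the actor with `try await runner.classify(features)`. Passing `[Float]` in and `String?` out keeps only `Sendable` values crossing the actor boundary.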

### Batch Prediction

Process multiple inputs in one call for better throughput.

```swift
let batchInputs = try MLArrayBatchProvider(array: inputs.map { input in
    try MLDictionaryFeatureProvider(dictionary: ["image": MLFeatureValue(pixelBuffer: input)])
})
let batchOutput = try model.predictions(fromBatch: batchInputs)
for i in 0..<batchOutput.count {
    let result = batchOutput.features(at: i)
    print(result.featureValue(for: "classLabel")?.stringValue ?? "unknown")
}
```

Use `predictions(fromBatch:)` when batching without explicit
`MLPredictionOptions`. Use `predictions(from:options:)` only when passing both an
`MLBatchProvider` and `MLPredictionOptions`; `predictions(from:)` by itself is
not the no-options batch API.

### Stateful Prediction (iOS 18+)

Use `MLState` for models that maintain state across predictions (sequence models,
LLMs, audio accumulators). Create state once and pass it to each prediction call.

```swift
let state = model.makeState()

// Each synchronous prediction carries forward the internal model state
for frame in audioFrames {
    let input = try MLDictionaryFeatureProvider(dictionary: [
        "audio_features": MLFeatureValue(multiArray: frame)
    ])
    let output = try model.prediction(from: input, using: state)
    let classification = output.featureValue(for: "label")?.stringValue
}
```

`MLState` is `Sendable`, but `Sendable` does not make one state safe for
concurrent inference. Predictions using the same state must be serialized; do
not read or write state buffers while a prediction is in flight. Call
`model.makeState()` for each independent concurrent stream. If you need
`MLPredictionOptions`, iOS 18+ also provides the async
`prediction(from:using:options:)` overload; the same one-in-flight-per-state rule
still applies.

## MLTensor (iOS 18+)

`MLTensor` is a Swift-native multidimensional array for pre/post-processing.
Operations run lazily -- call `await tensor.shapedArray(of:)` to materialize results.

```swift
import CoreML

// Creation
let tensor = MLTensor([1.0, 2.0, 3.0, 4.0])
let zeros = MLTensor(zeros: [3, 224, 224], scalarType: Float.self)

// Reshaping
let reshaped = tensor.reshaped(to: [2, 2])

// Math operations
let softmaxed = tensor.softmax(alongAxis: -1)
let centered = tensor - tensor.mean()

// Interop with MLShapedArray / MLMultiArray
let shaped = await tensor.shapedArray(of: Float.self)
let multiArray = try MLMultiArray(shaped)
let shapedAgain = MLShapedArray<Float>(multiArray)
```

Do not invent `MLTensor` APIs for statistics or bridging. Avoid examples such as
`MLTensor(multiArray)`, `tensor.std()`, `tensor.standardDeviation()`, direct
lazy-buffer access, or synchronous extraction; perform unsupported DSP/statistics
outside the tensor pipeline or with source-confirmed tensor operations.
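
As an illustration of computing a statistic outside the tensor pipeline, the values can be materialized first and then reduced in plain Swift. The `standardDeviation` helper below is an assumption for this example (population standard deviation), not an `MLTensor` or Core ML API.

```swift
/// Population standard deviation over already-materialized scalars.
/// Typical input (assumed): `await tensor.shapedArray(of: Float.self).scalars`.
func standardDeviation(_ values: [Float]) -> Float {
    guard !values.isEmpty else { return 0 }
    let count = Float(values.count)
    let mean = values.reduce(0, +) / count
    // Average squared distance from the mean.
    let variance = values.reduce(0) { $0 + ($1 - mean) * ($1 - mean) } / count
    return variance.squareRoot()
}
```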

## Working with MLMultiArray

`MLMultiArray` is the primary data exchange type for non-image model inputs and
outputs. Use it when the auto-generated class expects array-type features.

```swift
// Create a 3D array: [batch, sequence, features]
let array = try MLMultiArray(shape: [1, 128, 768], dataType: .float32)

// Write values
for i in 0..<128 {
    array[[0, i, 0] as [NSNumber]] = NSNumber(value: Float(i))
}

// Read values
let value = array[[0, 0, 0] as [NSNumber]].floatValue

let data: [Float] = [1.0, 2.0, 3.0]
let shaped = MLShapedArray(scalars: data, shape: [3])
let fromShaped = try MLMultiArray(shaped)
```

See [references/coreml-swift-integration.md](references/coreml-swift-integration.md) for advanced MLMultiArray patterns
including NLP tokenization and audio feature extraction.

## Image Preprocessing

Image models expect `CVPixelBuffer` input. Use `CGImage` conversion for photos
from the camera or photo library. Vision's `VNCoreMLRequest` handles this
automatically; manual conversion is needed only for direct `MLModel` prediction.

```swift
import CoreGraphics
import CoreVideo

func createPixelBuffer(from cgImage: CGImage, width: Int, height: Int) -> CVPixelBuffer? {
    var pixelBuffer: CVPixelBuffer?
    let attrs: [CFString: Any] = [
        kCVPixelBufferCGImageCompatibilityKey: true,
        kCVPixelBufferCGBitmapContextCompatibilityKey: true,
    ]
    let status = CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                                     kCVPixelFormatType_32ARGB, attrs as CFDictionary, &pixelBuffer)
    // Bail out if allocation failed instead of ignoring the return code.
    guard status == kCVReturnSuccess, let buffer = pixelBuffer else { return nil }

    // Lock for CPU writes; `defer` unlocks on every exit path.
    CVPixelBufferLockBaseAddress(buffer, [])
    defer { CVPixelBufferUnlockBaseAddress(buffer, []) }

    guard let context = CGContext(
        data: CVPixelBufferGetBaseAddress(buffer),
        width: width, height: height,
        bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue
    ) else { return nil }

    // Drawing into the full rect scales the image to width x height.
    context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
    return buffer
}
```

For additional preprocessing patterns (normalization, center-cropping), see
[references/coreml-swift-integration.md](references/coreml-swift-integration.md).
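
For the center-cropping case mentioned above, the crop rectangle can be computed before any pixel work. This is a minimal sketch; `centerCropRect` is a hypothetical helper, and the result is meant to be passed to something like `CGImage.cropping(to:)` before scaling to the model's input size.

```swift
import Foundation

/// Returns the largest centered rect inside `imageSize` that has the
/// requested aspect ratio (width / height).
func centerCropRect(imageSize: CGSize, targetAspectRatio: CGFloat) -> CGRect {
    guard imageSize.width > 0, imageSize.height > 0, targetAspectRatio > 0 else {
        return .zero
    }
    let imageAspectRatio = imageSize.width / imageSize.height
    if imageAspectRatio > targetAspectRatio {
        // Image is too wide: keep the full height and trim both sides.
        let cropWidth = imageSize.height * targetAspectRatio
        return CGRect(x: (imageSize.width - cropWidth) / 2, y: 0,
                      width: cropWidth, height: imageSize.height)
    } else {
        // Image is too tall: keep the full width and trim top and bottom.
        let cropHeight = imageSize.width / targetAspectRatio
        return CGRect(x: 0, y: (imageSize.height - cropHeight) / 2,
                      width: imageSize.width, height: cropHeight)
    }
}
```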

## Multi-Model Pipelines

Chain models when preprocessing or postprocessing requires a separate model.

```swift
// Sequential inference: preprocessor -> main model -> postprocessor
let preprocessed = try preprocessor.prediction(from: rawInput)
let mainOutput = try mainModel.prediction(from: preprocessed)
let finalOutput = try postprocessor.prediction(from: mainOutput)
```

For Xcode-managed pipelines, use the pipeline model type in the `.mlpackage`.
Each sub-model runs on its optimal compute unit.

## Vision Integration

Use Vision to run Core ML image models with automatic image preprocessing
(resizing, normalization, color space, orientation).

### Modern: CoreMLRequest (iOS 18+)

```swift
import Vision
import CoreML

let model = try MLModel(contentsOf: modelURL, configuration: config)
let container = try CoreMLModelContainer(model: model)
let request = CoreMLRequest(model: container)
let results = try await request.perform(on: cgImage)

if let classification = results.first as? ClassificationObservation {
    print("\(classification.identifier): \(classification.confidence)")
}
```

### Legacy: VNCoreMLRequest

```swift
let vnModel = try VNCoreMLModel(for: model)
let request = VNCoreMLRequest(model: vnModel) { request, error in
    guard let results = request.results as? [VNRecognizedObjectObservation] else { return }
    for observation in results {
        let label = observation.labels.first?.identifier ?? "unknown"
        let confidence = observation.labels.first?.confidence ?? 0
        let boundingBox = observation.boundingBox // normalized coordinates
        print("\(label): \(confidence) at \(boundingBox)")
    }
}
request.imageCropAndScaleOption = .scaleFill

let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer)
try handler.perform([request])
```

> For complete Vision framework patterns (text recognition, barcode detection,
> document scanning), see the `vision-framework` skill.

## Performance Profiling

### MLComputePlan (iOS 17.4+)

Inspect which compute device each operation will use before running predictions.

```swift
let computePlan = try await MLComputePlan.load(
    contentsOf: modelURL, configuration: config
)
guard case let .program(program) = computePlan.modelStructure else { return }
guard let mainFunction = program.functions["main"] else { return }

for operation in mainFunction.block.operations {
    let deviceUsage = computePlan.deviceUsage(for: operation)
    let estimatedCost = computePlan.estimatedCost(of: operation)
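    let device = String(describing: deviceUsage?.preferred)
    let weight = String(describing: estimatedCost?.weight)
    print("\(operation.operatorName): \(device), relative cost \(weight)")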
}
```
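
A per-operation log gets long for real models, so a summary by preferred device can be easier to read. The two helpers below are illustrative assumptions layered on the `deviceUsage(for:)` call shown above; the display names are arbitrary.

```swift
import CoreML

/// Maps a preferred compute device to a short display name.
@available(iOS 17.0, macOS 14.0, tvOS 17.0, watchOS 10.0, *)
func deviceName(_ device: MLComputeDevice?) -> String {
    guard let device else { return "unknown" }
    switch device {
    case .cpu: return "CPU"
    case .gpu: return "GPU"
    case .neuralEngine: return "Neural Engine"
    @unknown default: return "unknown"
    }
}

/// Counts how often each name occurs.
func tally(_ names: [String]) -> [String: Int] {
    var counts: [String: Int] = [:]
    for name in names {
        counts[name, default: 0] += 1
    }
    return counts
}
```

With the plan and function from the snippet above, `tally(mainFunction.block.operations.map { deviceName(computePlan.deviceUsage(for: $0)?.preferred) })` gives a dictionary of operation counts per device.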

### Instruments

Use the **Core ML** instrument template in Instruments to profile:
- Model load time
- Prediction latency (per-operation breakdown)
- Compute device dispatch (CPU/GPU/ANE per operation)
- Memory allocation

Run outside the debugger for accurate results (Xcode: Product > Profile).

## Model Deployment

### Bundle vs Downloaded Assets

| Strategy | Pros | Cons |
|---|---|---|
| Bundle in app | Instant availability, works offline | Increases app download size |
| Background Assets | Preferred for large or updateable model assets | Requires asset-pack setup |
| On-demand resources | Smaller initial download for existing ODR apps | Legacy technology; prefer Background Assets for new work |
| CloudKit / server | Maximum flexibility | Requires network, longer setup |

### Size Considerations

- For iOS/iPadOS 18+, App Store Connect lists a 4 GB thinned app bundle limit
  and 8 GB thinned ODR asset-pack limit.
- Prefer Background Assets for new large or updateable model assets; keep ODR
  guidance for existing projects that already use it.
- Pre-compile to `.mlmodelc` to skip on-device compilation
- For downloaded `.mlmodel` or `.mlpackage` files, compile once with
  `MLModel.compileModel(at:)`, move the resulting `.mlmodelc` out of Core ML's
  temporary location, and cache it by model version.
- Validate memory and performance on physical target devices, especially the
  lowest-memory supported device. Check model load, first prediction, repeated
  predictions, background/foreground transitions, and low-memory behavior.

For Background Assets, make the asset pack locally available, resolve the model
URL, then load the compiled model with `MLModel.load(contentsOf:configuration:)`.

```swift
// Existing On-Demand Resources project
let request = NSBundleResourceRequest(tags: ["ml-model-v2"])
try await request.beginAccessingResources()
let modelURL = Bundle.main.url(forResource: "LargeModel", withExtension: "mlmodelc")!
let model = try await MLModel.load(contentsOf: modelURL, configuration: config)
// Call request.endAccessingResources() when done
```

## Memory Management

- **Unload on background:** Release model references when the app enters background
  to free GPU/ANE memory. Reload on foreground return.
- **Choose compute units by context:** use `.all` by default. Consider `.cpuOnly`
  only when profiling or app policy shows accelerator contention, thermal state,
  energy budget, deterministic testing, or a legitimate background execution
  constraint makes CPU the right tradeoff.
- **Share model instances:** Never create multiple `MLModel` instances from the same
  compiled model. Use an actor to provide shared access.
- **Monitor memory pressure:** Large models (>100 MB) can trigger memory warnings.
  Register for `UIApplication.didReceiveMemoryWarningNotification` and release
  cached models when under pressure.

See [references/coreml-swift-integration.md](references/coreml-swift-integration.md) for an actor-based model manager with
lifecycle-aware loading and cache eviction.

## Common Mistakes

**DON'T:** Load models on the main thread.
**DO:** Use the async `MLModel.load(contentsOf:configuration:)` API or load on a background actor.
**Why:** Large models can take seconds to load, freezing the UI.

**DON'T:** Recompile `.mlpackage` to `.mlmodelc` on every app launch.
**DO:** Compile once with `MLModel.compileModel(at:)` and cache the compiled URL persistently.
**Why:** Compilation is expensive. Cache the `.mlmodelc` in Application Support.

**DON'T:** Hardcode `.cpuOnly` unless you have a specific reason.
**DO:** Use `.all` and let the system choose the optimal compute unit.
**Why:** `.all` lets Core ML use the Neural Engine and GPU, which are usually faster and more energy-efficient for supported operations.

**DON'T:** Claim GPU or Neural Engine are categorically unavailable for all
background-adjacent work.
**DO:** Treat background execution as policy-, mode-, contention-, thermal-, and
energy-dependent, and profile the actual workload on device.
**Why:** Apps may be suspended, throttled, or limited by their background mode;
`.cpuOnly` is a tradeoff, not a universal requirement.

**DON'T:** Ignore `MLFeatureValue` type mismatches between input and model expectations.
**DO:** Match types exactly -- use `MLFeatureValue(pixelBuffer:)` for images, not raw data.
**Why:** Type mismatches cause cryptic runtime crashes or silent incorrect results.

**DON'T:** Create a new `MLModel` instance for every prediction.
**DO:** Load once and reuse. Use an actor to manage the model lifecycle.
**Why:** Model loading allocates significant memory and compute resources.

**DON'T:** Skip error handling for model loading and prediction.
**DO:** Catch errors and provide fallback behavior when the model fails.
**Why:** Models can fail to load on older devices or when resources are constrained.

**DON'T:** Assume all operations run on the Neural Engine.
**DO:** Use `MLComputePlan` (iOS 17.4+) to verify device dispatch per operation.
**Why:** Unsupported operations fall back to CPU, which may bottleneck the pipeline.

**DON'T:** Process images manually before passing to Vision + Core ML.
**DO:** Use `CoreMLRequest` (iOS 18+) or `VNCoreMLRequest` (legacy) to let Vision handle preprocessing.
**Why:** Vision handles orientation, scaling, and pixel format conversion correctly.
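
The error-handling item above can be sketched as a loader that tries several configurations and reports failure instead of crashing. The function name and the default order (`.all`, then `.cpuOnly`) are assumptions for illustration; whether a CPU-only retry helps depends on why the first load failed.

```swift
import CoreML

/// Tries each compute-unit setting in order and returns the first model that
/// loads, or nil so the caller can disable the feature or show a fallback UI.
func loadModelWithFallback(
    at url: URL,
    tryingInOrder computeUnits: [MLComputeUnits] = [.all, .cpuOnly]
) async -> MLModel? {
    for units in computeUnits {
        let configuration = MLModelConfiguration()
        configuration.computeUnits = units
        do {
            return try await MLModel.load(contentsOf: url, configuration: configuration)
        } catch {
            // Record the failure and move on to the next configuration.
            print("Model load failed (computeUnits raw value \(units.rawValue)): \(error)")
        }
    }
    return nil
}
```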

## Review Checklist

- [ ] Model loaded asynchronously (not blocking main thread)
- [ ] `MLModelConfiguration.computeUnits` set appropriately for use case
- [ ] Model instance reused across predictions (not recreated each time)
- [ ] Auto-generated class used when available (typed inputs/outputs)
- [ ] Error handling for model loading and prediction failures
- [ ] Compiled model cached persistently if compiled at runtime
- [ ] Image inputs use Vision pipeline (`CoreMLRequest` iOS 18+ or `VNCoreMLRequest`) for correct preprocessing
- [ ] `MLComputePlan` checked to verify compute device dispatch (iOS 17.4+)
- [ ] Batch predictions used when processing multiple inputs
- [ ] Model size appropriate for deployment strategy (bundle, Background Assets, ODR)
- [ ] Memory tested on target devices (especially older devices with less RAM)
- [ ] Performance measured outside the debugger (Product > Profile), not in a debug session

## References

- Patterns and code: [references/coreml-swift-integration.md](references/coreml-swift-integration.md)
- Model conversion and optimization (Python-side): covered in the `apple-on-device-ai` skill
- Apple docs: [Core ML](https://sosumi.ai/documentation/coreml) |
  [MLModel](https://sosumi.ai/documentation/coreml/mlmodel) |
  [MLTensor](https://sosumi.ai/documentation/coreml/mltensor) |
  [MLComputePlan](https://sosumi.ai/documentation/coreml/mlcomputeplan-1w21n) |
  [Background Assets](https://sosumi.ai/documentation/backgroundassets)

Source: creator's repository, dpearson2699/swift-ios-skills