---
name: coreml
description: "Integrate Core ML models in iOS apps for on-device machine learning inference. Covers model loading (.mlmodel, .mlpackage, .mlmodelc), predictions with auto-generated classes and MLFeatureProvider, compute unit configuration (CPU, GPU, Neural Engine), MLTensor, VNCoreMLRequest, MLComputePlan, multi-model pipelines, and deployment strategies. Use when loading Core ML models, making predictions, configuring compute units, or profiling model performance."
---

# Core ML Swift Integration

Load, configure, and run Core ML models in iOS apps. This skill covers the
Swift side: model loading, prediction, MLTensor, profiling, and deployment.
Target iOS 26+ with Swift 6.3, backward-compatible to iOS 14 unless noted.

> **Scope boundary:** Python-side model conversion, optimization (quantization,
> palettization, pruning), and framework selection live in the `apple-on-device-ai`
> skill. This skill owns Swift integration only.

See [references/coreml-swift-integration.md](references/coreml-swift-integration.md) for complete code patterns including
actor-based caching, batch inference, image preprocessing, and testing.

## Contents

- [Loading Models](#loading-models)
- [Model Configuration](#model-configuration)
- [Making Predictions](#making-predictions)
- [MLTensor (iOS 18+)](#mltensor-ios-18)
- [Working with MLMultiArray](#working-with-mlmultiarray)
- [Image Preprocessing](#image-preprocessing)
- [Multi-Model Pipelines](#multi-model-pipelines)
- [Vision Integration](#vision-integration)
- [Performance Profiling](#performance-profiling)
- [Model Deployment](#model-deployment)
- [Memory Management](#memory-management)
- [Common Mistakes](#common-mistakes)
- [Review Checklist](#review-checklist)
- [References](#references)

## Loading Models

### Auto-Generated Classes

When you add a `.mlmodel` or `.mlpackage` to an app target, Xcode generates a Swift
class with typed input/output. Use this whenever possible.

```swift
import CoreML

let config = MLModelConfiguration()
config.computeUnits = .all

let model = try MyImageClassifier(configuration: config)
```

### Manual Loading

Load from a URL when the model is downloaded at runtime or stored outside the
bundle.

```swift
let modelURL = Bundle.main.url(
    forResource: "MyModel", withExtension: "mlmodelc"
)!
let model = try MLModel(contentsOf: modelURL, configuration: config)
```

### Async Loading (iOS 15+)

Load models without blocking the main thread. Prefer this for large models.

```swift
let model = try await MLModel.load(
    contentsOf: modelURL,
    configuration: config
)
```

### Compile at Runtime (iOS 16+)

Compile a `.mlpackage` or `.mlmodel` to `.mlmodelc` on device. Useful for
models downloaded from a server. Do this once per model version, not on every
launch.

```swift
let compiledURL = try await MLModel.compileModel(at: packageURL)
let model = try await MLModel.load(contentsOf: compiledURL, configuration: config)
```

Cache the compiled URL -- recompiling on every launch is a bug. Copy
`compiledURL` to a persistent location (e.g., Application Support). When
reviewing runtime-loaded models, call out both facts together: async
`MLModel.compileModel(at:)` is iOS 16+, and compiled models must be cached so the
app does not recompile on every launch.
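One way to apply the compile-once rule is a minimal sketch that keys the cache by model name and version under Application Support. `compiledModelCacheURL(name:version:in:)` and `loadCachedModel(sourceURL:name:version:configuration:)` are hypothetical helpers, and the `CompiledModels/<name>-<version>.mlmodelc` layout is an assumption, not a Core ML convention.

```swift
import CoreML
import Foundation

/// Hypothetical naming scheme: Application Support/CompiledModels/<name>-<version>.mlmodelc
func compiledModelCacheURL(name: String, version: String, in baseDirectory: URL) -> URL {
    baseDirectory
        .appendingPathComponent("CompiledModels", isDirectory: true)
        .appendingPathComponent("\(name)-\(version).mlmodelc", isDirectory: true)
}

/// Compiles `sourceURL` (.mlmodel or .mlpackage) only when this version is not cached yet,
/// then loads the cached .mlmodelc. Async compileModel(at:) needs iOS 16+ / macOS 13+.
@available(iOS 16.0, macOS 13.0, *)
func loadCachedModel(
    sourceURL: URL,
    name: String,
    version: String,
    configuration: MLModelConfiguration
) async throws -> MLModel {
    let fileManager = FileManager.default
    let appSupport = try fileManager.url(
        for: .applicationSupportDirectory, in: .userDomainMask,
        appropriateFor: nil, create: true
    )
    let cachedURL = compiledModelCacheURL(name: name, version: version, in: appSupport)

    if !fileManager.fileExists(atPath: cachedURL.path) {
        // Core ML writes the compiled model to a temporary location...
        let temporaryURL = try await MLModel.compileModel(at: sourceURL)
        try fileManager.createDirectory(
            at: cachedURL.deletingLastPathComponent(),
            withIntermediateDirectories: true
        )
        // ...so move it somewhere persistent; later launches skip compilation.
        try fileManager.moveItem(at: temporaryURL, to: cachedURL)
    }
    return try await MLModel.load(contentsOf: cachedURL, configuration: configuration)
}
```

Bumping `version` when a new model ships forces exactly one recompile; cleaning up older cached versions is left out of this sketch.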

## Model Configuration

`MLModelConfiguration` controls compute units, GPU access, and model parameters.

### Compute Units Decision Table

| Value | Uses | When to Choose |
|---|---|---|
| `.all` | CPU + GPU + Neural Engine | Default. Let the system decide. |
| `.cpuOnly` | CPU | Deterministic tests, CPU-only fallbacks, or constrained work after profiling shows accelerator policy, contention, thermal state, or energy budget is the limiting factor. |
| `.cpuAndGPU` | CPU + GPU | Need GPU but model has ops unsupported by ANE. |
| `.cpuAndNeuralEngine` (iOS 16+) | CPU + Neural Engine | Best energy efficiency for compatible models. |

```swift
let config = MLModelConfiguration()

// Choose one value. For models that run well on the Neural Engine:
config.computeUnits = .cpuAndNeuralEngine

// Optional CPU-only fallback for constrained work, only after profiling and policy review:
// config.computeUnits = .cpuOnly
```
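On iOS 17+, `MLModel.availableComputeDevices` lists the compute devices Core ML can use. A minimal sketch, assuming you want to gate a `.cpuAndNeuralEngine` choice on whether a Neural Engine is reported at all; `deviceHasNeuralEngine()` and `makeConfiguration()` are hypothetical helpers, and the policy is illustrative, to be confirmed by profiling.

```swift
import CoreML

/// Hypothetical helper: true when Core ML reports a Neural Engine (iOS 17+ / macOS 14+).
@available(iOS 17.0, macOS 14.0, *)
func deviceHasNeuralEngine() -> Bool {
    MLModel.availableComputeDevices.contains { device in
        if case .neuralEngine = device { return true }
        return false
    }
}

/// Illustrative policy: prefer the Neural Engine path only when one exists, else keep `.all`.
func makeConfiguration() -> MLModelConfiguration {
    let config = MLModelConfiguration()
    config.computeUnits = .all
    if #available(iOS 17.0, macOS 14.0, *), deviceHasNeuralEngine() {
        config.computeUnits = .cpuAndNeuralEngine
    }
    return config
}
```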

### Configuration Properties

```swift
let config = MLModelConfiguration()
config.computeUnits = .all
config.allowLowPrecisionAccumulationOnGPU = true // faster, slight precision loss
```

## Making Predictions

### With Auto-Generated Classes

The generated class provides typed input/output structs.

```swift
let model = try MyImageClassifier(configuration: config)
let input = MyImageClassifierInput(image: pixelBuffer)
let output = try model.prediction(input: input)
print(output.classLabel)        // "golden_retriever"
print(output.classLabelProbs)   // ["golden_retriever": 0.95, ...]
```

### With MLDictionaryFeatureProvider

Use when inputs are dynamic or not known at compile time.

```swift
let inputFeatures = try MLDictionaryFeatureProvider(dictionary: [
    "image": MLFeatureValue(pixelBuffer: pixelBuffer),
    "confidence_threshold": MLFeatureValue(double: 0.5),
])
let output = try model.prediction(from: inputFeatures)
let label = output.featureValue(for: "classLabel")?.stringValue
```
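Because dictionary-based inputs are not type-checked at compile time, it can help to validate them against the model's declared inputs before predicting. A minimal sketch using `MLModelDescription.inputDescriptionsByName` and `MLFeatureDescription.isAllowedValue(_:)`; `invalidInputNames(for:values:)` is a hypothetical helper.

```swift
import CoreML

/// Returns the names of required inputs that are missing, plus inputs whose values
/// the model's feature description does not accept (sorted for stable output).
func invalidInputNames(for model: MLModel, values: [String: MLFeatureValue]) -> [String] {
    var invalid: [String] = []
    for (name, description) in model.modelDescription.inputDescriptionsByName {
        guard let value = values[name] else {
            // A missing optional input is acceptable; a missing required one is not.
            if !description.isOptional { invalid.append(name) }
            continue
        }
        // isAllowedValue(_:) reports whether the value is compatible with the description.
        if !description.isAllowedValue(value) { invalid.append(name) }
    }
    return invalid.sorted()
}
```

Call it right before `model.prediction(from:)` and surface the returned names in logs or assertions, rather than discovering the mismatch as a runtime prediction error.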

### Prediction Inside Async Workflows

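The long-standing `MLModel.prediction(from:)` overloads are synchronous. In async
pipelines, keep model loading async, then run prediction from an actor or
non-main task. On iOS 17+, Core ML also provides async `prediction(from:)`
overloads, and inside an `async` function Swift prefers an available async
overload; either mark that call with `await`, or make the synchronous call below
from a synchronous helper.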

```swift
let output = try model.prediction(from: inputFeatures)
```
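The pattern above can be sketched as an actor that loads the model asynchronously once and then calls the synchronous prediction API from a synchronous helper, so the call never resolves to the async overload. `ClassifierService` is hypothetical, and the `"input"` / `"classLabel"` feature names and the 1-D `[Float]` input are placeholders for your model's real interface.

```swift
import CoreML

/// Hypothetical service: Core ML objects stay inside the actor; only Sendable
/// values ([Float], String?) cross its boundary.
@available(iOS 15.0, macOS 12.0, *)
actor ClassifierService {
    private let modelURL: URL
    private var model: MLModel?

    init(modelURL: URL) {
        self.modelURL = modelURL
    }

    func classify(features: [Float]) async throws -> String? {
        let model = try await loadedModel()
        let array = try MLMultiArray(shape: [NSNumber(value: features.count)], dataType: .float32)
        for (index, value) in features.enumerated() {
            array[index] = NSNumber(value: value)
        }
        let provider = try MLDictionaryFeatureProvider(
            dictionary: ["input": MLFeatureValue(multiArray: array)]
        )
        let output = try predictSynchronously(model: model, provider: provider)
        return output.featureValue(for: "classLabel")?.stringValue
    }

    private func loadedModel() async throws -> MLModel {
        if let model { return model }
        let config = MLModelConfiguration()
        config.computeUnits = .all
        // Actor reentrancy: two concurrent first calls may both load; acceptable for a sketch.
        let loaded = try await MLModel.load(contentsOf: modelURL, configuration: config)
        model = loaded
        return loaded
    }

    // Synchronous context, so overload resolution picks the synchronous prediction API.
    private func predictSynchronously(model: MLModel, provider: MLFeatureProvider) throws -> MLFeatureProvider {
        try model.prediction(from: provider)
    }
}
```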

### Batch Prediction

Process multiple inputs in one call for better throughput.

```swift
let batchInputs = try MLArrayBatchProvider(array: inputs.map { input in
    try MLDictionaryFeatureProvider(dictionary: ["image": MLFeatureValue(pixelBuffer: input)])
})
let batchOutput = try model.predictions(fromBatch: batchInputs)
for i in 0..<batchOutput.count {
    let result = batchOutput.features(at: i)
    print(result.featureValue(for: "classLabel")?.stringValue ?? "unknown")
}
```

Use `predictions(fromBatch:)` when batching without explicit
`MLPredictionOptions`. Use `predictions(from:options:)` only when passing both an
`MLBatchProvider` and `MLPredictionOptions`; `predictions(from:)` by itself is
not the no-options batch API.

### Stateful Prediction (iOS 18+)

Use `MLState` for models that maintain state across predictions (sequence models,
LLMs, audio accumulators). Create state once and pass it to each prediction call.

```swift
let state = model.makeState()

// Each synchronous prediction carries forward the internal model state
for frame in audioFrames {
    let input = try MLDictionaryFeatureProvider(dictionary: [
        "audio_features": MLFeatureValue(multiArray: frame)
    ])
    let output = try model.prediction(from: input, using: state)
    let classification = output.featureValue(for: "label")?.stringValue
}
```

`MLState` is `Sendable`, but `Sendable` does not make one state safe for
concurrent inference. Predictions using the same state must be serialized; do
not read or write state buffers while a prediction is in flight. Call
`model.makeState()` for each independent concurrent stream. If you need
`MLPredictionOptions`, iOS 18+ also provides the async
`prediction(from:using:options:)` overload; the same one-in-flight-per-state rule
still applies.
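One way to enforce one-prediction-in-flight per state is to give each stream its own actor that owns its own `MLState`; actor isolation then serializes predictions on that state, while separate sessions can run concurrently. A sketch under assumptions: `StreamSession` is hypothetical, and `"audio_features"`, `"label"`, and the 1-D input shape are placeholders.

```swift
import CoreML

/// Hypothetical per-stream session: one MLState per instance, one prediction at a time.
@available(iOS 18.0, macOS 15.0, *)
actor StreamSession {
    private let model: MLModel
    private let state: MLState

    init(modelURL: URL) async throws {
        let model = try await MLModel.load(contentsOf: modelURL, configuration: MLModelConfiguration())
        self.model = model
        self.state = model.makeState()
    }

    func process(frame: [Float]) throws -> String? {
        // Placeholder 1-D input; match your model's declared shape.
        let array = try MLMultiArray(shape: [NSNumber(value: frame.count)], dataType: .float32)
        for (index, value) in frame.enumerated() {
            array[index] = NSNumber(value: value)
        }
        let input = try MLDictionaryFeatureProvider(
            dictionary: ["audio_features": MLFeatureValue(multiArray: array)]
        )
        // Synchronous method body, so this uses the synchronous stateful overload.
        let output = try model.prediction(from: input, using: state)
        return output.featureValue(for: "label")?.stringValue
    }
}
```

Create one `StreamSession` per independent stream; two sessions never share an `MLState`.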

## MLTensor (iOS 18+)

`MLTensor` is a Swift-native multidimensional array for pre/post-processing.
Operations run lazily -- call `await tensor.shapedArray(of:)` to materialize results.

```swift
import CoreML

// Creation
let tensor = MLTensor([1.0, 2.0, 3.0, 4.0])
let zeros = MLTensor(zeros: [3, 224, 224], scalarType: Float.self)

// Reshaping
let reshaped = tensor.reshaped(to: [2, 2])

// Math operations
let softmaxed = tensor.softmax(alongAxis: -1)
let centered = tensor - tensor.mean()

// Interop with MLShapedArray / MLMultiArray
let shaped = await tensor.shapedArray(of: Float.self)
let multiArray = try MLMultiArray(shaped)
let shapedAgain = MLShapedArray<Float>(multiArray)
```

Do not invent `MLTensor` APIs for statistics or bridging. Avoid examples such as
`MLTensor(multiArray)`, `tensor.std()`, `tensor.standardDeviation()`, direct
lazy-buffer access, or synchronous extraction; perform unsupported DSP/statistics
outside the tensor pipeline or with source-confirmed tensor operations.
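Following that advice, a statistic such as standard deviation can be computed in plain Swift after materializing the tensor. A minimal sketch; `standardDeviation` is a hypothetical helper, and the tensor overload assumes the tensor's scalar type is `Float`.

```swift
import CoreML

/// Population standard deviation over already-materialized scalars.
func standardDeviation(_ values: [Float]) -> Float {
    guard !values.isEmpty else { return 0 }
    let mean = values.reduce(0, +) / Float(values.count)
    let variance = values.reduce(0) { $0 + ($1 - mean) * ($1 - mean) } / Float(values.count)
    return variance.squareRoot()
}

/// Materializes the lazy tensor first, then computes the statistic outside MLTensor.
@available(iOS 18.0, macOS 15.0, *)
func standardDeviation(of tensor: MLTensor) async -> Float {
    let shaped = await tensor.shapedArray(of: Float.self)
    return standardDeviation(shaped.scalars)
}
```

For example, `standardDeviation([2, 4, 4, 4, 5, 5, 7, 9])` returns `2`.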

## Working with MLMultiArray

`MLMultiArray` is the primary data exchange type for non-image model inputs and
outputs. Use it when the auto-generated class expects array-type features.

```swift
// Create a 3D array: [batch, sequence, features]
let array = try MLMultiArray(shape: [1, 128, 768], dataType: .float32)

// Write values
for i in 0..<128 {
    array[[0, i, 0] as [NSNumber]] = NSNumber(value: Float(i))
}

// Read values
let value = array[[0, 0, 0] as [NSNumber]].floatValue

let data: [Float] = [1.0, 2.0, 3.0]
let shaped = MLShapedArray(scalars: data, shape: [3])
let fromShaped = try MLMultiArray(shaped)
```
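Per-element `NSNumber` subscripting is convenient but slow for large arrays. On iOS 15.4+ / macOS 12.3+, `withUnsafeMutableBufferPointer(ofType:_:)` gives typed access along with the array's shape and strides. A sketch that writes the same values as the subscript loop above; `fillPositionFeature(_:)` is a hypothetical helper that assumes a float32 `[batch, sequence, features]` array.

```swift
import CoreML

/// Writes Float(i) into [0, i, 0] for every sequence position i, honoring strides.
@available(iOS 15.4, macOS 12.3, *)
func fillPositionFeature(_ array: MLMultiArray) {
    array.withUnsafeMutableBufferPointer(ofType: Float.self) { buffer, shape, strides in
        for i in 0..<shape[1] {
            // Batch index 0 and feature index 0 contribute no offset.
            buffer[i * strides[1]] = Float(i)
        }
    }
}
```

Indexing through `strides` instead of assuming a dense layout keeps the helper correct if Core ML pads the array.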

See [references/coreml-swift-integration.md](references/coreml-swift-integration.md) for advanced MLMultiArray patterns
including NLP tokenization and audio feature extraction.

## Image Preprocessing

Image models take `CVPixelBuffer` input. For photos from the camera or photo
library, convert the `CGImage` to a pixel buffer at the model's input size.
Vision's `VNCoreMLRequest` handles this conversion automatically, so manual
conversion is needed only for direct `MLModel` prediction.

```swift
import CoreGraphics
import CoreVideo

/// Draws `cgImage` into a new 32ARGB pixel buffer of the given size.
/// The image is stretched to fill; aspect ratio is not preserved.
func createPixelBuffer(from cgImage: CGImage, width: Int, height: Int) -> CVPixelBuffer? {
    var pixelBuffer: CVPixelBuffer?
    let attrs: [CFString: Any] = [
        kCVPixelBufferCGImageCompatibilityKey: true,
        kCVPixelBufferCGBitmapContextCompatibilityKey: true,
    ]
    let status = CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                                     kCVPixelFormatType_32ARGB, attrs as CFDictionary, &pixelBuffer)
    guard status == kCVReturnSuccess, let buffer = pixelBuffer else { return nil }

    CVPixelBufferLockBaseAddress(buffer, [])
    defer { CVPixelBufferUnlockBaseAddress(buffer, []) }

    guard let context = CGContext(
        data: CVPixelBufferGetBaseAddress(buffer),
        width: width, height: height,
        bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue
    ) else { return nil }
    context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
    return buffer
}
```

For additional preprocessing patterns (normalization, center-cropping), see
[references/coreml-swift-integration.md](references/coreml-swift-integration.md).
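When calling an image model directly, Core ML can also handle resizing and pixel-format conversion: `MLFeatureValue(cgImage:constraint:options:)` builds an image feature from a `CGImage` using the input's `MLImageConstraint`. A sketch assuming the image input is named `"image"` (a placeholder); `imageFeatureValue(for:model:inputName:)` is a hypothetical helper.

```swift
import CoreGraphics
import CoreML

/// Builds an image feature sized and formatted for the model's declared image input.
/// Returns nil when the model has no image input with that name.
func imageFeatureValue(
    for cgImage: CGImage,
    model: MLModel,
    inputName: String = "image"
) throws -> MLFeatureValue? {
    guard let constraint = model.modelDescription
        .inputDescriptionsByName[inputName]?.imageConstraint else {
        return nil
    }
    // Core ML converts the CGImage to the constraint's size and pixel format.
    return try MLFeatureValue(cgImage: cgImage, constraint: constraint, options: nil)
}
```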

## Multi-Model Pipelines

Chain models when preprocessing or postprocessing requires a separate model.

```swift
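// Sequential inference: preprocessor -> main model -> postprocessor.
// Passing one model's output straight through works only when its output
// feature names match the next model's input names.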
let preprocessed = try preprocessor.prediction(from: rawInput)
let mainOutput = try mainModel.prediction(from: preprocessed)
let finalOutput = try postprocessor.prediction(from: mainOutput)
```

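When the stages are packaged as a single pipeline model (built during conversion;
see the `apple-on-device-ai` skill), Core ML loads and predicts it as one
`MLModel`. Don't assume each stage lands on its ideal compute unit; verify
placement with `MLComputePlan`.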
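When stage feature names differ in a manual chain, a small adapter can rename outputs into the next model's inputs. A minimal sketch; `remapFeatures(_:renaming:)` and `MissingFeatureError` are hypothetical, and the feature names in the usage note are placeholders.

```swift
import CoreML

/// Hypothetical error for an upstream output that is not present.
struct MissingFeatureError: Error {
    let name: String
}

/// Copies selected outputs into a provider keyed by the downstream input names.
/// `mapping` maps upstream output name -> downstream input name.
func remapFeatures(
    _ provider: MLFeatureProvider,
    renaming mapping: [String: String]
) throws -> MLDictionaryFeatureProvider {
    var dictionary: [String: MLFeatureValue] = [:]
    for (outputName, inputName) in mapping {
        guard let value = provider.featureValue(for: outputName) else {
            throw MissingFeatureError(name: outputName)
        }
        dictionary[inputName] = value
    }
    return try MLDictionaryFeatureProvider(dictionary: dictionary)
}
```

Usage between two stages might look like `try mainModel.prediction(from: remapFeatures(preprocessed, renaming: ["embedding_out": "embedding_in"]))`.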

## Vision Integration

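Use Vision to run Core ML image models with automatic image preprocessing
(scaling and cropping, pixel format and color space conversion, orientation).
Pixel normalization (scale and bias) comes from the model's own image-input
definition, not from Vision.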

### Modern: CoreMLRequest (iOS 18+)

```swift
import Vision
import CoreML

let model = try MLModel(contentsOf: modelURL, configuration: config)
let container = try CoreMLModelContainer(model: model)
let request = CoreMLRequest(model: container)
let results = try await request.perform(on: cgImage)

if let classification = results.first as? ClassificationObservation {
    print("\(classification.identifier): \(classification.confidence)")
}
```

### Legacy: VNCoreMLRequest

```swift
let vnModel = try VNCoreMLModel(for: model)
// This completion handler assumes an object-detection model. Classifiers
// return VNClassificationObservation results instead.
let request = VNCoreMLRequest(model: vnModel) { request, error in
    guard let results = request.results as? [VNRecognizedObjectObservation] else { return }
    for observation in results {
        let label = observation.labels.first?.identifier ?? "unknown"
        let confidence = observation.labels.first?.confidence ?? 0
        let boundingBox = observation.boundingBox // normalized, origin at bottom-left
        print("\(label): \(confidence) at \(boundingBox)")
    }
}
request.imageCropAndScaleOption = .scaleFill

// perform(_:) runs synchronously; call it off the main thread.
let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer)
try handler.perform([request])
```

> For complete Vision framework patterns (text recognition, barcode detection,
> document scanning), see the `vision-framework` skill.

## Performance Profiling

### MLComputePlan (iOS 17.4+)

Inspect which compute device each operation will use before running predictions.

```swift
let computePlan = try await MLComputePlan.load(
    contentsOf: modelURL, configuration: config
)
guard case let .program(program) = computePlan.modelStructure else { return }
guard let mainFunction = program.functions["main"] else { return }

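for operation in mainFunction.block.operations {
    let deviceUsage = computePlan.deviceUsage(for: operation)
    let estimatedCost = computePlan.estimatedCost(of: operation)
    // weight: the operation's estimated fraction of the model's total workload
    print("\(operation.operatorName): preferred=\(String(describing: deviceUsage?.preferred)), "
        + "cost=\(String(describing: estimatedCost?.weight))")
}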
```
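To summarize the plan rather than print every operation, a sketch can tally top-level operations by preferred device. `preferredDeviceCounts(modelURL:configuration:)` and `deviceLabel(for:)` are hypothetical helpers; operations nested inside control-flow blocks are not visited here.

```swift
import CoreML

/// Maps a preferred compute device to a short label.
@available(iOS 17.0, macOS 14.0, *)
func deviceLabel(for device: MLComputeDevice?) -> String {
    guard let device else { return "none" }
    if case .neuralEngine = device { return "neuralEngine" }
    if case .gpu = device { return "gpu" }
    if case .cpu = device { return "cpu" }
    return "other"
}

/// Counts operations in the main function's top-level block by preferred device.
@available(iOS 17.4, macOS 14.4, *)
func preferredDeviceCounts(
    modelURL: URL,
    configuration: MLModelConfiguration
) async throws -> [String: Int] {
    let plan = try await MLComputePlan.load(contentsOf: modelURL, configuration: configuration)
    guard case let .program(program) = plan.modelStructure,
          let main = program.functions["main"] else { return [:] }

    var counts: [String: Int] = [:]
    for operation in main.block.operations {
        let label = deviceLabel(for: plan.deviceUsage(for: operation)?.preferred)
        counts[label, default: 0] += 1
    }
    return counts
}
```

A large `"cpu"` count on a model you expected to run on the Neural Engine is a cue to inspect those operations individually.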

### Instruments

Use the **Core ML** instrument template in Instruments to profile:
- Model load time
- Prediction latency (per-operation breakdown)
- Compute device dispatch (CPU/GPU/ANE per operation)
- Memory allocation

Run outside the debugger for accurate results (Xcode: Product > Profile).
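Instruments gives the full breakdown; for a quick in-app check, a sketch can time the first prediction separately from later ones, since the first call after loading may include one-time setup. `measureLatency(iterations:_:)` is a hypothetical helper built on `ContinuousClock` (iOS 16+ / macOS 13+).

```swift
import Foundation

/// Times the first call separately from the average of the remaining calls.
@available(iOS 16.0, macOS 13.0, *)
func measureLatency(
    iterations: Int,
    _ work: () throws -> Void
) throws -> (first: Duration, averageOfRest: Duration?) {
    precondition(iterations >= 1, "iterations must be at least 1")
    let clock = ContinuousClock()
    let first = try clock.measure(work)
    guard iterations > 1 else { return (first, nil) }
    var total = Duration.zero
    for _ in 1..<iterations {
        total += try clock.measure(work)
    }
    return (first, total / (iterations - 1))
}
```

For example, `try measureLatency(iterations: 20) { _ = try model.prediction(from: inputFeatures) }`; run it in a Release build outside the debugger, like the Instruments workflow.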

## Model Deployment

### Bundle vs Downloaded Assets

| Strategy | Pros | Cons |
|---|---|---|
| Bundle in app | Instant availability, works offline | Increases app download size |
| Background Assets | Preferred for large or updateable model assets | Requires asset-pack setup |
| On-demand resources | Smaller initial download for existing ODR apps | Legacy technology; prefer Background Assets for new work |
| CloudKit / server | Maximum flexibility | Requires network, longer setup |

### Size Considerations

- For iOS/iPadOS 18+, App Store Connect lists a 4 GB thinned app bundle limit
  and 8 GB thinned ODR asset-pack limit.
- Prefer Background Assets for new large or updateable model assets; keep ODR
  guidance for existing projects that already use it.
- Pre-compile to `.mlmodelc` to skip on-device compilation
- For downloaded `.mlmodel` or `.mlpackage` files, compile once with
  `MLModel.compileModel(at:)`, move the resulting `.mlmodelc` out of Core ML's
  temporary location, and cache it by model version.
- Validate memory and performance on physical target devices, especially the
  lowest-memory supported device. Check model load, first prediction, repeated
  predictions, background/foreground transitions, and low-memory behavior.

For Background Assets, make the asset pack locally available, resolve the model
URL, then load the compiled model with `MLModel.load(contentsOf:configuration:)`.

```swift
// Existing On-Demand Resources project
let request = NSBundleResourceRequest(tags: ["ml-model-v2"])
try await request.beginAccessingResources()
let modelURL = Bundle.main.url(forResource: "LargeModel", withExtension: "mlmodelc")!
let model = try await MLModel.load(contentsOf: modelURL, configuration: config)
// Keep `request` alive while the model is in use; call
// request.endAccessingResources() after releasing the model.
```

## Memory Management

- **Unload on background:** Release model references when the app enters background
  to free GPU/ANE memory. Reload on foreground return.
- **Choose compute units by context:** use `.all` by default. Consider `.cpuOnly`
  only when profiling or app policy shows accelerator contention, thermal state,
  energy budget, deterministic testing, or a legitimate background execution
  constraint makes CPU the right tradeoff.
- **Share model instances:** Never create multiple `MLModel` instances from the same
  compiled model. Use an actor to provide shared access.
- **Monitor memory pressure:** Large models (>100 MB) can trigger memory warnings.
  Register for `UIApplication.didReceiveMemoryWarningNotification` and release
  cached models when under pressure.

See [references/coreml-swift-integration.md](references/coreml-swift-integration.md) for an actor-based model manager with
lifecycle-aware loading and cache eviction.
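A much smaller sketch of the unload-on-pressure idea: an actor that loads on first use and drops its model reference when a notification is posted. `ModelCache`, `predictDouble(inputs:outputName:)`, and `unloadModel(_:whenPosted:)` are hypothetical, and the scalar `Double` input/output interface is a placeholder.

```swift
import CoreML
import Foundation

/// Hypothetical cache: loads the model on first use and releases it on demand.
@available(iOS 15.0, macOS 12.0, *)
actor ModelCache {
    private let modelURL: URL
    private var cachedModel: MLModel?

    init(modelURL: URL) { self.modelURL = modelURL }

    var isLoaded: Bool { cachedModel != nil }

    /// Drops the reference; the next prediction reloads the model.
    func unload() { cachedModel = nil }

    func predictDouble(inputs: [String: Double], outputName: String) async throws -> Double? {
        let model = try await loadIfNeeded()
        let provider = try MLDictionaryFeatureProvider(
            dictionary: inputs.mapValues { MLFeatureValue(double: $0) }
        )
        return try runPrediction(model, provider).featureValue(for: outputName)?.doubleValue
    }

    private func loadIfNeeded() async throws -> MLModel {
        if let cachedModel { return cachedModel }
        let loaded = try await MLModel.load(contentsOf: modelURL, configuration: MLModelConfiguration())
        cachedModel = loaded
        return loaded
    }

    // Synchronous context, so the synchronous prediction overload is used.
    private func runPrediction(_ model: MLModel, _ provider: MLFeatureProvider) throws -> MLFeatureProvider {
        try model.prediction(from: provider)
    }
}

/// Unloads the cached model each time `name` is posted, for example
/// UIApplication.didReceiveMemoryWarningNotification or didEnterBackgroundNotification.
/// Cancel the returned task to stop observing.
@available(iOS 15.0, macOS 12.0, *)
func unloadModel(_ cache: ModelCache, whenPosted name: Notification.Name) -> Task<Void, Never> {
    Task {
        for await _ in NotificationCenter.default.notifications(named: name) {
            await cache.unload()
        }
    }
}
```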

## Common Mistakes

**DON'T:** Load models on the main thread.
**DO:** Use `MLModel.load(contentsOf:configuration:)` async API or load on a background actor.
**Why:** Large models can take seconds to load, freezing the UI.

**DON'T:** Recompile `.mlpackage` to `.mlmodelc` on every app launch.
**DO:** Compile once with `MLModel.compileModel(at:)` and cache the compiled URL persistently.
**Why:** Compilation is expensive. Cache the `.mlmodelc` in Application Support.

**DON'T:** Hardcode `.cpuOnly` unless you have a specific reason.
**DO:** Use `.all` and let the system choose the optimal compute unit.
**Why:** `.all` lets Core ML use the Neural Engine and GPU, which are usually faster and more energy-efficient for supported operations.

**DON'T:** Claim GPU or Neural Engine are categorically unavailable for all
background-adjacent work.
**DO:** Treat background execution as policy-, mode-, contention-, thermal-, and
energy-dependent, and profile the actual workload on device.
**Why:** Apps may be suspended, throttled, or limited by their background mode;
`.cpuOnly` is a tradeoff, not a universal requirement.

**DON'T:** Ignore `MLFeatureValue` type mismatches between input and model expectations.
**DO:** Match types exactly -- use `MLFeatureValue(pixelBuffer:)` for images, not raw data.
**Why:** Type mismatches cause cryptic runtime crashes or silent incorrect results.

**DON'T:** Create a new `MLModel` instance for every prediction.
**DO:** Load once and reuse. Use an actor to manage the model lifecycle.
**Why:** Model loading allocates significant memory and compute resources.

**DON'T:** Skip error handling for model loading and prediction.
**DO:** Catch errors and provide fallback behavior when the model fails.
**Why:** Models can fail to load on older devices or when resources are constrained.

**DON'T:** Assume all operations run on the Neural Engine.
**DO:** Use `MLComputePlan` (iOS 17.4+) to verify device dispatch per operation.
**Why:** Unsupported operations fall back to CPU, which may bottleneck the pipeline.

**DON'T:** Process images manually before passing to Vision + Core ML.
**DO:** Use `CoreMLRequest` (iOS 18+) or `VNCoreMLRequest` (legacy) to let Vision handle preprocessing.
**Why:** Vision handles orientation, scaling, and pixel format conversion correctly.

## Review Checklist

- [ ] Model loaded asynchronously (not blocking main thread)
- [ ] `MLModelConfiguration.computeUnits` set appropriately for use case
- [ ] Model instance reused across predictions (not recreated each time)
- [ ] Auto-generated class used when available (typed inputs/outputs)
- [ ] Error handling for model loading and prediction failures
- [ ] Compiled model cached persistently if compiled at runtime
- [ ] Image inputs use Vision pipeline (`CoreMLRequest` iOS 18+ or `VNCoreMLRequest`) for correct preprocessing
- [ ] `MLComputePlan` checked to verify compute device dispatch (iOS 17.4+)
- [ ] Batch predictions used when processing multiple inputs
- [ ] Model size appropriate for deployment strategy (bundle, Background Assets, ODR)
- [ ] Memory tested on target devices (especially older devices with less RAM)
- [ ] Predictions run outside debugger for accurate performance measurement

## References

- Patterns and code: [references/coreml-swift-integration.md](references/coreml-swift-integration.md)
- Model conversion and optimization (Python-side): covered in the `apple-on-device-ai` skill
- Apple docs: [Core ML](https://sosumi.ai/documentation/coreml) |
  [MLModel](https://sosumi.ai/documentation/coreml/mlmodel) |
  [MLTensor](https://sosumi.ai/documentation/coreml/mltensor) |
  [MLComputePlan](https://sosumi.ai/documentation/coreml/mlcomputeplan-1w21n) |
  [Background Assets](https://sosumi.ai/documentation/backgroundassets)

Source

Creator's repository · dpearson2699/swift-ios-skills
