Spot concurrency bugs in Kotlin code

Reviews Kotlin code for common structured-concurrency violations: leaked scopes, runBlocking in the wrong place, caught exceptions that swallow cancellation, orphaned coroutines.

Best for: Kotlin engineers shipping concurrent code who want a second set of eyes on scope management.

Engineering / code-reviewatomicfor-engineersno-setupfrom-file

Topics

androidcompose-multiplatformcoroutinesjetpack-composekotlinkotlin-multiplatformskills

Source

Creator's repository · chrisbanes/skills

View on GitHub

License: Apache-2.0

Skill file

Preview skill file
---
name: kotlin-coroutines-structured-concurrency
description: Use when writing or reviewing Kotlin code that stores CoroutineScope, launches from init/non-suspending APIs, calls runBlocking, or catches broad exceptions around suspend calls.
---

# Kotlin coroutines: structured concurrency

## Core principle

A well-structured coroutine is a self-contained unit of asynchronous work — single entry, single exit, scoped to a lifecycle known at the call site.

**Scopes should usually be tied to the caller's lifecycle, not stored as a property on the callee.** A stored `CoroutineScope` is a strong review signal: the class must prove it owns cancellation, error reporting, restart behavior, and lifecycle. Most repositories, managers, use cases, and data sources cannot prove that, so they should expose `suspend` APIs instead.

The fix is almost always the same: **make the API `suspend` and let the caller own the scope.**

## When to use this skill

You're writing or reviewing Kotlin code and you see any of these:

- A class with `private val scope: CoroutineScope` (constructor param stored as a property)
- An `init { scope.launch { ... } }` block
- A non-suspending public function whose body is `scope.launch { ... }`
- `runBlocking { ... }` in suspend-capable application code, or in tests where `runTest` should apply
- `runCatching { suspendCall() }` or a `catch` on `Exception` / `Throwable` around a `suspend` call without rethrowing `CancellationException`
- A `catch (e: CancellationException)` (or equivalent) around suspension that does not rethrow

## The silent-cancellation bug

The reason an unowned `CoroutineScope` property is so dangerous: **once a scope is cancelled, every future `launch` on it silently completes as cancelled — no exception, no log, nothing.** The work just doesn't happen. This is one of the hardest coroutine bugs to diagnose, and it appears when a class holds a long-lived reference to a lifecycle it does not own.

If APIs are `suspend`, this can't happen: the caller's scope is either alive (work runs) or the call site cancels (the caller knows).

## Anti-patterns and fixes

### 1. CoroutineScope stored as a property

```kotlin
// ❌ BAD
@Inject
class UserRepository(
    private val scope: CoroutineScope,
    private val api: UserApi,
) {
    fun refresh() {
        scope.launch { _state.value = api.fetchUser() }
    }
}

// ✅ GOOD
@Inject
class UserRepository(
    private val api: UserApi,
) {
    suspend fun refresh(): User = api.fetchUser()
}
```

The repository no longer needs to know about coroutines at all. The caller (a ViewModel, a use case) decides on what scope, with what error handling, with what cancellation semantics.

### 2. init-block launches

```kotlin
// ❌ BAD: construction-time side effect, unbounded work
class UserSession(private val scope: CoroutineScope, private val api: Api) {
    init { scope.launch { _user.value = api.load() } }
}
```

The constructor returns immediately. The caller can't `await` the load, can't see errors, can't cancel. The class is "alive" but its state is undefined.

```kotlin
// ✅ GOOD: explicit bootstrap, caller owns the suspension
class UserSession(private val api: Api) {
    private var _user: User? = null
    val user: User get() = checkNotNull(_user) { "Call init() first" }

    suspend fun init() { _user = api.load() }
}
```

### 3. Fire-and-forget from non-UI classes

A non-suspending public function on a **non-UI class** (repository, manager, use case, data source) that launches into a class-owned scope. The caller gets no result, no error, no cancellation, and no guarantee the work ever ran.

```kotlin
// ❌ BAD — repository with stored scope and fire-and-forget public API
class AnalyticsClient(private val scope: CoroutineScope, private val api: Api) {
    fun track(event: Event) {
        scope.launch { api.send(event) }      // caller has no idea what happens
    }
    fun signOut() {
        scope.launch { api.signOut() }        // silent failure if scope cancelled
    }
}
```

```kotlin
// ✅ GOOD
class AnalyticsClient(private val api: Api) {
    suspend fun track(event: Event) = api.send(event)
    suspend fun signOut() = api.signOut()
}
```

#### Carve-out: the UI ↔ state-holder boundary

UI frameworks are non-suspending. A Composable's `onClick`, a Fragment's `onKeyEvent`, an Activity's `onNewIntent` — none can `suspend`. The state holder (ViewModel, Decompose Component, feature model, etc. — anything whose role is to absorb UI events and hold UI state) **is** the boundary that translates one-shot UI events into asynchronous work bound to the UI lifecycle. That's its job.

```kotlin
// ✅ GOOD — state holder absorbs a non-suspending UI event onto its scope
class FavouritesViewModel(private val repo: FavouritesRepository) : ViewModel() {
    fun onToggleFavourite(item: Item) {
        viewModelScope.launch { repo.toggleFavourite(item) }
    }
}

// in Compose:
ListItem(onClick = { viewModel.onToggleFavourite(item) })
```

This is **not** the fire-and-forget anti-pattern. All three conditions must hold:

1. **State holder for a UI surface** — a ViewModel, Decompose Component, feature model, or equivalent UI state holder. Not a repository, manager, use case, or data source.
2. **Lifecycle-bound scope** — `viewModelScope`, a Component's `coroutineScope` that's cancelled on destroy, a Composable's `rememberCoroutineScope()`. Not `AppScope`, not an injected long-lived scope, not an ad-hoc `CoroutineScope(...)`.
3. **Caller really is a UI event** — Composable callback, key handler, lifecycle hook. Not another business-logic class calling through the state holder.

The repository / use case / data source layers underneath still expose `suspend` APIs. The state holder is the *only* layer where the non-suspending → suspending translation belongs.

"It feels like a state holder" isn't enough. The question is "does the UI directly bind to this?" If no, the carve-out doesn't apply.

### 4. Stored scopes that aren't injected

The same anti-pattern, without an injected scope:

```kotlin
// ❌ BAD — same problem, scope is constructed in-class instead of injected
class FooManager {
    private val scope = MainScope()
    private val scope = CoroutineScope(Dispatchers.Default + SupervisorJob())
}
```

Lifecycle is now owned by nothing and lives forever. Replace with `suspend` APIs.

The same is true if the instantiation is nested inside a function body — `fun foo() { CoroutineScope(...).launch { … } }` is just a stored scope with extra steps. Each call leaks a new uncancellable scope; bundling it into a `by lazy` property doesn't fix the underlying issue (the scope shouldn't exist at all).

### 5. DI-bound singletons / initializers that launch

A specific pattern that is hard to spot: a DI-bound class (`@SingleIn(AppScope)`, `@Singleton`, an `Initializer.initialize()`) launches a coroutine from its constructor / `init` block / `initialize()`. The launched work then has:

- **A non-deterministic start time** — whenever the graph realizes the binding. Cold-start ordering is invisible.
- **No observable lifecycle.** Nothing else in the codebase can see whether it's running or has crashed.
- **No `stop()` / restart path.** If upstream enters a bad state, the loop is uncancellable.
- **No calling code to grep for.** Readers can't find "who starts this and when".

§1 says scopes should be tied to the caller's lifecycle. The DI-bound variant violates this indirectly: the *scope* may be injected, but the *launch* is hidden inside construction — same effect, harder to see.

```kotlin
// ❌ BAD — singleton boots work as a side effect of being constructed
@SingleIn(AppScope::class)
@Inject
class TokenRefresher(
    @ForScope(AppScope::class) private val scope: CoroutineScope,
    private val auth: AuthService,
) {
    init {
        scope.launch {
            while (isActive) {
                delay(5.minutes)
                auth.refreshIfNeeded()
            }
        }
    }
}

// ❌ ALSO BAD — Initializer.initialize() that *launches*, not just registers
class TokenInvalidatorInitializer @Inject constructor(
    @ForScope(AppScope::class) private val scope: CoroutineScope,
    private val store: AuthStore,
    private val invalidator: TokenInvalidator,
) : Initializer {
    override fun initialize() {
        scope.launch { store.tokenChanges.collect { invalidator.invalidate() } }
    }
}
```

Both look like "application-scoped singletons", but the **When NOT to apply** carve-out is *not* permission to launch from `init` / `initialize()`. It's permission for a singleton to own a scope when its API is suspending.

#### First ask: does this background-loop class need to exist at all?

Most background-loop classes exist only because no one inverted the observation. Three answers, in order of preference:

**Pattern 1 — invert into the consumer.** The class observes state forever to react when it changes. But *someone* mutates the state — sign-out flow, profile switch, flag-update handler. That mutation site is already in a coroutine context and is the natural place to do the work directly.

```kotlin
// ✅ GOOD — no background loop, no scope, no class. The mutation site does the work.
class Authenticator(
    private val authStore: AuthStore,
    private val tokenInvalidator: TokenInvalidator,
) {
    suspend fun signOut() {
        authStore.clearTokens()
        tokenInvalidator.invalidate()   // direct call at the mutation site
    }
}
```

The background-loop class is **deleted**. The work happens where the state changes.

When this applies: the consumer of the state has a clear lifecycle (a use case, an Authenticator, a service handler) and can perform the reaction inline.

**Pattern 2 — scheduled work.** Genuinely periodic or deferred. Use WorkManager / BGTaskScheduler. The enqueue is one-shot; make it suspending and call it once from an orchestrator that already runs at startup.

**Pattern 3 — explicit named launch site.** Sometimes the consumer is a synchronous API with no observable lifecycle (e.g., OpenTelemetry's `Sampler.shouldSample(...)`, an AIDL stub fanout, a broadcast receiver bridge). The observation has to live somewhere coroutine-aware, but it must live at an *explicit named call site* — not in the class's own `init`.

```kotlin
// ✅ GOOD — work is named; an explicit call site owns the launch
@SingleIn(AppScope::class)
class OtelConfigurableSampler(...) : Sampler {
    @Volatile private var delegate: Sampler = ...

    suspend fun observeRate(featureFlags: FeatureFlags) {
        featureFlags.observe(OTEL_SAMPLING_RATE).collect { rate ->
            delegate = Sampler.traceIdRatioBased(rate.coerceIn(0.0, 1.0))
        }
    }

    override fun shouldSample(...) = delegate.shouldSample(...)
}

// wired explicitly at the OTel SDK init module:
applicationScope.launch { otelSampler.observeRate(featureFlags) }
```

When this applies: the consumer is a synchronous API that calls *into* you with no observable lifecycle. The launch can't be invertible, but it must still be visible at a named call site.

#### Test for which pattern fits

"Is the consumer's lifecycle observable to me?"

- **Yes, and they're already in a coroutine context** → Pattern 1. Push the subscription into them; delete the background-loop class.
- **The work is periodic / deferred** → Pattern 2. Suspend enqueue called once.
- **No, they're a synchronous API with no observable lifecycle** → Pattern 3. Explicit launch site, not `init`.

If a fourth answer seems to fit — e.g., "I want a `Bootable` interface that launches everything for me" — that's the same anti-pattern with an extra layer of abstraction. The whole point is that launches be *visible*; auto-discovery by interface defeats it.

#### Initializers are still fine — *if they only register*

The `Initializer` pattern is correct when `initialize()` *registers* a listener or hook. The bug is when `initialize()` *launches* a coroutine.

```kotlin
// ✅ GOOD Initializer — registers a contributor, doesn't launch
class FavouritesContributorInitializer @Inject constructor(
    private val registry: ContributorRegistry,
    private val favouritesContributor: FavouritesContributor,
) : Initializer {
    override fun initialize() {
        registry.register(favouritesContributor)
    }
}
```

**`Initializer.initialize()` must not `launch` a coroutine.** If yours does, it's a Pattern 1/2/3 candidate.

#### Diagnostic for review

- Where is the start moment defined? If "wherever DI realizes me", bad.
- Who can observe whether the work is running? If "no one", bad.
- Who can stop or restart it? If "no one", bad.
- Can a reader grep for the launch site? If no, bad.

If the answers are "the consumer / the orchestrator / the named call site" — you're good.

### 6. Swallowing `CancellationException`

A `catch` clause around a `suspend` call that matches `CancellationException` — directly, or through `Exception` / `Throwable` — and doesn't rethrow usually turns cancellation into silent success. The parent coroutine thinks the child finished; the child keeps running (or its side effects do); the cancellation contract is broken.

Same failure shape as §1's stored-scope bug, viewed from the other end: §1 hides the work *from* the caller's lifecycle; this hides cancellation *from* the work.

```kotlin
// ❌ BAD — catches CancellationException, never rethrows
suspend fun fetch() {
    try {
        api.load()
    } catch (e: Exception) {           // matches CancellationException too
        logger.warn("load failed", e)
    }
}

// ❌ ALSO BAD — runCatching has the same problem
suspend fun fetch() {
    runCatching { api.load() }
        .onFailure { logger.warn("load failed", it) }
}
```

The acceptable shapes:

```kotlin
// ✅ Separate catch first
try { api.load() }
catch (e: CancellationException) { throw e }
catch (e: Exception) { logger.warn("load failed", e) }

// ✅ Conditional rethrow inside the broad catch
try { api.load() }
catch (e: Exception) {
    if (e is CancellationException) throw e
    logger.warn("load failed", e)
}

// ✅ ensureActive() — good when the catch handles ordinary failures and you only need
// to rethrow if the current coroutine is cancelled
try { api.load() }
catch (e: Exception) {
    currentCoroutineContext().ensureActive()
    logger.warn("load failed", e)
}

// ✅ runCatching with explicit guard
runCatching { api.load() }
    .onFailure {
        if (it is CancellationException) throw it
        logger.warn("load failed", it)
    }

// ✅ runCatching terminated with getOrThrow (cancellation flows back out)
runCatching { api.load() }.getOrThrow()
```

The trigger is "a suspend call inside the `try`", not "the enclosing function is declared `suspend`". This applies inside any suspending body — `suspend fun`, a `launch { … }` lambda, a Flow `collect { … }`, etc.

The common carve-out is an intentionally local timeout: catching `TimeoutCancellationException` from your own `withTimeout` and converting it to a domain result can be correct. Keep that catch narrow and close to the timeout. Do not use it as permission to swallow arbitrary cancellation.

Catching a non-cancellation subtype (`IOException`, your own exception types) is fine — they don't extend `CancellationException`.

### 7. `runBlocking`

`runBlocking` parks the current thread until the lambda finishes. Inside suspend-capable or lifecycle-scoped application paths it is wrong: a thread that meant to be async is now blocked, structured concurrency is broken, and any cancellation upstream has no effect. It is the "callee makes a structural decision for the caller" anti-pattern at its most direct.

```kotlin
// ❌ BAD — bridging to suspend by blocking the calling thread
fun saveUser(user: User) {
    runBlocking { repository.save(user) }
}
```

Three fixes, by context:

**Suspend-capable application code** — make the function `suspend`:

```kotlin
// ✅ GOOD
suspend fun saveUser(user: User) = repository.save(user)
```

If the immediate caller can't suspend either (a non-suspending UI callback, a `BroadcastReceiver` hook), use the existing lifecycle-bound scope at the boundary — see §3's UI ↔ state-holder carve-out. The fix is at the boundary, not inside `saveUser`.

Legitimate blocking boundaries exist: `main` in a CLI tool, Java interop APIs that must return synchronously, framework callbacks with no suspending alternative, and migration shims. Keep `runBlocking` at that outer boundary, keep the body small, and call suspending code immediately.

**Tests** — use `runTest`:

```kotlin
// ❌ BAD — real time, slow tests, no virtual delay
@Test fun loadsUser() = runBlocking {
    assertThat(repository.load().name).isEqualTo("Alice")
}

// ✅ GOOD
@Test fun loadsUser() = runTest {
    assertThat(repository.load().name).isEqualTo("Alice")
}
```

`runTest` gives you virtual time (`delay()` returns immediately), `TestDispatcher` integration, and proper coroutine cleanup. Real-time `runBlocking` in tests makes them slow and flaky.

**`ContentProvider` carve-out** — Android's `ContentProvider` methods (`query`, `insert`, `update`, `delete`, `onCreate`, `call`) are synchronous from outside the process. There is no way to suspend them. Inside *member functions* of a `ContentProvider` subclass (direct or indirect — not companion objects), `runBlocking` is the unavoidable bridge. Keep the body as short as possible and call into suspending code immediately:

```kotlin
// ✅ Acceptable in ContentProvider members only
class MyProvider : ContentProvider() {
    override fun query(...): Cursor? = runBlocking { dao.query(...) }
}
```

This carve-out is for `android.content.ContentProvider` subclasses *only*. "It's like a `ContentProvider`" doesn't apply, and a `runBlocking` in a `ContentProvider`'s companion object is still a regular violation — the helper isn't part of the framework's synchronous surface.

## Quick reference

| Symptom | Anti-pattern | Fix |
|---|---|---|
| Class has `private val scope: CoroutineScope` | Stored scope on the callee | Remove. Make public APIs `suspend`. |
| `init { scope.launch { ... } }` | Construction-time launch | Move to `suspend fun init()` / `login()` |
| `fun foo() { scope.launch { ... } }` on a repository/manager/use case | Fire-and-forget from non-UI class | `suspend fun foo()`, let UI state holder pick the scope |
| `fun onClick() { viewModelScope.launch { ... } }` on a state holder, called from UI | UI ↔ state-holder boundary — fine | Keep as-is (see §3 carve-out) |
| `private val scope = MainScope()` | Internally-constructed stored scope | Same — remove, make APIs `suspend` |
| `@SingleIn(AppScope) class X(scope) { init { scope.launch { … } } }` | DI-bound opaque launch (§5) | Expose `suspend fun run()`, launch from startup orchestrator |
| `class Y : Initializer { override fun initialize() { scope.launch { … } } }` | Initializer that launches, not registers (§5) | Same — `suspend fun run()`, orchestrator owns lifecycle |
| `try { suspendCall() } catch (e: Exception\|Throwable\|CancellationException) { … }` with no rethrow | Swallowed cancellation (§6) | Prefer `catch (e: CancellationException) { throw e }`; use `ensureActive()` only when that matches the intent |
| `runCatching { suspendCall() }.onFailure { … }` with no cancellation guard | Same shape as above (§6) | Add `if (it is CancellationException) throw it`, or terminate with `.getOrThrow()` |
| `runBlocking { … }` inside suspend-capable app code | Thread-blocking bridge (§7) | Make caller `suspend`; or use a lifecycle scope at the boundary |
| `runBlocking { … }` in a test | Same — real-time bridging (§7) | Use `runTest { … }` |
| `runBlocking { … }` inside a `ContentProvider.query`/`insert`/… member | Carve-out (§7) | Acceptable; keep the body minimal |

## Refactoring guidance

Removing an existing offender:

1. **Start at the leaf.** Pick the class farthest from any UI — usually a repository or data source. Its public surface should be the easiest to convert.
2. **Convert public functions to `suspend`** one at a time. The compiler will surface every caller.
3. **At each caller, choose the scope deliberately:** `viewModelScope`, `lifecycleScope`, `coroutineScope { }`, or an explicit job. This is the choice that was missing before.
4. **Delete the `CoroutineScope` constructor parameter** once nothing uses it. Remove the injection binding.

Don't try to fix every class in one MR. Removing an anti-pattern is incremental work.

## When NOT to apply

- **UI state holders absorbing UI events.** A ViewModel/Component/feature model with `fun onClick(...) { viewModelScope.launch { ... } }` is correct — that's the boundary the framework needs. See §3 carve-out.
- **Lifecycle owners with explicit cancellation and error policy.** Actors/services, app infrastructure, or application-scoped singletons may own a scope when they expose clear `close`/`cancel`/restart behavior or otherwise map directly to an application lifecycle. Inject `Application.applicationScope` explicitly rather than creating one ad-hoc. **This is not permission to launch from `init` / `initialize()`** — see §5.
- **Already-suspending APIs** don't need any of this work.
- **Tests** sometimes use `TestScope` as a deliberate ambient scope — that's a different pattern with explicit virtual-time control.

## Red flags during review

These thoughts mean the anti-pattern is back:

| Thought | Reality |
|---|---|
| "I'll just add a `CoroutineExceptionHandler` to the scope" | The problem isn't error handling. The problem is the scope shouldn't exist. |
| "I need to launch from `init` so the data's ready when consumers arrive" | Consumers reading state that isn't ready is the bug. Use phasing. |
| "The caller doesn't want to deal with `suspend`" | Then the caller chooses fire-and-forget at their scope. Don't decide for them. |
| "It's just a small fire-and-forget call" | Silent cancellation makes every fire-and-forget a potential silent failure. |
| "We caught and logged the exception, so we're fine" | Did the catch rethrow `CancellationException`? If no, the coroutine is silently un-cancelled. (§6) |
| "It's just one `runBlocking`, in a non-critical path" | Every `runBlocking` asserts the caller has no async option. If they do, it's the wrong primitive. (§7) |
| "Tests are simpler with `runBlocking`" | They run in real time, can't fast-forward `delay`, and lose `TestDispatcher` semantics. Use `runTest`. (§7) |

## Related

- [`kotlin-flow-state-event-modeling`](../kotlin-flow-state-event-modeling/SKILL.md) — `StateFlow`, `SharedFlow`, `Channel`, `stateIn`, one-shot events, and related modeling.