How to Debug a Failing Raid Task

by Alex Salerno

When a raid command fails, the first instinct is usually to scroll up looking for a stack trace. Raid is more disciplined than that: every failure carries a category, an error code, a message, and often a hint. This guide walks through how to read them, the most common failure codes, and a few flags that help isolate the cause.

1. Read the exit code first

Raid maps failures to five categorical exit codes. They tell you what kind of thing went wrong without reading any logs:

| Exit code | Category | Typical cause |
|---|---|---|
| 1 | Generic | Unclassified fall-through; treat as bug report material |
| 2 | Config | Invalid profile / repo YAML / argument |
| 3 | Task | A user task failed (shell exit non-zero, prompt missing default, wait timeout) |
| 4 | Network | Clone failed, HTTP task failed |
| 5 | Not Found | Profile / env / command / repo / file missing |

Print $? (POSIX) / $LASTEXITCODE (PowerShell) right after the failed run to see it. The category alone narrows the cause to one of five buckets.
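For wrapper scripts, the five categories are small enough to encode as a lookup. The following is a minimal sketch, not part of Raid itself: the helper name `describe_exit` is hypothetical, and only the codes and category names come from the table above.

```python
# Exit-code categories exactly as listed in the table above.
RAID_EXIT_CATEGORIES = {
    1: "Generic",
    2: "Config",
    3: "Task",
    4: "Network",
    5: "Not Found",
}


def describe_exit(returncode: int) -> str:
    """Translate a Raid exit code into its category name (hypothetical helper)."""
    if returncode == 0:
        return "Success"
    # Codes outside 1-5 are not covered by the table, so report them as unknown.
    return RAID_EXIT_CATEGORIES.get(returncode, f"Unknown ({returncode})")
```

A script could call it as `describe_exit(subprocess.run(["raid", "install"]).returncode)` and branch on the result, for example retrying only when the category is "Network".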

2. Decode the structured error

Raid surfaces a typed error object. In text mode it prints a prose line:

Error: profile 'web-platform' not found (PROFILE_NOT_FOUND)
Hint: use 'raid profile list' to see available profiles

In --json mode it emits a stable envelope:

{
  "error": {
    "code": "PROFILE_NOT_FOUND",
    "category": "not-found",
    "message": "profile 'web-platform' not found",
    "hint": "use 'raid profile list' to see available profiles"
  }
}

code is the most stable identifier — assert on it in scripts. hint is the fastest path to the fix.
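Asserting on code from a script might look like the sketch below. It assumes only the envelope shape shown above; the function name `error_code` is hypothetical, and because this guide does not say which stream carries the envelope, the function takes the raw text as an argument rather than reading a particular stream.

```python
import json
from typing import Optional


def error_code(raw: str) -> Optional[str]:
    """Return error.code from a --json error envelope, or None if there is none."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not JSON at all, e.g. text-mode output
    if not isinstance(payload, dict):
        return None
    error = payload.get("error")
    if not isinstance(error, dict):
        return None
    return error.get("code")


# The envelope from the example above, used as sample input.
sample = """
{
  "error": {
    "code": "PROFILE_NOT_FOUND",
    "category": "not-found",
    "message": "profile 'web-platform' not found",
    "hint": "use 'raid profile list' to see available profiles"
  }
}
"""
```

Matching on the returned string (for example `error_code(sample) == "PROFILE_NOT_FOUND"`) keeps the script independent of the wording of message and hint.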

3. Common failure codes and how to react

| Code | Category | Where it comes from | Likely fix |
|---|---|---|---|
| PROFILE_INVALID | Config | Profile file failed schema validation | Check the # yaml-language-server errors in your editor |
| PROFILE_NOT_ACTIVE | Not Found | No profile is active | raid profile list then raid profile <name> |
| PROFILE_NOT_FOUND | Not Found | Named profile isn't registered | raid profile list to confirm; re-add if missing |
| REPO_NOT_CLONED | Not Found | Command needs a repo that's not on disk | raid install or raid install <repo> |
| COMMAND_NOT_FOUND | Not Found | raid foo and no foo command exists | raid --help for the actual list |
| ENV_NOT_FOUND | Not Found | raid env staging but no staging env defined | raid env list to confirm naming |
| TASK_SHELL_FAILED | Task | A Shell task exited non-zero | Read the shell's own stderr; that's where the real cause is |
| TASK_WAIT_TIMEOUT | Task | Wait task exhausted its timeout: | The dependency isn't coming up; check it directly |
| HEADLESS_PROMPT_NO_DEFAULT | Task | Headless mode hit a Prompt without a default: | Add a default: or set the var via env before running |
| CLONE_FAILED | Network | Git couldn't clone a repo | SSH key, repo URL, or network |
| TASK_HTTP_FAILED | Network | HTTP task got a non-2xx or couldn't reach the URL | URL right? Auth right? Service up? |
| SCHEMA_VALIDATION_FAILED | Config | YAML doesn't match the published schema | Editor LSP usually pinpoints the line |

The full list is longer, but these cover the majority of day-to-day failures.

4. Inspect the workspace state

When the cause isn't immediately obvious from the error, dump the current state:

raid context --json | jq .

That returns active profile, active env, every repo with its git state (branch, dirty?), and recently-run commands. Half the "why doesn't this work" failures resolve here:

  • The repo isn't cloned (repos[*].cloned is false).
  • It's on the wrong branch (repos[*].branch).
  • Local edits are blocking a checkout (repos[*].dirty is true).
  • The env isn't what you thought it was.
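The repo checks in this list can be automated once the JSON is loaded. The sketch below is illustrative only: the `cloned`, `branch`, and `dirty` fields come from the text above, while the top-level `repos` list layout, the per-repo `name` key, and the function name `repo_problems` are assumptions to verify against real `raid context --json` output.

```python
from typing import Any, Dict, List


def repo_problems(context: Dict[str, Any], expected_branch: str) -> List[str]:
    """List human-readable problems found in a parsed context dump."""
    problems = []
    for repo in context.get("repos", []):
        name = repo.get("name", "<unnamed>")  # "name" key is an assumption
        if not repo.get("cloned", False):
            problems.append(f"{name}: not cloned")
            continue  # branch and dirty state mean nothing without a clone
        branch = repo.get("branch")
        if branch != expected_branch:
            problems.append(f"{name}: on branch {branch!r}, expected {expected_branch!r}")
        if repo.get("dirty", False):
            problems.append(f"{name}: has local edits")
    return problems


# Hand-written sample data in the assumed shape, not real Raid output.
sample_context = {
    "repos": [
        {"name": "api", "cloned": False},
        {"name": "web", "cloned": True, "branch": "feature", "dirty": True},
        {"name": "docs", "cloned": True, "branch": "main", "dirty": False},
    ]
}
```

With the sample data, `repo_problems(sample_context, "main")` reports the missing clone of `api` plus the wrong branch and local edits in `web`, and nothing for `docs`.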

For agent-driven debugging, raid context serve gives the same data over MCP; see How to use Raid as an MCP server.

5. Re-run with more output

A failing Shell task already passes its stderr through. If you need more from Raid itself:

  • --json — emits the error envelope verbatim, including all detail fields (not all of which print in text mode).
  • Re-run the failing shell directly. Find the cmd: in the YAML, change into the repo, and run it. If the bug reproduces, it's not Raid — it's the underlying tool.
  • Run a single task in isolation. Comment out the surrounding tasks in the command and re-run. Once you've isolated the failing task, the cause is usually obvious.
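Re-running the failing shell directly can also be scripted, which helps when the same reproduction has to be repeated. This is a rough sketch under assumptions: the function name `rerun` is hypothetical, and it assumes the cmd: string is meant to be interpreted by a shell from inside the repo directory, which this guide implies but does not spell out.

```python
import subprocess


def rerun(cmd: str, repo_dir: str) -> int:
    """Run a task's cmd: string from inside the repo and return its exit code."""
    # shell=True lets the string be interpreted the way a shell one-liner would be.
    # Only pass commands copied from your own config, never untrusted input.
    # stdout and stderr are inherited, so the tool's own output reaches the terminal.
    result = subprocess.run(cmd, shell=True, cwd=repo_dir)
    return result.returncode
```

A non-zero return value here, with Raid out of the picture, points at the underlying tool rather than at Raid.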

6. Tolerate expected failures

Sometimes a step is allowed to fail — a clean-up command that may or may not have anything to clean, a probe that's informational. Mark the task continueOnFailure: true so the command keeps going:

- type: Shell
  cmd: rm -rf ./.cache
  options:
    continueOnFailure: true

The task's failure is logged but doesn't abort the command. The command's overall exit code is still 0 if everything else succeeded.

Don't use this to paper over real bugs — but for genuinely best-effort steps, it's the right tool.

7. Headless / CI debugging

A few flags help when you're staring at a CI log instead of a live terminal:

  • --json — easier to grep than prose.
  • RAID_NO_PREFIX=1 — drops the [task-name] prefix on concurrent output so plain logs are easier to read.
  • Bisect via --threads 1 — for raid install, reduce clone parallelism to 1 to make logs strictly sequential.
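In a CI helper, these three settings can be assembled in one place. The sketch below only builds the argument list and environment; the function name `ci_invocation` is hypothetical, and placing the flags after the subcommand is an assumption modeled on the `raid context --json` example earlier in this guide.

```python
import os
from typing import Dict, List, Optional, Tuple


def ci_invocation(
    args: List[str],
    threads: Optional[int] = None,
    base_env: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and env for a CI-friendly Raid run (hypothetical helper)."""
    argv = ["raid", *args, "--json"]
    if threads is not None:
        # The text above describes --threads for raid install only.
        argv += ["--threads", str(threads)]
    # Copy the environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["RAID_NO_PREFIX"] = "1"
    return argv, env
```

The result can be passed straight to `subprocess.run(argv, env=env)`; keeping construction separate from execution also makes the helper easy to unit-test without Raid installed.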

8. When it's actually a Raid bug

If you've checked the error code, the error message is wrong or misleading, and raid context shows the expected state, you've probably hit a Raid bug. Open an issue at github.com/8bitAlex/raid/issues with the error code, the exit code, and a minimal raid.yaml that reproduces it.
