How to Keep Dev, Staging, and Prod Environments in Sync

by Alex Salerno

"Worked in staging, failed in production" is a pattern, not an accident. It happens because staging and production are configured separately, by different people, at different times, with no single artifact that says what's different and why. The values drift, and the drift surfaces as incidents.

Same problem, smaller blast radius: "worked on my machine, failed in CI." Same root cause. Same fix.

This post is about a low-overhead model for keeping environments in sync — without adopting a heavyweight config-management framework.

1. What "in sync" actually means

It does not mean "identical." A production environment will always differ from local: real secrets, real domains, real scale. What you actually want is structural parity: the set of variables, the shape of the deployment, and the startup sequence are the same. Only the values change.

A useful test: can you write down, in one screen, every variable your application reads from its environment? If yes, you're in good shape regardless of where the values come from. If no — if it's spread across five .env.example files, three Terraform modules, and the build pipeline — you have drift waiting to happen.
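One way to pass that test is to keep the list in a single module the application loads at startup. A minimal sketch, assuming the two variables used in the examples later in this post (API_URL, LOG_LEVEL); the Config class and load_config helper are hypothetical names, not part of any particular tool:

```python
import os
from dataclasses import dataclass

# Every variable the app reads from its environment, in one place.
# The names are illustrative, borrowed from the examples in this post.
REQUIRED_VARS = ("API_URL", "LOG_LEVEL")


@dataclass(frozen=True)
class Config:
    api_url: str
    log_level: str


def load_config(environ=None):
    """Build Config from the environment, failing fast on missing keys."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Config(api_url=env["API_URL"], log_level=env["LOG_LEVEL"])
```

If the application only ever reads its environment through something like load_config, the "one screen" is that module, wherever the values come from.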

2. Where drift comes from

The usual suspects:

  • Per-repo .env.example files that fall out of date because nobody owns them.
  • Environment-specific code paths (if NODE_ENV === 'production') that turn the environment into part of the program logic rather than an input.
  • Secrets management that's separate from config management. The secret rotation in prod doesn't get reflected in the .env.example developers copy from.
  • CI configuration that drifts independently. A pipeline that exports API_URL at run time instead of reading from the same definition the developer uses.
  • No single switch. Changing from local to staging requires editing N .env files in N repos, and nobody updates all of them.

Each is fixable in isolation. None of them get fixed when "fixing it" requires going through five different review processes.
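The environment-specific code path is the easiest of these to show in code. A sketch of the same NODE_ENV check written in Python, next to the alternative; CACHE_TTL_SECONDS is a hypothetical variable used only for illustration:

```python
def cache_ttl_branching(environ):
    # Anti-pattern: the environment's name is part of the program logic,
    # so staging and production execute different code paths.
    if environ.get("NODE_ENV") == "production":
        return 3600
    return 0


def cache_ttl_from_input(environ):
    # Alternative: one code path everywhere; only the value differs.
    # CACHE_TTL_SECONDS is a hypothetical variable name.
    return int(environ["CACHE_TTL_SECONDS"])
```

With the second form, what differs between staging and production is a value you can read in a config file, not a branch you have to find in the source.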

3. The model: one definition, three values

Whatever tool you pick, the model that works is:

  • One file defines the variables your app reads. This is the source of truth for what variables exist.
  • One file per environment defines values. Local, staging, prod each get their own value sheet. Same keys, different values.
  • One command switches between them. Editing five .env files by hand is where drift comes from; one command that writes them from the source-of-truth is where it stops.

The shape is identical to twelve-factor's "store config in the environment," with one addition: a tool above the .env files that knows what each environment should contain.
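To make the three pieces concrete, here is a rough sketch in Python. It is an illustration of the model under stated assumptions, not any tool's actual implementation: DEFINITION, VALUES, render_env, and switch are hypothetical names, and the values are the example URLs used elsewhere in this post.

```python
from pathlib import Path

# Source of truth for which variables exist.
DEFINITION = ["API_URL", "LOG_LEVEL"]

# One value sheet per environment: same keys, different values.
VALUES = {
    "local": {"API_URL": "http://localhost:8080", "LOG_LEVEL": "debug"},
    "staging": {"API_URL": "https://api.staging.acme.com", "LOG_LEVEL": "info"},
    "prod": {"API_URL": "https://api.acme.com", "LOG_LEVEL": "warn"},
}


def render_env(name):
    """Return the .env text for one environment, rejecting key mismatches."""
    sheet = VALUES[name]
    if set(sheet) != set(DEFINITION):
        mismatch = sorted(set(sheet) ^ set(DEFINITION))
        raise ValueError(f"{name}: keys differ from definition: {mismatch}")
    return "".join(f"{key}={sheet[key]}\n" for key in DEFINITION)


def switch(name, repo_dirs):
    """The one command: write the same rendered .env into every repo."""
    text = render_env(name)
    for repo in repo_dirs:
        Path(repo, ".env").write_text(text)
```

The key check in render_env is the part that matters: a value sheet that adds or drops a variable fails loudly instead of drifting quietly.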

4. Implementation in three takes

Take 1: Shell scripts

# scripts/env.sh: write .env for the chosen environment
case "$1" in
  local)   API_URL=http://localhost:8080 ;;
  staging) API_URL=https://api.staging.acme.com ;;
  prod)    API_URL=https://api.acme.com ;;
  *)       echo "usage: $0 {local|staging|prod}" >&2; exit 1 ;;
esac
echo "API_URL=$API_URL" > .env

Works for one repo. Fails as soon as you have several repos, because you need to apply the same env switch to all of them. Now you're maintaining N scripts that have to agree.

Take 2: Per-environment files

repo-a/.env.local
repo-a/.env.staging
repo-a/.env.production
repo-b/.env.local
repo-b/.env.staging
repo-b/.env.production
...

Better. The values are explicit. The downside: a value that appears in every repo (API_URL) needs to be updated in N×3 files when it changes. The drift surface area gets larger, not smaller.
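If you are stuck at this stage for a while, the drift is at least measurable. A sketch of a check that reports which files are missing keys that other files define, assuming simple KEY=VALUE files with no quoting or multi-line values; parse_env and key_drift are hypothetical helpers:

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


def key_drift(files):
    """Map each file name to the keys it lacks relative to all files combined."""
    parsed = {name: parse_env(text) for name, text in files.items()}
    all_keys = set()
    for pairs in parsed.values():
        all_keys |= set(pairs)
    report = {}
    for name, pairs in parsed.items():
        missing = sorted(all_keys - set(pairs))
        if missing:
            report[name] = missing
    return report
```

Here files maps a file name to its contents, so the check can run against whatever the N×3 files happen to be. An empty report means the keys agree; it says nothing about whether the values do.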

Take 3: A profile-level definition

A team-wide YAML file declares environments and variables at the fleet level. A small CLI reads it and writes the right .env to each repo when you switch:

# acme.raid.yaml
environments:
  - name: local
    variables:
      - { name: API_URL, value: http://localhost:8080 }
      - { name: LOG_LEVEL, value: debug }
  - name: staging
    variables:
      - { name: API_URL, value: https://api.staging.acme.com }
      - { name: LOG_LEVEL, value: info }
  - name: production
    variables:
      - { name: API_URL, value: https://api.acme.com }
      - { name: LOG_LEVEL, value: warn }

team env staging

That writes the staging values into every repo's .env. There's one source of truth. The PR that adds a new variable is the one that propagates it to every environment. Shared values can no longer drift between repos, as long as nobody hand-edits the generated files.

For per-repo additions (like a service's own port), the repo gets its own small config that adds keys on top — but only in addition, never in conflict.
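The "in addition, never in conflict" rule is simple to enforce mechanically. A sketch of one way to do it, not a description of how any specific CLI merges configs; merge_repo_additions and the PORT key are hypothetical:

```python
def merge_repo_additions(profile_vars, repo_vars):
    """Add repo-specific keys on top of the profile; refuse to override."""
    conflicts = sorted(set(profile_vars) & set(repo_vars))
    if conflicts:
        raise ValueError(
            "repo config redefines profile variables: " + ", ".join(conflicts)
        )
    return {**profile_vars, **repo_vars}


profile = {"API_URL": "https://api.staging.acme.com", "LOG_LEVEL": "info"}
merged = merge_repo_additions(profile, {"PORT": "3000"})
```

Raising on a conflict, rather than letting the repo value win, keeps the profile as the only place a shared value can be set.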

5. The secret problem

The above gets you structural parity. It does not get you secrets management, and you should not put real secrets in the shared YAML definition, even encrypted. Pair the env switcher with whatever your team already uses for secrets:

  • Cloud KMS / SOPS / age-encrypted files for medium-sensitivity values.
  • Vault / 1Password / Doppler for proper secrets.
  • Kubernetes Secrets / SSM Parameter Store in deployed environments.

The model is: the env switcher writes everything except secrets to the .env file. Secrets get injected at runtime by your secrets tool. The variable exists in the definition (so devs know it's there), but its value comes from a different source.
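In code, that split might look like the sketch below. The secret flag, the DB_PASSWORD variable, and both helper functions are assumptions made for illustration; your secrets tool decides how the value actually reaches the process.

```python
import os

# Hypothetical definition: each variable is flagged as secret or not.
# Secrets are declared, but no value is stored alongside them.
VARIABLES = [
    {"name": "API_URL", "value": "https://api.staging.acme.com", "secret": False},
    {"name": "DB_PASSWORD", "secret": True},
]


def render_public_env(variables):
    """Render .env text: values for plain variables, a comment for secrets."""
    lines = []
    for var in variables:
        if var["secret"]:
            lines.append(f"# {var['name']} is injected at runtime by the secrets tool")
        else:
            lines.append(f"{var['name']}={var['value']}")
    return "\n".join(lines) + "\n"


def missing_secrets(variables, environ=None):
    """List declared secrets that are absent from the runtime environment."""
    env = os.environ if environ is None else environ
    return [v["name"] for v in variables if v["secret"] and v["name"] not in env]
```

The comment line in the rendered file is what tells a developer the variable exists; missing_secrets is the kind of startup check that tells them the injection did not happen.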

6. CI is just another environment

Most teams treat CI as a separate problem from local development. It doesn't have to be. If team env staging is what writes a developer's .env file, the same command can run in CI to produce CI's .env. The artifact that bootstraps the environment is the same in both places, so local and CI cannot drift apart unless someone bypasses the command.

If you're at the point where you're writing a separate "what CI exports" config, you've already broken parity. Pull it back into the same definition.
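A pipeline step can catch that early. A small sketch, assuming you can collect the variables the pipeline sets into a mapping; undeclared_exports is a hypothetical helper, and CI_ONLY_FLAG is a made-up name:

```python
def undeclared_exports(pipeline_exports, defined_names):
    """Return variables the pipeline sets that the shared definition lacks."""
    return sorted(set(pipeline_exports) - set(defined_names))


extra = undeclared_exports(
    {"API_URL": "https://api.staging.acme.com", "CI_ONLY_FLAG": "1"},
    ["API_URL", "LOG_LEVEL"],
)
```

Anything this returns is a variable that exists only in CI, which is exactly the kind of entry that belongs back in the shared definition.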

7. What this looks like in practice

I built Raid around exactly this model. The profile defines environments; raid env <name> writes per-repo .env files; the same command runs in CI as locally. The full how-to is in "How to switch environments with raid env."

You don't have to use Raid. The point is that the source-of-truth shape — one definition, one switch command, same artifact local and CI — is what makes parity tractable. Pick any tool that fits that shape.

8. How you know it's working

Two signals:

  • The "what changed?" question after a prod incident has a precise answer. When something works in staging and fails in prod, the answer is in the env definition — go look at the file. If you can't say what differs from a single source, you don't have parity.
  • Adding a new variable is one PR. Touches the env definition, gets reviewed, propagates. If adding a new variable means "update the wiki, update five .env.example files, update the CI config, update Terraform" — you're a long way from parity.
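The first signal is cheap to automate once the values live in one definition. A sketch, assuming each environment's variables have been loaded into a plain dict; diff_environments is a hypothetical helper and TIMEOUT_SECONDS is a made-up variable included only to show an unchanged key:

```python
def diff_environments(left, right):
    """Return {key: (left_value, right_value)} for every key that differs."""
    keys = sorted(set(left) | set(right))
    return {
        key: (left.get(key), right.get(key))
        for key in keys
        if left.get(key) != right.get(key)
    }


staging = {
    "API_URL": "https://api.staging.acme.com",
    "LOG_LEVEL": "info",
    "TIMEOUT_SECONDS": "30",
}
production = {
    "API_URL": "https://api.acme.com",
    "LOG_LEVEL": "warn",
    "TIMEOUT_SECONDS": "30",
}
changed = diff_environments(staging, production)
```

A key present on only one side shows up with None for the other, so a missing variable is reported the same way as a changed one.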
