Test Automation Playbook — Retrofitting a Legacy Next.js Codebase with CI/CD Safety Nets

In a previous post, I shared the retrospective on building a CI/CD safety net for our Next.js financial platform. That post was the what and the why. This one is the how — a phased playbook you can adapt to your own legacy codebase.

The starting point was grim:

// next.config.js — the state of things when I started
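module.exports = {
  typescript: { ignoreBuildErrors: true }, // type errors never fail the build
  eslint: { ignoreDuringBuilds: true },    // lint errors never fail the build
};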

Every PR check had continue-on-error: true. Zero test steps in the pipeline. Real money flowing through the app.

Here’s how I fixed it — in 6 phases, without breaking production.

Phase 0: Measure Before You Move

Don’t start fixing anything. Start by understanding what you have.

I ran jest --coverage for the first time and got the baseline:

Statements: 60.45%
Branches:   47.25%
Functions:  51.06%
Lines:      61.42%

Not as bad as I expected. The previous developer had written some tests — they just weren’t wired into anything. Nobody ran them, nobody enforced them.

I also audited the CI/CD pipeline end-to-end: pre-commit hooks (none), PR checks (all non-blocking), build pipeline (no test step), post-deploy verification (none). Then I drew a gap map from the results.

The lesson: Measure first. You need a baseline to prove improvement later, and the audit often reveals that the problem isn’t “no tests” — it’s “tests exist but nothing enforces them.”

Phase 1: Stop the Bleeding

The cheapest wins come from enforcing what already exists.

Make PR checks blocking

I removed continue-on-error: true from every job in pr-checks.yml. That’s it. One-line changes. Suddenly lint failures, type errors, and test failures actually blocked merges.
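Before editing the workflow, it helps to list every place the flag is set. The sketch below is a hypothetical audit helper (the name findNonBlockingLines is mine, not part of any tool). It matches lines as plain text rather than parsing YAML, so treat it as a quick inventory, not a validator.

```typescript
// Hypothetical audit helper: report lines in a workflow file that set
// continue-on-error to true. Plain text matching, not a YAML parser.
export interface Finding {
  line: number; // 1-based line number in the workflow file
  text: string; // trimmed line content
}

export function findNonBlockingLines(workflowYaml: string): Finding[] {
  // Matches "continue-on-error: true", optionally as a list item or with a trailing comment.
  const pattern = /^(-\s*)?continue-on-error:\s*true\s*(#.*)?$/;
  return workflowYaml
    .split(/\r?\n/)
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter((finding) => pattern.test(finding.text));
}
```

Feeding it the contents of pr-checks.yml gives a checklist of lines to delete, which also makes the PR description easy to write.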

Add coverage thresholds

I set thresholds slightly below the current baseline — not aspirational numbers, just a floor that prevents regression:

// jest.config.js
coverageThreshold: {
  global: {
    branches: 45,
    functions: 49,
    lines: 59,
    statements: 58,
  },
},

The strategy is called ratcheting: set the floor at current reality, then raise it by ~5% per sprint as you write more tests. Never set coverage targets you can’t maintain.
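The ratchet rule can be sketched as a small function. This is a minimal sketch, assuming a fixed margin of 2 points below measured coverage, rounded down. That margin is an assumption on my part for illustration; it happens to reproduce the thresholds above from the Phase 0 baseline. The names ratchet, Coverage, and Metric are illustrative.

```typescript
// Sketch of a coverage ratchet. The 2-point margin is an assumption,
// chosen because it reproduces the thresholds shown above.
export type Metric = "statements" | "branches" | "functions" | "lines";
export type Coverage = Record<Metric, number>;

const METRICS: Metric[] = ["statements", "branches", "functions", "lines"];

export function ratchet(current: Coverage, existing: Coverage, margin = 2): Coverage {
  const next: Coverage = { ...existing };
  for (const metric of METRICS) {
    const candidate = Math.floor(current[metric] - margin);
    // Only ever raise the floor; a dip in coverage must not lower it.
    next[metric] = Math.max(existing[metric], candidate);
  }
  return next;
}

const baseline: Coverage = { statements: 60.45, branches: 47.25, functions: 51.06, lines: 61.42 };
const noFloor: Coverage = { statements: 0, branches: 0, functions: 0, lines: 0 };
const firstThresholds = ratchet(baseline, noFloor);
// firstThresholds is { statements: 58, branches: 45, functions: 49, lines: 59 }
```

The Math.max is the part that makes it a ratchet: if coverage drops, the existing floor stays put and the build fails instead of the threshold quietly following coverage down.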

Address flaky tests

I found 3 tests that failed intermittently due to timer-dependent assertions. Fixed them before they eroded trust in the pipeline. A flaky test that people learn to ignore is worse than no test.

Time spent: ~2 days. Impact: Immediate. Every PR now had a real gate.

Phase 2: Strengthen the PR Gate

With blocking checks in place, I made them faster and more comprehensive.

Lint-staged pre-commit

Instead of linting the entire project on every commit, lint only what changed:

# .husky/pre-commit
set -e

# Lint + format staged files
npx lint-staged

# Run unit tests related to staged files
FILES=$(git diff --cached --name-only --diff-filter=ACMR)
if [ -n "$FILES" ]; then
  echo "$FILES" | tr '\n' '\0' | \
    xargs -0 npx jest --bail --passWithNoTests --findRelatedTests
fi

This runs ESLint + Prettier on staged files, then Jest’s --findRelatedTests to catch breakage in code that imports the changed files. Fast feedback without running the full suite.

Parallel PR jobs

I split the single PR check into 4 parallel jobs that share one install job and its node_modules cache (5 jobs in total):

# pr-checks.yml — 5 jobs, parallelized
jobs:
  install:    # Shared node_modules cache
  lint:       # ESLint (needs: install)
  typecheck:  # tsc --noEmit (needs: install)
  test:       # Jest with coverage (needs: install)
  visual:     # Playwright visual regression (opt-in via label)

PR feedback time dropped from ~4 minutes to ~2 minutes. The visual job runs only when you add a visual-check label — visual regression diffs are noisy and shouldn’t block every PR.

Phase 3: CI/CD Pipeline Gates

PR checks verify code quality. Pipeline gates verify the built artifact works.

Add test job to CI/CD

The deploy pipeline (ci-cd.yml) had: build Docker image → push to ACR → update ArgoCD manifest. I added a test job that runs after checkout and has to pass before the build starts:

test:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: 24
    - run: npm ci
    - run: npm test -- --passWithNoTests --maxWorkers=2 --forceExit --coverage

If tests fail, the Docker image never gets built. Simple.

Health check endpoint

I added a /healthz route that K8s probes hit. Basic mode returns { status: "ok" }. Deep mode validates critical environment variables:

// src/app/healthz/route.ts
import { NextResponse } from "next/server";

export function GET(request: Request) {
  const url = new URL(request.url);
  const deep = url.searchParams.get("deep");

  const health: Record<string, unknown> = {
    status: "ok",
    timestamp: new Date().toISOString(),
  };

  if (deep === "true") {
    const requiredVars = [
      "NEXT_PUBLIC_API_URL",
      "NEXT_PUBLIC_KEYCLOAK_URL",
      "NEXT_PUBLIC_KEYCLOAK_REALM",
    ];
    const missingCount = requiredVars.filter((v) => !process.env[v]).length;
    if (missingCount > 0) {
      return NextResponse.json(
        { status: "degraded", missingCount },
        { status: 503 },
      );
    }
  }

  return NextResponse.json(health);
}

This serves three consumers:

K8s liveness probe: GET /healthz — is the process alive?
K8s readiness probe: GET /healthz?deep=true — are env vars present?
CI/CD pipeline: health check before running E2E smoke tests
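For the third consumer, the pipeline needs to wait until the server answers before starting Playwright. Here is a minimal sketch of such a gate. The function name, retry count, and delay are illustrative assumptions, and the fetcher is injected so the logic can be exercised without a running server.

```typescript
// Sketch of a pre-E2E gate: poll the health endpoint until it returns 200.
// Retry count, delay, and the injected fetcher are illustrative assumptions.
export type Fetcher = (url: string) => Promise<{ status: number }>;

export async function waitForHealthy(
  url: string,
  fetcher: Fetcher,
  attempts = 10,
  delayMs = 1000,
): Promise<boolean> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const response = await fetcher(url);
      if (response.status === 200) return true;
    } catch {
      // Server is not accepting connections yet; fall through and retry.
    }
    if (attempt < attempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

In a pipeline script, passing the global fetch as the fetcher and /healthz?deep=true as the URL would fail fast on a 503 from missing env vars instead of letting the E2E suite time out on a half-configured server.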

E2E smoke tests in CI

This was the hardest part. Getting Playwright to run against a standalone Next.js build in GitHub Actions, authenticated against a real Keycloak instance, took multiple iterations.

The key design decision: dual-mode authentication. In CI, use Keycloak’s Direct Access Grant (REST-only, ~100ms). Locally, use the browser auth code flow:

// e2e/auth/global-setup.ts — simplified
if (process.env.E2E_KC_CLIENT_SECRET) {
  // CI mode: Direct Access Grant (no browser needed)
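  const response = await fetch(tokenEndpoint, {
    method: "POST",
    body: new URLSearchParams({
      grant_type: "password",
      client_id: process.env.E2E_KC_CLIENT_ID ?? "",
      client_secret: process.env.E2E_KC_CLIENT_SECRET ?? "",
      username: process.env.E2E_TEST_USERNAME ?? "",
      password: process.env.E2E_TEST_PASSWORD ?? "",
    }),
  });
  if (!response.ok) throw new Error(`Token request failed: ${response.status}`);
  const tokens = await response.json(); // access_token, refresh_token, expires_in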
} else {
  // Local mode: browser flow with redirect capture
  const browser = await chromium.launch();
  // ... navigate to Keycloak login page, fill credentials, capture tokens
}

The Playwright config handles both modes too — in CI it starts the standalone server; locally it reuses your running dev server:

// playwright.config.ts
webServer: process.env.PLAYWRIGHT_BASE_URL
  ? undefined  // External URL provided (e.g., deployed env)
  : {
      command: process.env.CI
        ? 'HOSTNAME=localhost PORT=3000 node .next/standalone/server.js'
        : 'npm run dev',
      reuseExistingServer: !process.env.CI,
    },

The test that catches real bugs

My favorite test intercepts the app’s own menu API response and verifies every path resolves:

// e2e/smoke/menu-route-validation.spec.ts — core logic
test('every backend menu path resolves to a frontend route', async ({ authenticatedPage }) => {
  // 1. Navigate to app, let it fetch its menu from the backend
  // 2. Intercept the /access-context/full/{locale} response
  // 3. Extract all menuViewPath values
  // 4. For each path: goto and assert status < 400
});
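Step 3 is the only part that doesn't need a browser, so it's easy to show on its own. Here is a minimal sketch that collects menuViewPath values from the intercepted JSON. It walks the payload generically because the real response shape isn't shown here; the sample payload's other keys (menus, children, label) are made up for illustration.

```typescript
// Collect every string stored under a "menuViewPath" key, at any depth.
// Walking the JSON generically avoids assuming the real response shape.
export function collectMenuPaths(value: unknown, found: Set<string> = new Set()): Set<string> {
  if (Array.isArray(value)) {
    for (const item of value) collectMenuPaths(item, found);
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      if (key === "menuViewPath" && typeof child === "string" && child.length > 0) {
        found.add(child);
      } else {
        collectMenuPaths(child, found);
      }
    }
  }
  return found;
}

// Hypothetical payload; only the menuViewPath key comes from the real API.
const samplePayload = {
  menus: [
    { label: "Payments", menuViewPath: "/payment", children: [
      { label: "Payout ledger", menuViewPath: "/payment/payout/ledger/" },
    ] },
    { label: "Group without a page", menuViewPath: "", children: [] },
  ],
};
const samplePaths = [...collectMenuPaths(samplePayload)];
// samplePaths is ["/payment", "/payment/payout/ledger/"]
```

In the spec, each collected path would then go through page.goto, with the returned response's status asserted to be below 400. Using a Set keeps duplicate menu entries from triggering the same navigation twice.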

This catches the exact class of bug where backend says /payment/payout/ledger/ but frontend has /payment/payout-ledger/. We’d already been bitten by this — a production 404 nobody noticed because nobody tested the full menu.

Phase 4: Post-Deploy Verification

K8s probes are the last line of defense. If the app starts but env vars are missing, readiness probes fail and the pod never receives traffic:

startupProbe:    GET /healthz         (30s grace, 10 retries)
readinessProbe:  GET /healthz?deep=true (every 10s)
livenessProbe:   GET /healthz         (every 15s)

I also evaluated an ArgoCD PostSync hook for deployment notifications. After investigating, I found zero notification infrastructure in the cluster — no ArgoCD Notifications controller, no Slack channel, no Teams webhook. I closed the story as unnecessary.

Knowing when NOT to build something is as important as building it. Every piece of infrastructure you add is a piece you maintain.

Phase 5: Remove the Training Wheels

This was the whole point. Every previous phase existed to make this moment safe.

ESLint hardening

I audited and fixed ESLint rules: no-unused-vars cleanup across the codebase, then no-explicit-any triage (some any types are legitimate escape hatches; most aren’t). Then I removed ignoreDuringBuilds: true.

The 313 TypeScript errors

With ignoreBuildErrors: true removed, 313 errors appeared. I broke them into surgical PRs:

MUI palette type augmentation — extend the theme’s color palette types
Unused Shadcn module cleanup — remove dead component imports
React 19 JSX namespace migration — JSX.Element → React.JSX.Element
313 unknown → proper types — the big one, 86 files, +716/-386 lines

Each sub-task was its own PR. This was deliberate: if any fix introduced a regression, I could pinpoint and revert exactly which category of change caused it.

After the last PR merged, npm run build passed clean. No flags. No bypasses.

The Decision Framework

If you’re staring at a similar codebase, here’s how to think about sequencing:

Phase                Question                                  Effort  Risk
0. Measure           What do I actually have?                  Hours   None
1. Stop bleeding     Can I make existing checks blocking?      Days    Low
2. Strengthen gates  Can I make PRs faster and more thorough?  Days    Low
3. Pipeline gates    Does the built artifact work?             Week    Medium
4. Post-deploy       Does the deployed artifact work?          Days    Low
5. Remove bypasses   Can the build stand on its own?           Week    High

The order matters. Phase 5 is high-risk: you’re removing the bypasses that let a broken build ship, so every hidden error surfaces at once. But by then, the earlier phases are catching problems before they reach production.

Never start with Phase 5. The temptation is to “just remove ignoreBuildErrors” and fix what breaks. But without the pipeline to catch regressions, you’re fixing errors while potentially introducing new ones.

What I’d Do Differently

Visual regression from day one was premature. I spent days fighting font rendering diffs between macOS and Linux. For a solo developer, visual regression is opt-in at best. I eventually made it label-triggered, which is where I should have started.
I should have caught the silent skip sooner. My menu-route-validation test was test.skip()-ing in CI for weeks because of a missing env var. A test that silently skips is worse than no test. Now I emit ::warning:: GitHub annotations when tests skip so they show up in the PR summary.
Ratcheting thresholds should be automated. I update coverage thresholds manually each sprint. A bot that proposes a PR bumping thresholds to current-minus-2% would remove the friction.
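The skip warning from the second point comes down to printing a GitHub Actions workflow command to stdout. Here is a minimal sketch of the formatting, using the same escaping rules as the actions/toolkit command helper. The function name and the message text are illustrative, not the code from my suite.

```typescript
// Format a GitHub Actions "warning" workflow command for a skipped test.
// Message data escapes %, CR and LF; property values also escape ":" and ",".
function escapeData(value: string): string {
  return value.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
}

function escapeProperty(value: string): string {
  return escapeData(value).replace(/:/g, "%3A").replace(/,/g, "%2C");
}

export function skipWarning(testTitle: string, reason: string): string {
  const title = escapeProperty(`Skipped: ${testTitle}`);
  return `::warning title=${title}::${escapeData(reason)}`;
}

// The runner picks the command up from stdout.
console.log(skipWarning("menu route validation", "a required env var is not set"));
// prints: ::warning title=Skipped%3A menu route validation::a required env var is not set
```

Where the call lives (next to the skip condition, or in a custom reporter) depends on the suite, and that wiring is not shown here. The point is that a skip leaves a visible trace on the PR instead of a green check.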

Results

After 6 epics, 31 stories, 14 PRs:

Coverage floor enforced with ratcheting thresholds
Pre-commit hooks catch lint + related test failures before push
PR checks run lint, typecheck, and tests in parallel (~2 min)
CI/CD pipeline runs tests before building Docker images
E2E smoke tests verify the built container against real Keycloak
K8s probes gate traffic until the app is healthy
Build bypasses removed — 313 type errors fixed, both flags off
0 production incidents during the entire migration

The codebase isn’t perfect. Coverage is at 59%, not 80%. There are still any types I chose to keep. But the pipeline catches regressions, the build is honest, and I can deploy with confidence.

That’s the goal. Not perfection — confidence.

This is the companion post to How I Built a Full CI/CD Safety Net for a Financial Platform, which covers the retrospective and lessons learned. Built with Next.js 15, Jest, Playwright, GitHub Actions, and a healthy respect for legacy code.
