We run an internal automation platform that handles sales, marketing, SEO, and HR workflows through n8n. A few weeks ago we started asking whether the same infrastructure could handle software development tasks. The question was simple: could a message in a hub workflow trigger a pipeline that reads the task, gets human approval, has an AI agent write the code, opens a PR, and gets a code review, all without anyone touching a keyboard?
The short answer is yes. The longer answer is that the path from "this should work" to "this actually works" involved five non-obvious bugs that are worth documenting.
The Architecture
The pipeline has six stages: Hub routing → task normalization → human approval gate → Developer Step → PR creation → Gemini review gate.
The hub receives a natural language message ("Add error handling to the checkout route"), classifies it as a software task, and passes it to the pipeline. An approval card is created in our dashboard. A human approves it. The Developer Step runs Claude Code against the codebase, creates a branch, commits the changes, and opens a pull request. Gemini then reviews the PR diff and either passes it or sends it back to Claude Code with instructions to fix.
The retry loop matters: on Gemini FAIL, we increment a retry counter, prepend the review feedback to the task description, and restart from the Developer Step. After two failed attempts, the pipeline posts a "review_failed" notification to the dashboard and stops. The PR stays open — the human decides whether to close it.
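The retry policy can be sketched as a single function. This is only an illustration: in our pipeline the logic is spread across n8n nodes, and `develop`, `review`, and `notify` are hypothetical callbacks standing in for the Developer Step, the Gemini review gate, and the dashboard notification. The exact wording of the prepended feedback is also an assumption.

```javascript
// Hypothetical sketch of the review retry policy. The callback names and the
// feedback format are assumptions, not the actual workflow.
const MAX_ATTEMPTS = 2; // stop after two failed reviews

async function runWithReview(task, { develop, review, notify }) {
  let description = task.description;
  let lastPr = null;

  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    lastPr = await develop(description);   // Developer Step: branch, commit, PR
    const verdict = await review(lastPr);  // review gate: PASS or FAIL

    if (verdict.status === 'PASS') {
      return { status: 'passed', attempts: attempt, pr: lastPr };
    }
    // On FAIL, prepend the feedback so the next run reads it first.
    description = `Review feedback:\n${verdict.feedback}\n\n${description}`;
  }

  // Out of attempts: notify the dashboard and leave the PR open for a human.
  await notify('review_failed');
  return { status: 'review_failed', attempts: MAX_ATTEMPTS, pr: lastPr };
}
```

Capping the loop with a constant keeps the failure mode predictable: the worst case is two Developer Step runs and one notification.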
Decoupling Claude Code from the HTTP Lifecycle
The first design decision that turned out to be critical: Claude Code cannot run synchronously inside an HTTP request handler.
Our dashboard is a Next.js app. The first version of the Developer Step made an HTTP request to a dashboard route that spawned Claude Code inline and waited for it to finish before returning. Claude Code takes 3–15 minutes, and holding a request open that long did not work: in our setup, the second sequential invocation would crash the server or hang indefinitely.
The fix is an async job queue. The dashboard's /api/software-task/submit route inserts a row into a software_jobs Postgres table and immediately spawns a detached Node.js worker process using spawn(..., {detached: true, stdio: 'ignore'}) followed by proc.unref(). The route returns a job ID within milliseconds. The worker runs Claude Code, parses the branch name from stdout, and updates the job row to success or failed when it finishes.
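A minimal sketch of the submit path, under stated assumptions: `db.insertJob` and `workerPath` are hypothetical placeholders for the real Postgres insert and worker script, the initial `pending` status and the 202 response code are guesses, and `spawnFn` is injectable only so the function can be exercised without starting a process. The `spawn` options and `unref()` call are the ones described above.

```javascript
const { spawn } = require('node:child_process');
const { randomUUID } = require('node:crypto');

// Hypothetical body of the submit route. `db`, `workerPath`, the 'pending'
// status, and the 202 code are assumptions for illustration.
async function submitJob(taskDescription, { db, workerPath, spawnFn = spawn }) {
  if (!taskDescription || !taskDescription.trim()) {
    return { status: 400, body: { error: 'task_description is required' } };
  }

  const jobId = randomUUID();
  await db.insertJob({ id: jobId, task_description: taskDescription, status: 'pending' });

  // Detach the worker so it outlives the HTTP request; ignoring stdio and
  // calling unref() lets the parent return without waiting on the child.
  const proc = spawnFn(process.execPath, [workerPath, jobId], {
    detached: true,
    stdio: 'ignore',
  });
  proc.unref();

  return { status: 202, body: { job_id: jobId } };
}
```

The row is inserted before the worker starts, so the status endpoint can answer for a job ID as soon as the caller has it.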
The Developer Step in n8n polls /api/software-task/status/:job_id every 20 seconds for up to 15 minutes. The status endpoint is public — no auth cookie required — because n8n needs to call it without a dashboard session. Separating submission from status also means we can check job state from anywhere: dashboard, n8n, CLI, tests.
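The polling side can be sketched the same way. In n8n this is built from nodes rather than one function, so treat the code as an illustration of the timing only. `fetchStatus` is a hypothetical callback that would wrap a GET to the status endpoint, and the `timeout` result is an invented sentinel.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the job settles or the time budget is spent.
// `fetchStatus` and the 'timeout' sentinel are assumptions for illustration.
async function pollJob(fetchStatus, options = {}) {
  const { intervalMs = 20000, timeoutMs = 900000, sleep = defaultSleep } = options;
  const maxPolls = Math.floor(timeoutMs / intervalMs); // 45 polls with the defaults

  for (let i = 0; i < maxPolls; i++) {
    const job = await fetchStatus();
    if (job.status === 'success' || job.status === 'failed') {
      return job; // terminal states written by the worker
    }
    await sleep(intervalMs);
  }
  return { status: 'timeout' };
}
```

Passing `sleep` in as an option makes the loop testable without real 20-second waits.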
The n8n ResumeUrl Port Problem
The approval gate uses an n8n Wait node, which pauses execution and gives back a resume URL. The dashboard renders an Approve button that hits that URL. Our test n8n instance runs on port 5679, but the resume URLs it generated always used port 5678, the prod port. As a result, every approval in test pointed at prod and hung forever.
The fix is a one-line replacement in the Build Approval Message Code node: rawBase.replace('localhost:5678', 'localhost:5679') before storing the URL in the approvals table. It is a workaround rather than a root-cause fix (5678 is n8n's default port, and the generated URL did not pick up the test instance's port), but it is stable and easy to reason about.
The second approval bug was subtler: the Wait node resumes with query parameters in $json.query.approved, not in $json.approved. Our IF node was checking the wrong path, which meant every approval routed to "Task Rejected" regardless of what the human clicked.
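Both approval fixes are small enough to show as plain functions. The function names are hypothetical, and comparing against the string 'true' is an assumption about how the Approve button encodes its query parameter.

```javascript
// Hypothetical helpers. The names and the 'true' string value are assumptions.

// Rewrite the prod port to the test port before storing the resume URL.
function fixResumePort(rawBase) {
  return rawBase.replace('localhost:5678', 'localhost:5679');
}

// After the Wait node resumes, the flag sits under `query`, not at the top
// level of the item.
function isApproved($json) {
  const query = $json.query || {};
  return String(query.approved) === 'true';
}
```

Checking `$json.approved` directly reads `undefined`, which is why the IF node sent every approval down the rejection branch.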
GitHub Does Not Index PR Diffs Immediately
After Claude Code creates a PR, the Gemini reviewer needs to fetch the diff via the GitHub API. The first version called the diff endpoint once, immediately after PR creation, and consistently got back an empty file list. GitHub indexes diffs asynchronously — the PR exists before the diff data is available.
The fix: a retry loop in a single Fetch and Format Diff Code node. It attempts up to four times with 12-second gaps, checking whether files.length > 0 before proceeding. If all four attempts return empty, it throws and the pipeline fails loudly.
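A minimal sketch of that loop, assuming a hypothetical `listFiles` callback that would wrap the GitHub call listing a pull request's files. The real node does the fetch and the formatting together; this shows only the retry shape.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry until the PR reports at least one changed file.
// `listFiles` is a hypothetical callback, not the actual node code.
async function fetchDiffFiles(listFiles, options = {}) {
  const { attempts = 4, gapMs = 12000, sleep = defaultSleep } = options;

  for (let attempt = 1; attempt <= attempts; attempt++) {
    const files = await listFiles();
    if (Array.isArray(files) && files.length > 0) {
      return files;
    }
    // Wait between attempts, but not after the last one.
    if (attempt < attempts) {
      await sleep(gapMs);
    }
  }
  throw new Error(`PR diff still empty after ${attempts} attempts`);
}
```

Throwing on the final empty result is deliberate: a reviewer that silently receives an empty diff would pass a PR it never saw.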
One n8n-specific constraint made this harder: Code nodes cannot return a raw array as the json value. Trying to return an array directly throws "A 'json' property isn't an object." The array must be wrapped: return [{json: {files: filesArray, total_files: N}}]. This is not documented prominently and the error message does not point to the cause.
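The wrapping can be isolated in a small helper. The helper name is hypothetical, and using the array length for `total_files` is an assumption about what N stands for.

```javascript
// Wrap an array in the item shape a Code node must return:
// a list of items, each with an object under `json`.
// Hypothetical helper; total_files as the array length is an assumption.
function wrapFiles(filesArray) {
  return [{ json: { files: filesArray, total_files: filesArray.length } }];
}
```

Inside the Code node, the last line would then be `return wrapFiles(files);` instead of returning the array under `json` directly.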
State Does Not Cross the Wait Node
When a Wait node resumes, the execution context contains the webhook payload from the resume call — not the data from earlier in the workflow. Any variables set before the Wait are gone from $json on the other side.
We hit this when the Developer Step tried to read $json.task_description after the approval gate. The field was set before the Wait node and did not survive the resume. The fix is a Get Task Code node placed after the Wait that re-reads the task description from an earlier node using $('Normalize Input').first().json.task_description. This is the general pattern for any state you need after a Wait: pin it to a named node reference rather than relying on the flowing context.
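A sketch of what the Get Task node does, wrapped in a function so it can run outside n8n. Inside a Code node `$` and `$json` are provided by n8n; here they are passed in as parameters. Merging the resume payload into the output item is an assumption about what downstream nodes need.

```javascript
// Body of a hypothetical "Get Task" Code node. `$` and `$json` are parameters
// here only so the sketch is runnable outside n8n.
function getTask($, $json) {
  // Read from the named node, not from the item that came out of the Wait.
  const taskDescription = $('Normalize Input').first().json.task_description;

  // Keep the resume payload (assumed useful downstream) and restore the task.
  return [{ json: { ...$json, task_description: taskDescription } }];
}
```

The named-node lookup keeps working regardless of what the resume webhook sends, which is why it is safer than reading from the flowing item.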
Testing the Full Flow
We wrote two test files. The API test suite covers the submit and status endpoints directly: valid submission returns a job ID, blank task description returns 400, nonexistent job ID returns 404, the DB row appears after submit, and the status endpoint is reachable without auth. 28 assertions, all in under 10 seconds.
The E2E smoke test fires the hub webhook in a background thread, then polls the approvals table until a pending record appears — confirming the pipeline reached the Wait node and stored the correct resume URL. It does not approve or wait for Claude Code. A separate full mode (opt-in via environment variable) approves, waits for job completion, and verifies the final execution status in n8n.
Both test suites use a psql helper that needs the -F "\t" flag for tab-separated column output. Without it the default pipe separator breaks any assertion that splits on tab. Small detail, half an hour to diagnose.
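For illustration, here is what such a helper could look like in Node.js. This is not our actual helper: the function names are hypothetical, and the `-A` (unaligned) and `-t` (tuples only) flags are added on the assumption that the helper wants bare rows. Connection settings are assumed to come from the standard PG environment variables.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical psql helper. -A and -t are assumptions; -F sets the field
// separator for unaligned output to a literal tab.
function psqlArgs(sql) {
  return ['-A', '-t', '-F', '\t', '-c', sql];
}

// Split psql output into rows of tab-separated columns.
function parseRows(stdout) {
  return stdout
    .split('\n')
    .filter((line) => line.length > 0)
    .map((line) => line.split('\t'));
}

// `run` is injectable so the parsing can be tested without a database.
function query(sql, run = execFileSync) {
  return parseRows(run('psql', psqlArgs(sql), { encoding: 'utf8' }));
}
```

With the default separator, a row would come back as a single pipe-joined string, so `line.split('\t')` would yield one column instead of several.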
What We Learned
The pattern that made this pipeline reliable is the same one that makes any long-running job reliable: submit fast, track state in a database, poll for completion. Running Claude Code synchronously inside an HTTP server is the wrong model. Detached workers with DB-backed status are the right one. Everything else in this pipeline — the retry loops, the Wait node state management, the diff timing workaround — is implementation detail. The async job queue is the load-bearing piece.