I have been building this site with Claude Code as my pair programmer. Claude writes most of the code. I review it, guide direction, and test. The productivity gain is real, but I noticed a gap early on: when you are the only human reviewer and your AI assistant wrote the code, you are reviewing from a position of trust. Claude explained what it did, the explanation sounds reasonable, the code looks right, and you move on. This is how bugs slip through.
OpenAI's Codex CLI gave me a second opinion, but the workflow for using it evolved through three distinct phases, each solving problems the previous one created. If you want a concrete example of Codex catching subtle bugs, see Theme Persistence and the Code Reviewer Who Never Sleeps.
Phase 1: Terminal tab gymnastics
The first version was embarrassingly manual. I would finish a chunk of work with Claude, open another terminal tab, run codex review --base main, wait for it to finish, copy the output, paste it somewhere I could read it, and then decide whether to act on the findings. If Codex caught something real, I would switch back to Claude's session, describe the issue, wait for the fix, and then run the review again.
It worked, but only barely. The friction was high enough that I skipped it more often than I should have. The value was obvious when I did use it—Codex caught real bugs, like unconditional preventDefault() breaking modifier-key clicks, or milliseconds labeled as seconds in Web Vitals displays—but the manual loop made it feel like extra work rather than part of the process.
Phase 2: GitHub Action that outstayed its welcome
The obvious fix was automation. I added a GitHub Action that ran Codex review on every PR and posted the findings as a comment. No more copy-paste, no more remembering to run the review, no more friction.
Except the cancel logic was sloppy.
The workflow triggered on pull_request events for opened and synchronize, which meant it ran on new PRs and on every push to an existing PR. That part was fine. The problem was what happened when a PR got merged or closed: any in-progress Codex review kept running. It would finish minutes later and post a comment to a PR that no longer mattered, wasting API calls and cluttering the PR history with stale reviews.
The fix required two things. First, adding closed to the event types so the workflow triggers when a PR closes. Second, using GitHub's concurrency feature with cancel-in-progress: true so that when the closed event fires, any running review for that PR gets cancelled immediately.
on:
  pull_request:
    branches: [main]
    types: [opened, synchronize, closed]

concurrency:
  group: codex-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true
The workflow also needed a condition to skip the actual review job when the event is closed, since there is nothing to review at that point—the only purpose of the closed trigger is to cancel in-progress runs.
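In the workflow file, that skip might look something like this (the job name and steps are my assumptions, not the actual workflow):

```yaml
jobs:
  review:
    # Skip the review itself on the closed event; that trigger exists only
    # so the concurrency group can cancel any in-progress run.
    if: github.event.action != 'closed'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ...run codex review and post the findings as a PR comment...
```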
This version worked better, but it still had a timing problem. The review ran after the PR was opened, which meant I would push, open the PR, context-switch to something else, and then get a Codex comment twenty minutes later when I had already moved on. The feedback loop was too slow.
Phase 3: Pre-push hook with CI backstop
The current setup moves the first review earlier: a pre-push hook runs Codex before the code even reaches GitHub. If Codex finds issues that need changes, the push fails and I fix them immediately, while I still have context.
# .husky/pre-push
output=$(codex review --base main 2>&1)
echo "$output"

if echo "$output" | grep -q "Verdict:.*Needs Changes"; then
  echo "❌ Codex review found issues that need changes."
  exit 1
fi
The hook can be skipped with SKIP_CODEX_REVIEW=1 git push for cases where I have already addressed the feedback or genuinely disagree with the review. It also skips automatically when pushing to main, since those pushes are typically merge commits that have already been reviewed.
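A sketch of how that skip logic can be factored (the function name and structure are my assumptions; the real hook may inline the checks):

```shell
# Sketch: decide whether the pre-push hook should skip the Codex review.
# Arguments: $1 = value of SKIP_CODEX_REVIEW, $2 = current branch name.
should_skip_review() {
  [ "$1" = "1" ] && return 0     # manual escape hatch: SKIP_CODEX_REVIEW=1 git push
  [ "$2" = "main" ] && return 0  # pushes to main are already-reviewed merge commits
  return 1
}

# In the hook itself, the branch would come from git:
# branch=$(git rev-parse --abbrev-ref HEAD)
# should_skip_review "$SKIP_CODEX_REVIEW" "$branch" && exit 0
```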
The GitHub Action still runs as a backstop. It catches anything I might have skipped locally and provides a persistent record of the review in the PR. But most issues get caught and fixed before the PR exists, which means the CI review usually just confirms what the pre-push hook already approved.
What Codex actually catches
The value is not hypothetical. Here are real findings from the past few weeks:
Modifier key handling: Claude implemented smooth page transitions with the View Transitions API. The click handler called preventDefault() unconditionally, which broke Cmd+click to open in a new tab. Codex flagged it; we added checks for metaKey, ctrlKey, shiftKey, and altKey.
Wrong units: A GA4 export script labeled LCP and FCP values with 's' for seconds, but web-vitals reports them in milliseconds. Small error, but it would have made the analytics dashboard display nonsense.
Stale data: An export script updated latest.json with Web Vitals data when available but did not clear the field when there was no new data, leaving stale values from previous runs.
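The modifier-key fix from the first finding boils down to a small guard before preventDefault() — something like this sketch (names and handler structure are assumptions, not the site's actual code):

```javascript
// Returns false when the click should fall through to the browser's
// native handling (Cmd/Ctrl+click -> new tab, Shift+click -> new window).
function shouldHandleTransition(event) {
  return !(event.metaKey || event.ctrlKey || event.shiftKey || event.altKey);
}

// Usage inside a click handler (guarded so the sketch runs outside a browser):
if (typeof document !== 'undefined') {
  document.addEventListener('click', (event) => {
    const link = event.target.closest('a');
    if (!link || !shouldHandleTransition(event)) return; // no preventDefault here
    event.preventDefault();
    // ...start the View Transition and navigate...
  });
}
```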
None of these would have crashed the site. They are subtle correctness issues that accumulate into a vaguely broken experience.
The meta question
Is it strange to have AI review AI code while a human supervises? Maybe. But the alternative is having only one perspective on AI-generated code—either trusting it completely or reviewing every line yourself.
I do not have the energy to scrutinize everything Claude writes. I also do not want to ship bugs that a five-minute automated review would have caught. Codex fills that gap. It is a reviewer that does not know what we intended and just asks: is the code actually correct?
The workflow has gone from manual and forgettable to automated and reliable. The meta layers are deep, but the shipped code is better for it.