The Testing Academy · Masterclass

Codex for QA.

A practical Codex masterclass for QA engineers and SDETs — learn AI test automation, agentic testing workflows, AGENTS.md, Skills, Subagents, Hooks, MCP, Playwright, model routing, Gemini CLI, CI, and portfolio deployment.

Host · Pramod Dutta

Track · Codex QA Factory

Audience · QA / SDET

Project · Agent + Portfolio

The room you just walked into

QA is no longer the last step.
It is the agent's quality system.

Codex is OpenAI's coding agent for writing, reviewing, testing, and shipping code across the CLI, app, IDE extension, and cloud task surfaces.

For a tester, Codex is not a faster autocomplete. It reads the repo, changes files, runs the suite, drives browser tools, reviews the diff, opens PRs, and keeps a transcript of what happened. Your job becomes designing the quality loop.

Quality owner still accountable for intent, risk, and release readiness.

Default maximum open subagent threads when left unset.

32K

Default AGENTS.md discovery byte cap before you split guidance.

∞

Patience for links, a11y, smoke, visual, API, lint, and boring proof.

Where Codex sits in your stack.

Think of Codex as a harness-native pair tester. It has repo context, shell access, patch-based edits, web search, image inputs, browser/computer use through the app, MCP connectors, and review mode. Your advantage is turning that into a repeatable QA factory.

Step 1 · Read

AGENTS.md, code, logs, traces, tickets, screenshots, docs.

Step 2 · Plan

Use /plan before editing risky code or tests.

Step 3 · Act

Patch files, run shell commands, call MCP tools.

Step 4 · Verify

Run impacted specs, review diffs, collect evidence.

Step 5 · Ship

PR, review, CI, deploy, monitor, document.

Four surfaces. One workflow.

local

Codex CLI

Terminal-first TUI. Best for repo edits, quick scripts, local tests, model switches, and repeatable QA prompts.

desktop

Codex app

Local workspaces, in-app browser, Chrome/computer use, images, automations, worktrees, and visual QA.

editor

IDE extension

Editor-aware agenting with open files and selections as context. Good for tight code-review loops.

cloud

Codex Cloud

Delegated tasks in managed environments. Good for isolated bug fixes, PR prep, and longer work while you continue locally.

Setup · macOS / Linux / Windows

Five lines. One terminal.

# 1. install Codex CLI
npm i -g @openai/codex

# 2. open your QA repo
cd ~/work/qa-portfolio

# 3. start Codex
codex

# 4. first run: sign in when prompted
# ChatGPT account or API key auth

# 5. scaffold repo guidance
> /init  → creates AGENTS.md

Run Codex from the directory you want it to understand. The first run asks you to authenticate. After that, start every serious repo by creating and editing AGENTS.md.

node + npm · git · ripgrep · gh cli · pnpm · playwright

Windows · native PowerShell supported; WSL2 still useful for Linux-native stacks.

A session has four moving parts.

01 · Scope

Workspace

The folder or worktree Codex can inspect and edit. Keep tasks scoped to one feature or suite.

02 · Guidance

AGENTS.md

Stable repo rules, test commands, locator policy, CI rules, review expectations.

03 · Tools

Shell + patch + MCP

Codex edits via patches, runs commands, and reaches external tools through MCP or app connectors.

04 · Controls

Permissions

Auto, read-only, or full access. Pick the smallest mode that can finish the task.

Three ways to steer Codex.

prompt

Plain English

Describe the job and let Codex decide tools.

> run checkout smoke, fix only locator flakes,
  then show the diff and test output

/command

Session control

Use built-ins for model, plan, review, permissions.

> /plan
> /model
> /review

shell

Exact command

Ask Codex to run it, or run it yourself in a terminal.

npx playwright test --grep @smoke
npm run lint
git diff --stat

The complete QA cheat-sheet

Every slash command a QA touches.

Control	Use it for
/permissions	Switch between Auto, Read Only, or tighter approval requirements.
/model	Choose model and reasoning effort for the current task.
/fast	Toggle Fast service tier when available.
/plan	Ask for a plan before implementing.
/goal	Set a persistent objective for long-running work.
/status	Check thread state, context usage, rate limits.
/compact	Summarize long context and keep going.
/resume	Continue a saved conversation.

Workflows	Use it for
/init	Create an AGENTS.md scaffold.
/review	Review current working tree, branch, or commit.
/diff	Inspect local edits before committing.
/mcp	Inspect configured MCP servers and tools.
/skills	Browse and explicitly invoke skills.
/hooks	Review and trust lifecycle hooks.
/agent	Switch to a spawned subagent thread.
/side	Ask a side question without polluting the main thread.

QA habit · type /plan before broad changes, /diff before claiming done, and /review before opening a PR.

Patch · Shell · Search · Browser.
Everything else is orchestration.

> find the flaky wait in tests/login.spec.ts,
  replace it with an assertion-based wait,
  run that spec 20 times, and report the pass rate.

// Codex will usually:
  rg(waitForTimeout|sleep|networkidle)
  apply_patch(tests/login.spec.ts)
  shell(npx playwright test tests/login.spec.ts --repeat-each=20)
  summarize(diff + exit code + failures)
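
The search step in that flow can be approximated outside Codex too. Below is a minimal sketch, assuming you only want to flag lines that match the same three patterns; `findHardWaits` and `WaitFinding` are hypothetical names, not part of Codex or Playwright.

```typescript
// Hypothetical helper: flags hard waits in a spec's source text.
interface WaitFinding {
  line: number;  // 1-based line number in the spec
  match: string; // the pattern that matched
  text: string;  // trimmed source line, for the report
}

// Same patterns as the search step above.
const HARD_WAIT = /waitForTimeout|sleep|networkidle/;

function findHardWaits(source: string): WaitFinding[] {
  const findings: WaitFinding[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    const hit = HARD_WAIT.exec(text);
    if (hit) {
      findings.push({ line: index + 1, match: hit[0], text: text.trim() });
    }
  });
  return findings;
}

// Illustrative spec body, not taken from a real repo.
const sampleSpec = [
  "await page.goto('/login');",
  "await page.waitForTimeout(2000);",
  "await page.waitForLoadState('networkidle');",
  "await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();",
].join("\n");

console.log(findHardWaits(sampleSpec));
```

A plain text match like this will also flag comments and string literals, so treat the output as a list of candidates to review, not a verdict.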

patch

Controlled edits

Small, reviewable hunks. Best for test fixes and framework changes.

shell

Proof loop

Run tests, lint, typecheck, curl, git, trace viewers, report generators.

search

Current facts

Use live web search when docs, models, prices, or rules might have changed.

mcp

External systems

Browser, Jira, GitHub, Figma, docs, APIs, and team-specific tools.

Give Codex just enough autonomy.

default

Auto

Reads, edits, and runs commands inside the workspace. Still asks before network or outside-scope actions.

safe

Read Only

Best for audits, root-cause analysis, onboarding, and plan-first reviews.

danger

Full Access

Use only in trusted repos or disposable sandboxes. Powerful, expensive, and easy to regret.

The single file that changes everything

Teach Codex your QA rules once.

Codex reads AGENTS.md before doing work. It layers global guidance from ~/.codex, repo guidance from the Git root, and nested directory overrides down to your current folder.

# QA conventions — qa-portfolio

## Build and test
- Package manager: pnpm
- Unit: pnpm test
- E2E: npx playwright test
- Smoke: npx playwright test --grep @smoke

## Locators
- Prefer getByRole, getByLabel, getByTestId.
- No raw xpath in committed tests.
- No page.waitForTimeout.

## Done means
- Diff reviewed.
- Impacted test run pasted with exit code.
- Trace/screenshot attached when browser behavior changed.

Scope	File
Global	~/.codex/AGENTS.md or AGENTS.override.md
Repo root	./AGENTS.md
Nested folder	tests/e2e/AGENTS.md
Override	AGENTS.override.md wins in that directory
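
To make the layering concrete, here is a rough sketch of the lookup order the table describes. It models only what is stated above (global, then repo root, then each nested folder, with AGENTS.override.md winning in its own directory); the real discovery logic in Codex may differ, for example around the byte cap. `resolveGuidance` is a hypothetical name and the paths are illustrative.

```typescript
// Returns the guidance files to read, outermost first.
// `exists` is injected so the sketch makes no filesystem calls.
function resolveGuidance(
  globalDir: string,
  repoRoot: string,
  cwd: string,
  exists: (path: string) => boolean,
): string[] {
  // Directories from the repo root down to cwd,
  // e.g. /repo, /repo/tests, /repo/tests/e2e.
  const relative = cwd.startsWith(repoRoot) ? cwd.slice(repoRoot.length) : "";
  const dirs = [repoRoot];
  for (const part of relative.split("/").filter(Boolean)) {
    dirs.push(`${dirs[dirs.length - 1]}/${part}`);
  }

  const chain: string[] = [];
  for (const dir of [globalDir, ...dirs]) {
    const override = `${dir}/AGENTS.override.md`;
    const base = `${dir}/AGENTS.md`;
    // The override wins in its own directory; otherwise use AGENTS.md if present.
    if (exists(override)) chain.push(override);
    else if (exists(base)) chain.push(base);
  }
  return chain;
}

// Illustrative file set.
const guidanceFiles = [
  "/home/qa/.codex/AGENTS.md",
  "/repo/AGENTS.md",
  "/repo/tests/e2e/AGENTS.md",
  "/repo/tests/e2e/AGENTS.override.md",
];

const guidanceChain = resolveGuidance(
  "/home/qa/.codex",
  "/repo",
  "/repo/tests/e2e",
  (path) => guidanceFiles.indexOf(path) !== -1,
);
console.log(guidanceChain);
```

Later entries in the chain are closer to the working folder, which is why a nested tests/e2e file can tighten rules that the repo root leaves open.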

Run /init to scaffold it.

What goes inside AGENTS.md

Nine sections every QA repo needs.

## 1. Stack
Playwright, TypeScript, APIRequestContext, axe.
## 2. Commands
pnpm lint · pnpm test · npx playwright test.
## 3. Folder layout
tests/e2e · tests/api · tests/fixtures · tests/pom.
## 4. Locators
Role/label/testid first. XPath forbidden.
## 5. Waits
No hard sleeps. Use assertions and expect.poll.
## 6. Test data
No real PII. Use builders and env-based users.
## 7. Tags
@smoke @regression @a11y @visual @flaky.
## 8. Review
Prioritize bugs, regressions, missing tests.
## 9. Guardrails
Ask before deleting specs, changing CI, or bumping deps.

Commands · Codex can prove work instead of guessing how to run tests.

Locators · The fastest way to stop generated tests becoming flaky.

Data · Keeps secrets and real users out of prompts and fixtures.

Review · Turns Codex into a QA reviewer, not a style nit machine.

Guardrails · Defines where autonomy must stop and ask.

When Codex makes a wrong assumption, do not just correct the prompt. Update AGENTS.md so the next session starts smarter.

Context is a budget. Spend it like one.

stable

AGENTS.md

Rules that should be true next month: commands, conventions, architecture boundaries.

session

/compact

Summarize a long thread when it has useful decisions but too much transcript weight.

learned

/memories

Manage useful local context learned across work, where enabled by your setup.

Current Codex model map

Pick the model for the risk, not the ego.

Model	Best QA use	Command
gpt-5.5	Hard debugging, large refactors, research-heavy QA strategy, computer use.	`codex -m gpt-5.5`
gpt-5.4	Professional coding and test framework work with strong reasoning.	`codex -m gpt-5.4`
gpt-5.4-mini	Fast, lower-cost edits, subagents, simple spec generation.	`codex -m gpt-5.4-mini`
gpt-5.3-codex	Dedicated agentic coding and local code review workflows.	`codex -m gpt-5.3-codex`
gpt-5.3-codex-spark	Near-instant text-only coding iteration where available.	`codex -m gpt-5.3-codex-spark`

Inside the CLI, use /model to switch mid-session and set reasoning effort. For simple subagents, mini is often enough. For migration or risk-heavy changes, start with a frontier model.

Gemini, OpenAI-compatible endpoints, and local models

Use Gemini as a second expert, not a random swap.

# install
npm install -g @google/gemini-cli

# run in the same repo
gemini

# pick a specific Gemini model
gemini -m gemini-2.5-flash

# non-interactive review
gemini -p "Review tests/e2e for missing assertions" \
  --output-format json

The cleanest Gemini workflow is side-by-side: let Codex edit and verify in your repo, then ask Gemini CLI for an independent review, long-context explanation, search-grounded research, or alternative test strategy.

Lane	Use it when
Codex	You want patch-based repo edits, code review, worktrees, goals, skills, hooks.
Gemini CLI	You want Google Search grounding, Gemini model behavior, another read on requirements or test gaps.
Gateway	Your org exposes Gemini or other models through a Responses-compatible endpoint for Codex.
Local	You want Ollama or LM Studio for private, lower-capability experiments.

# ~/.codex/config.toml
model_provider = "qa-gateway"
model = "gemini/gemini-2.5-pro"

[model_providers.qa-gateway]
name = "QA model gateway"
base_url = "https://gateway.example.com/v1"
env_key = "QA_GATEWAY_API_KEY"
wire_api = "responses"
supports_websockets = false

Important · Codex custom providers currently use the Responses protocol. Google's Gemini OpenAI-compatibility examples use Chat Completions, so direct Gemini endpoint routing may not be enough unless your gateway translates to Responses. For practical QA teams, native Gemini CLI plus Codex is the reliable path.

A tester's routing table.

deep

Risky framework work

Use gpt-5.5 or gpt-5.4 at high or xhigh reasoning effort. Require plan, diff, targeted tests, and review.

fast

Spec edits

Use gpt-5.4-mini or the currently recommended mini model. Run the exact spec immediately.

parallel

Subagents

Use mini for explorers and one frontier reviewer for final synthesis.

second view

Gemini CLI

Ask for independent risk review, missing scenarios, and edge-case brainstorming.

private

Local model

Use Ollama/LM Studio for docs summarization, not production code edits unless proven.

Bounded automation

Use narrow prompts, max turns, focused diff, and explicit output caps.
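
One way to keep this table from living only in people's heads is to write it down as data. The sketch below is an assumption-heavy illustration: the task kinds, the `ROUTES` map, and `launchCommand` are hypothetical, and the model names are simply the ones listed in this deck, which may change.

```typescript
type TaskKind = "framework" | "spec-edit" | "explore" | "second-opinion" | "docs-summary";

interface Route {
  lane: "codex" | "gemini-cli" | "local";
  model: string | null; // null when this deck names no specific model
  planFirst: boolean;   // whether to require /plan before edits
}

// Routing table from the cards above, expressed as data.
const ROUTES: Record<TaskKind, Route> = {
  "framework": { lane: "codex", model: "gpt-5.5", planFirst: true },
  "spec-edit": { lane: "codex", model: "gpt-5.4-mini", planFirst: false },
  "explore": { lane: "codex", model: "gpt-5.4-mini", planFirst: false },
  "second-opinion": { lane: "gemini-cli", model: "gemini-2.5-flash", planFirst: false },
  "docs-summary": { lane: "local", model: null, planFirst: false },
};

// Builds the CLI invocation for a task, or null when no command is defined here.
function launchCommand(kind: TaskKind): string | null {
  const route = ROUTES[kind];
  if (route.lane === "local" || route.model === null) return null;
  const binary = route.lane === "gemini-cli" ? "gemini" : "codex";
  return `${binary} -m ${route.model}`;
}

console.log(launchCommand("framework"));
console.log(launchCommand("second-opinion"));
```

Keeping the table in code means a model rename is a one-line change that shows up in review, not a habit each tester has to relearn.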

Spawn specialists, not chaos.

Codex can spawn specialized agents in parallel when you explicitly ask. Built-ins include default, worker, and explorer. Custom agents live as TOML files under ~/.codex/agents/ or .codex/agents/.

explorer

Codebase map

Find test owners, fixtures, helper APIs, flaky waits, and routes without editing.

worker

Implementation

Make a bounded change after the plan is approved.

reviewer

QA diff review

Read the final diff for flake risk, missing assertions, bad test data, and CI gaps.

> Review this branch vs main. Spawn one agent per topic:
  1. security risk
  2. test flakiness
  3. missing assertions
  4. API contract risk
  5. maintainability
Wait for all agents, then summarize the top 8 findings.

Skills are playbooks Codex loads on demand.

A skill is a directory with a required SKILL.md plus optional scripts/, references/, assets/, and helper files. Codex starts with the skill name and description, then reads the full instructions only when the task matches.

---
name: flake-hunter
description: Use when a Playwright spec fails intermittently or contains waitForTimeout, sleep, networkidle, or brittle locator patterns.
---

# Flake Hunter
1. Read the failing spec and related fixture.
2. Search for hard waits and brittle selectors.
3. Replace with role locators and assertion waits.
4. Run the spec with --repeat-each=20.
5. Report pass rate, changed lines, and remaining risk.
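
Since Codex starts from the skill name and description, it can be worth checking that both are present before committing a skill. The following is a minimal sketch of such a check; `parseSkill` is a hypothetical helper, and it handles only flat `key: value` front matter, not full YAML.

```typescript
interface SkillMeta {
  name: string;
  description: string;
  body: string; // instructions after the front matter
}

// Parses flat front matter between the two '---' lines.
// Returns null when the block or a required key is missing.
function parseSkill(markdown: string): SkillMeta | null {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(markdown);
  if (!match) return null;

  const meta: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx > 0) {
      meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
    }
  }
  if (!meta["name"] || !meta["description"]) return null;
  return { name: meta["name"], description: meta["description"], body: match[2].trim() };
}

// Shortened version of the flake-hunter skill above.
const skillText = [
  "---",
  "name: flake-hunter",
  "description: Use when a Playwright spec fails intermittently.",
  "---",
  "",
  "# Flake Hunter",
  "1. Read the failing spec and related fixture.",
].join("\n");

console.log(parseSkill(skillText));
```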

Where Codex finds skills

Scope	Location
Repo	$CWD/.agents/skills or repo-root/.agents/skills
User	$HOME/.agents/skills
Admin	/etc/codex/skills
System	Bundled skills such as skill-creator

Invoke explicitly with $skill-name or via /skills.

Build your own Codex for QA

Package your testing brain as skills.

skill

locator-auditor

Scans specs for XPath, nth-child, CSS chains, and missing accessible names.

skill

api-contract-maker

Turns curl/OpenAPI/Postman exports into Playwright APIRequestContext suites.

skill

bug-from-trace

Reads trace, screenshot, console, and network logs and drafts a Jira-ready bug.

> Use $skill-creator to create a repo-scoped skill named locator-auditor.
It should trigger when tests use XPath, CSS chains, nth-child, test-only waits,
or missing assertions. It should read tests/, output a risk table, and only edit
when I explicitly say "fix them".

When a skill is not enough, use a plugin.

Plugins package skills, MCP servers, and apps together. For a QA organization, a plugin can ship the company browser tools, Jira connector, test-data service, and house skills as one installable bundle.

skills

Team playbooks

Common workflows: smoke triage, accessibility audit, release-readiness report.

mcp

Tool servers

Playwright, Jira, test data, internal QA dashboards, contract registry.

apps

Local UI

A mini dashboard for traces, screenshots, and run summaries inside Codex.

Hooks fire around tool calls.

Hooks let you run scripts on Codex lifecycle events: prompt submit, pre-tool, permission request, post-tool, compaction, subagent start/stop, session start, and stop. Use them for formatting, test targeting, audit logs, and safety gates.

"PostToolUse": [{
  "matcher": "apply_patch",
  "hooks": [{
    "type": "command",
    "command": "npm run lint -- --quiet"
  }]
}],
"PreToolUse": [{
  "matcher": "shell",
  "hooks": [{
    "type": "command",
    "command": "node scripts/block-main-edits.js"
  }]
}]

Hook	QA use
UserPromptSubmit	Log prompts or block secrets.
PreToolUse	Prevent destructive shell commands.
PermissionRequest	Auto-deny unsafe escalations.
PostToolUse	Format or run impacted tests.
Stop	Emit a run summary.
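
The decision logic behind a PreToolUse safety gate can be quite small. Here is a sketch of only that logic, as a pure function; how a hook script receives the command and signals a block depends on Codex's hook contract, which is not modeled here. The `BLOCKED` list and `checkCommand` are hypothetical and deliberately incomplete.

```typescript
interface Verdict {
  allowed: boolean;
  reason?: string; // set only when the command is blocked
}

// Hypothetical deny list; extend it to match your own team's rules.
const BLOCKED: { pattern: RegExp; reason: string }[] = [
  { pattern: /\brm\s+-[a-z]*r[a-z]*f|\brm\s+-[a-z]*f[a-z]*r/i, reason: "recursive force delete" },
  { pattern: /\bgit\s+push\b.*(--force|-f)\b/, reason: "force push" },
  { pattern: /\bgit\s+reset\s+--hard\b/, reason: "hard reset" },
];

// Returns the first matching block reason, or allows the command.
function checkCommand(command: string): Verdict {
  for (const rule of BLOCKED) {
    if (rule.pattern.test(command)) {
      return { allowed: false, reason: rule.reason };
    }
  }
  return { allowed: true };
}

console.log(checkCommand("npx playwright test --grep @smoke"));
console.log(checkCommand("git push --force origin main"));
```

Pattern matching on command strings is easy to bypass, so a gate like this complements scoped permissions and branch protection instead of replacing them.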

MCP turns Codex into a browser, Jira, GitHub, docs, and your internal API.

Model Context Protocol exposes external tools as structured actions. For QA, the high-value servers are Playwright/browser, Jira/Confluence, GitHub, design systems, test data, and custom product APIs.

browser

Playwright MCP

Navigate, click by role, inspect accessibility tree, capture screenshots and network logs.

tickets

Atlassian MCP

Read acceptance criteria, write test plans, file bugs with traces.

repo

GitHub

PRs, checks, issues, release notes, review comments, and branch status.

docs

Notion / Confluence

Pull specs and publish execution reports.

data

Test data MCP

Create safe seeded users, orders, payments, feature flags.

custom

Your own MCP

Wrap internal CLIs and APIs so Codex can test like your team tests.

codex mcp add playwright -- npx @playwright/mcp@latest

A QA agent needs eyes and hands.

Use the in-app browser for local apps, screenshots, accessibility snapshots, and visual verification. Use Chrome automation when cookies, extensions, or logged-in remote sessions matter.

Open · Localhost or remote target.

Snapshot · Capture accessible structure.

Act · Click, type, upload, resize.

Assert · Check text, pixels, console.

Codify · Turn findings into tests.

Demo · authoring a spec without typing locators

"Open saucedemo.com, log in, add an item, screenshot the cart."

codex · qa-portfolio

› drive saucedemo, login as standard_user, add backpack to cart, screenshot, then generate a Playwright spec from the path

▸ browser_navigate('https://saucedemo.com')
▸ browser_snapshot() · captured a11y tree
▸ browser_click(role=button, name='Login')
▸ browser_click(name='Add to cart')
▸ browser_take_screenshot('cart.png')
▸ apply_patch(tests/cart.spec.ts) ✓
▸ npx playwright test tests/cart.spec.ts · green

import { test, expect } from '@playwright/test';

test('adds backpack to cart', async ({ page }) => {
  await page.goto('https://www.saucedemo.com');
  await page.getByRole('textbox', { name: /user/i }).fill('standard_user');
  // Password inputs expose no textbox role, so locate the field by its placeholder.
  await page.getByPlaceholder('Password').fill('secret_sauce');
  await page.getByRole('button', { name: 'Login' }).click();
  await page.getByRole('button', { name: /add to cart/i }).first().click();
  await expect(page.getByText('Sauce Labs Backpack')).toBeVisible();
});

API tests from a single curl.

Paste a curl, OpenAPI URL, or Postman export. Codex can infer happy path, negative path, schema validation, auth variants, and fixture structure.

> generate Playwright API tests for this endpoint.
Include positive, invalid email, missing auth, schema,
and one contract drift check. Use zod for runtime validation.

curl -X POST https://api.demo.dev/v1/users \
  -H "Authorization: Bearer $T" \
  -H 'Content-Type: application/json' \
  -d '{"email":"[email protected]","plan":"pro"}'

tests/api/users.spec.ts
fixtures/apiClient.ts
tests/contracts/user.schema.ts

Coverage:
- 201 create user
- 400 invalid payload
- 401 missing token
- schema validation
- idempotency or duplicate email behavior
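
The schema and contract-drift items in that coverage list boil down to a runtime check on the response body. The prompt above asks for zod; the sketch below swaps in a hand-rolled validator so it runs with no dependencies. The response fields (`id`, `email`, `plan`) are an assumption, since the endpoint's real contract is not shown here.

```typescript
// Assumed response shape for POST /v1/users; adjust to the real contract.
interface User {
  id: string;
  email: string;
  plan: string;
}

type Validation = { ok: true; user: User } | { ok: false; errors: string[] };

const USER_KEYS = ["id", "email", "plan"];

function validateUser(body: unknown): Validation {
  if (typeof body !== "object" || body === null) {
    return { ok: false, errors: ["body is not an object"] };
  }
  const record = body as Record<string, unknown>;
  const errors: string[] = [];

  // Schema check: every known field must be a string.
  for (const key of USER_KEYS) {
    if (typeof record[key] !== "string") errors.push(`${key} must be a string`);
  }
  // Contract drift check: unexpected keys suggest the API changed shape.
  for (const key of Object.keys(record)) {
    if (USER_KEYS.indexOf(key) === -1) errors.push(`unexpected field: ${key}`);
  }

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    user: {
      id: record["id"] as string,
      email: record["email"] as string,
      plan: record["plan"] as string,
    },
  };
}

console.log(validateUser({ id: "u_1", email: "qa@example.com", plan: "pro" }));
console.log(validateUser({ id: 7, email: "qa@example.com", plan: "pro", role: "admin" }));
```

In a Playwright API test, the same function would run on the parsed JSON body after the status code assertion.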

Tests from requirements

A Jira ticket in. A test plan out.

> fetch QA-482 from Jira, read acceptance criteria,
produce: 1) Gherkin scenarios, 2) Playwright skeleton,
3) coverage matrix mapping each AC to a test id,
4) risk list for untestable or ambiguous criteria.

AC-1 valid promo       → TC-482-001 @smoke
AC-2 expired promo     → TC-482-002 @negative
AC-3 country tax       → TC-482-003 @regression
AC-4 rounding rule     → TC-482-004 @edge
Ambiguous: tax source of truth missing.
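
A coverage matrix like this is simple enough to compute and check in CI. Below is a minimal sketch under the assumption that each planned test declares which acceptance criteria it covers; `PlannedTest` and `buildCoverage` are hypothetical names, and the IDs mirror the example above.

```typescript
interface PlannedTest {
  id: string;
  tag: string;
  covers: string[]; // acceptance criteria IDs this test exercises
}

// Maps each acceptance criterion to its test IDs and lists the gaps.
function buildCoverage(criteria: string[], tests: PlannedTest[]) {
  const matrix: Record<string, string[]> = {};
  for (const ac of criteria) matrix[ac] = [];
  for (const t of tests) {
    for (const ac of t.covers) {
      // Ignore references to criteria that are not on the ticket.
      if (matrix[ac]) matrix[ac].push(t.id);
    }
  }
  const uncovered = criteria.filter((ac) => matrix[ac].length === 0);
  return { matrix, uncovered };
}

const plannedTests: PlannedTest[] = [
  { id: "TC-482-001", tag: "@smoke", covers: ["AC-1"] },
  { id: "TC-482-002", tag: "@negative", covers: ["AC-2"] },
  { id: "TC-482-003", tag: "@regression", covers: ["AC-3"] },
  { id: "TC-482-004", tag: "@edge", covers: ["AC-4"] },
];

console.log(buildCoverage(["AC-1", "AC-2", "AC-3", "AC-4"], plannedTests));
```

An empty `uncovered` list is a reasonable gate for "test plan complete"; ambiguous criteria still need a human decision.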

From a screenshot to a filed bug.

> [screenshot attached]
User reports the price chip overflows on mobile.
Reproduce at 390x844, capture screenshot + trace,
find likely component, and draft a Jira-ready bug
with steps, expected, actual, severity, and evidence.

codex · bug-bash

▸ identified route: /pricing
▸ viewport: 390x844
▸ component: PricingCard / price-chip
▸ evidence: screenshot + trace.zip
▸ bug title: Pricing chip overflow on mobile
done

QA reviews code too. Now they have leverage.

Prompt	Outcome
/review	Severity-tagged findings on current diff.
review as QA	Flaky waits, missing assertions, untested branches.
review as security	Auth, SSRF, injection, secret leakage.
spawn reviewers	Parallel specialist review, final synthesis.

tests/login.spec.ts:14 · P1
page.waitForTimeout(2000) hides race.
Fix: wait for dashboard heading and toast.

src/auth/middleware.ts:42 · P0
Token branch skips expiry validation.
Fix: assert exp before session creation.

playwright.config.ts:8 · P2
retries: 3 masks flakes.
Fix: retry once, quarantine with owner.

Run Codex in your pipeline.

Use headless Codex only for narrow, bounded CI jobs: summarize failing tests, draft PR review comments, triage smoke failures, or create a follow-up issue. Keep prompts small and permissions tight.

name: qa-bot
on: { pull_request: { types: [opened, synchronize] } }
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright install --with-deps
      - name: Codex QA review
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          codex exec --model gpt-5.4-mini \
            "Review this PR for QA risk only. Focus on flaky waits,
             hard-coded data, missing assertions, and broken smoke coverage.
             Output max 8 bullets with file:line when possible."

Plan first. Then let it loose.

plan

Read-only thinking

Use /plan for unknown codebases, migrations, auth, payments, CI, and shared test fixtures.

worktree

Isolated sandbox

Run the risky work in a branch or worktree so your main checkout and dev server stay stable.

Together these are the two controls that let you delegate real work without losing engineering judgment.

Things that will save you a workday.

/plan · Force strategy before edits.

/diff · Inspect exactly what changed.

/review · Ask Codex to critique its own patch.

/compact · Preserve decisions, free context.

@file · Point Codex at the exact spec, fixture, or trace.

$skill · Explicitly invoke a QA skill.

/side · Ask a side question without polluting the main task.

rg first · Search before editing. Tests fail slower than search.

proof · Ask for command, exit code, and report path.

branch · Never let autonomous work start on protected main.

A QA day · before vs after.

Before

09:00 · Four flaky tests overnight.

10:00 · Manually translate Jira AC into tests.

12:00 · Still debugging the locator.

15:00 · Write bug report and attach evidence by hand.

18:00 · Backlog grew.

After

09:00 · Codex summarizes overnight failures.

09:30 · Gemini gives second-opinion scenario gaps.

11:00 · Codex patches the flake and runs 20x proof.

14:00 · Playwright MCP exploration produces trace evidence.

17:00 · PR reviewed, CI gated, release notes drafted.

When one repo has 12 apps and 8 suites

Codex in a monorepo.

Problem	Pattern
Too much context	Run Codex from the package folder and add only needed dirs.
Different test commands	Put package-specific AGENTS.md files near each suite.
Shared fixtures	Give Codex ownership rules before editing shared files.
Slow CI	Teach impacted-test selection and smoke tags.
Cross-app flows	Use a project-level plan before patching any suite.
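
Impacted-test selection, mentioned in the Slow CI row, can start as a simple prefix map from source folders to spec folders. The sketch below is one possible shape, not a Codex feature; the `Rule` type, `selectImpactedSpecs`, and the folder names are all hypothetical.

```typescript
interface Rule {
  prefix: string;  // source path prefix, relative to the repo root
  specs: string[]; // spec folders to run when that area changes
}

// Picks spec folders for the changed files.
// Files that match no rule trigger the smoke suite as a fallback.
function selectImpactedSpecs(changedFiles: string[], rules: Rule[]) {
  const specs: string[] = [];
  let runSmoke = false;
  for (const file of changedFiles) {
    const matched = rules.filter((rule) => file.startsWith(rule.prefix));
    if (matched.length === 0) runSmoke = true;
    for (const rule of matched) {
      for (const spec of rule.specs) {
        if (specs.indexOf(spec) === -1) specs.push(spec);
      }
    }
  }
  return { specs: specs.sort(), runSmoke };
}

// Illustrative ownership map for a monorepo.
const ownershipRules: Rule[] = [
  { prefix: "apps/checkout/", specs: ["tests/e2e/checkout"] },
  { prefix: "apps/pricing/", specs: ["tests/e2e/pricing", "tests/visual/pricing"] },
  { prefix: "packages/auth/", specs: ["tests/e2e/login", "tests/api/auth"] },
];

console.log(
  selectImpactedSpecs(["apps/pricing/src/PricingCard.tsx", "README.md"], ownershipRules),
);
```

Putting a map like this next to the package-specific AGENTS.md files gives Codex and CI the same answer to "what should I run for this diff".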

Do not rewrite by hand

Selenium · Cypress · TestCafe → Playwright.

Let Codex inspect the old suite, build a migration matrix, convert one vertical slice, run it, then scale. Ask Gemini to independently review missing behavior before deleting the old tests.

> Migrate cypress/e2e/checkout to Playwright under tests/e2e/checkout.
Rules: prefer getByRole/getByLabel/getByTestId, no waitForTimeout,
cy.intercept becomes page.route, fixtures become typed builders.
Convert one spec first, run it, show diff and pass/fail before continuing.

Numbers that drive QA decisions

Flake rate · p95 · MTTR — let Codex do the math.

flake

Flake rate

Failures that pass on retry divided by total executions.

speed

p95 duration

Protects CI from slow suite creep.

repair

MTTR

Time from red build to green fix.

risk

Coverage by AC

Links acceptance criteria to tests and release confidence.
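
The first three metrics are easy to compute from run data. Here is a minimal sketch that follows the definitions above; the `TestRun` and `Incident` shapes are assumptions about what your CI exports, and p95 uses the nearest-rank method, which is one of several common percentile definitions.

```typescript
interface TestRun {
  name: string;
  durationMs: number;
  failedFirstAttempt: boolean;
  passedOnRetry: boolean;
}

interface Incident {
  redAt: Date;   // when the build went red
  greenAt: Date; // when the fix turned it green
}

// Flake rate: failures that pass on retry divided by total executions.
function flakeRate(runs: TestRun[]): number {
  if (runs.length === 0) return 0;
  const flaky = runs.filter((r) => r.failedFirstAttempt && r.passedOnRetry).length;
  return flaky / runs.length;
}

// p95 duration, nearest-rank method: the value at rank ceil(0.95 * n).
function p95(durationsMs: number[]): number {
  if (durationsMs.length === 0) return 0;
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1];
}

// MTTR: mean time from red build to green fix, in minutes.
function mttrMinutes(incidents: Incident[]): number {
  if (incidents.length === 0) return 0;
  const totalMs = incidents.reduce(
    (sum, i) => sum + (i.greenAt.getTime() - i.redAt.getTime()),
    0,
  );
  return totalMs / incidents.length / 60000;
}

// Illustrative data: four executions, one of which flaked.
const sampleRuns: TestRun[] = [
  { name: "login", durationMs: 1200, failedFirstAttempt: false, passedOnRetry: false },
  { name: "cart", durationMs: 3400, failedFirstAttempt: true, passedOnRetry: true },
  { name: "checkout", durationMs: 5100, failedFirstAttempt: true, passedOnRetry: false },
  { name: "pricing", durationMs: 900, failedFirstAttempt: false, passedOnRetry: false },
];

console.log(flakeRate(sampleRuns));
console.log(p95(sampleRuns.map((r) => r.durationMs)));
```

Note that a test which fails and never passes on retry counts as a real failure here, not a flake, which matches the definition on the card.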

Hands-on · run these now

Five drills. Do them in order.

01 · 5 min

First contact

Run codex, then /init. Edit AGENTS.md with real test commands.

02 · 10 min

Review first

Ask Codex to review a recent PR for QA risk only.

03 · 15 min

Tame a flake

Remove one waitForTimeout and prove stability with repeat-each.

04 · 15 min

Gemini second pass

Use Gemini CLI to find missing scenarios in the same change.

05 · 30 min

Ship a skill

Create locator-auditor under .agents/skills and run it on tests/.

06 · stretch

Codex in CI

Run a bounded QA review job on a PR and post an artifact.

Capstone · build your own Codex for QA

Skill · model routing · MCP · site · live URL.

By the end, you have a repo-scoped QA agent: AGENTS.md, three skills, Playwright MCP, Gemini second-review lane, GitHub Actions, and a portfolio page showing how the system works.

AGENTS.md · skills · MCP · Gemini CLI · Playwright · CI

Your QA system.
Codified once.

Codex handles the toil; you own risk, judgment, release confidence.

Codex · Gemini · Playwright · MCP

locator-auditor · Finds brittle selectors.

flake-hunter · Removes sleeps and proves stability.

bug-from-trace · Turns evidence into Jira.

release-gate · Summarizes readiness.

Step 1 · foundation

Create AGENTS.md and the first skill.

> Build the QA agent foundation for this repo.
Create AGENTS.md with commands, locator policy, wait policy, data policy,
review rules, and stop conditions. Then create .agents/skills/locator-auditor/SKILL.md.
Do not edit product code. Run the skill on tests/ and report the top 10 risks.

Step 2 · model routing

Codex edits. Gemini critiques.

> Add a docs/model-routing.md file for our QA team.
Include when to use gpt-5.5, gpt-5.4, gpt-5.4-mini, Gemini CLI,
and local models. Add examples for: flaky spec, API contract suite,
release-readiness review, and migration planning. Keep it practical.

Step 3 · Playwright + CI

Cover every route. Gate every merge.

> Add Playwright smoke, a11y, visual, link, and SEO suites.
Run mobile 390x844 and desktop 1440x900. Add CI with lint, typecheck,
e2e, and report upload. If anything fails locally, fix it before reporting done.

Step 4 · deploy

From localhost to thetestingacademy.com.

> Deploy ./qa-portfolio to Vercel.
Run the Playwright suite against the prod URL.
Print preview URL, prod URL, test report path, and next manual checks.

> Publish this Codex masterclass page under
app.thetestingacademy.com/masterclass/codex.html.
Verify with curl and a browser snapshot after deploy.

Don't do these. Ever.

Skipping AGENTS.md

Without it Codex guesses your conventions and you fight it every turn.

Trusting green without proof

Ask for command, exit code, and report path.

Putting secrets in prompts

Use env vars, vaults, and CI secrets.

Direct Gemini config without protocol check

Use native Gemini CLI unless your gateway supports Codex's required API protocol.

Auto-merging agent PRs

The agent writes. You review.

Running full access on protected main

Use worktrees, feature branches, and scoped permissions.

The new QA toolbelt.

agent

Codex

The editor, reviewer, tester, and orchestrator.

model

GPT-5 family

Frontier reasoning down to fast mini work.

second

Gemini CLI

Independent review and search-grounded planning.

runner

Playwright

Browser, API, trace, visual, a11y.

adapter

MCP

Connects agent to tools.

memory

AGENTS.md

Team rules and commands.

playbook

Skills

Reusable QA workflows.

judgment

You

Risk, release confidence, product sense.

Where to go next.

docs

OpenAI Codex docs

developers.openai.com/codex

cli

Codex CLI

install, auth, commands, MCP, config

models

Codex models

current OpenAI model guidance

gemini

Gemini CLI

official Gemini CLI repository

mcp

Model Context Protocol

tool protocol and server registry

play

Playwright

runner, locators, traces, fixtures

Stop typing tests.
Start building quality systems.

Speaker · Pramod Dutta

Brand · The Testing Academy

Mail · [email protected]

Status · Codex QA ready.
