Concept reference · The npx playwright command surface
Playwright CLI - the day-to-day commands
The npx playwright CLI used to live in a separate playwright-cli package
but is now bundled into the regular playwright npm package. It is the surface you'll
touch every day: install browsers, generate code, run tests, replay traces, open reports, ship to
CI. This page walks through ten clusters of commands you should have at your fingertips, with TTA
exercises for each. Every snippet targets our hosted practice apps.
Draft - private preview - not yet in the main sidebar. This page accompanies the
Lecture: Playwright CLI session. All commands are from the public Playwright CLI surface; no
project-specific code is copied from any upstream repository.
1. First-day CLI - init, install, test, show-report
Four commands that get a brand-new project from zero to a green test against TTACart. Memorise these; you'll type them every time you bootstrap a new repo.
$ npm init playwright@latest           # scaffold a new project via the wizard
$ npx playwright install --with-deps   # browsers + OS deps
$ npx playwright test                  # run every spec, headless
$ npx playwright show-report           # open the HTML report from the last run
npm init playwright@latest walks you through a wizard - TypeScript vs JavaScript, tests
folder name, GitHub Actions workflow yes/no, install browsers yes/no. The output is a complete
repo: tests/, playwright.config.ts, package.json, and a .gitignore that
excludes node_modules and test-results. The --with-deps
flag on install also installs OS-level libraries (helpful on Linux CI runners that don't
ship the browsers' system dependencies by default).
Exercises
Bootstrap. Run the wizard in an empty directory, accept the defaults, then point the example spec at https://app.thetestingacademy.com/playwright/ttacart/ and get it green.
Inspect dependencies. Run npx playwright install --dry-run --with-deps and list everything it would touch.
Browser binary location. Find where Playwright cached the browser binaries on your OS (Linux / macOS / Windows). Why does this matter for CI cache keys?
show-report freshness. Run two suites, run show-report twice. Which run is being displayed? How does Playwright pick?
2. Codegen - record selectors + actions
A browser opens, you click around, Playwright writes the spec for you. Best for two things: learning the locator API (you see what selector Playwright would pick), and bootstrapping a draft spec against a brand-new page.
$ npx playwright codegen --save-storage=auth.json https://...   # capture cookies + localStorage
Codegen prefers user-facing locators: getByRole first, then locators such as getByLabel,
getByText and getByTestId, with CSS as a last resort. Treat the generated file as a draft
- rename it, extract page objects, dedupe waits, fold it into a real spec. It is not production code;
it is a transcript of what you did.
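As a rough illustration of "extract page objects", here is a minimal sketch of a LoginPage class. The accessible names ('Email', 'Password', 'Log in') are hypothetical placeholders; take the real ones from your own codegen recording. The two small interfaces are an assumption that lets the sketch run without Playwright installed. In a real project you would type against Page and Locator from @playwright/test instead.

```typescript
// Minimal structural types so the sketch runs without Playwright installed.
// In a real project, replace them with:
//   import type { Page, Locator } from '@playwright/test';
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}

interface PageLike {
  goto(url: string): Promise<unknown>;
  getByLabel(text: string): LocatorLike;
  getByRole(role: 'button', options: { name: string }): LocatorLike;
}

class LoginPage {
  private readonly page: PageLike;
  private readonly email: LocatorLike;
  private readonly password: LocatorLike;
  private readonly submit: LocatorLike;

  constructor(page: PageLike) {
    this.page = page;
    // Hypothetical accessible names: copy the real ones from the recording.
    this.email = page.getByLabel('Email');
    this.password = page.getByLabel('Password');
    this.submit = page.getByRole('button', { name: 'Log in' });
  }

  // Navigate to the practice app used throughout this page.
  async open(): Promise<void> {
    await this.page.goto('https://app.thetestingacademy.com/playwright/ttacart/');
  }

  // Fill both fields and submit; assertions stay in the spec, not in the page object.
  async login(email: string, password: string): Promise<void> {
    await this.email.fill(email);
    await this.password.fill(password);
    await this.submit.click();
  }
}
```

A spec would then construct new LoginPage(page) from the page fixture and call open() and login(...), keeping every locator in one place when the DOM changes.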
TTA tip. Run codegen against TTACart at our public URL, not localhost.
The locators codegen picks against our real DOM will translate cleanly to your test - the same DOM
CI sees.
Exercises
Record a login. Run npx playwright codegen https://app.thetestingacademy.com/playwright/ttacart/ to capture the TTACart login flow. Refactor the recording into a LoginPage POM.
Mobile codegen. Re-record the same flow with --device='iPhone 14'. What changed in the locators?
Auth storage. Use --save-storage=auth.json to capture a logged-in session, then write a fixture that uses storageState: 'auth.json'.
Target language switch. Record one TTACart flow in Python, port it manually back to TypeScript. Note where Playwright's API differs across languages.
3. Trace viewer - npx playwright show-trace
A standalone DevTools-like viewer for a Playwright trace.zip. Time-travel through every action, see the DOM at each step, replay network calls, watch the console. The single most useful tool for debugging flakes.
$ npx playwright show-trace test-results/.../trace.zip   # open viewer in a new window
$ npx playwright show-trace --port=8765 trace.zip        # pin the port
$ npx playwright show-trace                              # drop a zip onto the page
Configure trace capture in playwright.config.ts via
use: { trace: 'on-first-retry' }. on-first-retry is what the scaffolded config uses and a
sensible CI setting (it captures a trace only when a test is retried, so retries must be greater
than 0) - it keeps your test-results folder small but still ships you the data you need when
something goes wrong. The viewer also runs in the browser at trace.playwright.dev, so a teammate
can open a trace.zip you send over Slack without installing anything.
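For reference, a minimal sketch of the trace setting in a config file. The retries value is an illustrative assumption; the point is that 'on-first-retry' never records anything unless at least one retry is allowed.

```typescript
// playwright.config.ts (sketch)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // 'on-first-retry' records only when a test is retried, so retries must be > 0.
  retries: 1, // illustrative value, tune per project
  use: {
    // Other accepted values include 'on', 'off' and 'retain-on-failure'.
    trace: 'on-first-retry',
  },
});
```

For a one-off local investigation, passing --trace on to npx playwright test overrides the config value without editing the file.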
Exercises
Capture + replay. Force a TTACart spec to fail (rename a button). Run with trace: 'on', open the trace, walk through the action timeline.
Network panel. Find an XHR in the trace where the response was 4xx. Annotate the panel.
DOM diff. Step from one action to the next and use the DOM snapshot panel to spot what changed. Where does the SKU price update?
Trace + retain-on-failure. Configure retain-on-failure, run a green suite, run a red suite. Compare the size of test-results after each.
4. Headed + debug mode
Run with a visible browser, step through with the Playwright Inspector, drop a page.pause() to break right before the spot you care about. The fastest local debugging loop in browser automation.
--headed · --debug · page.pause() · PWDEBUG=1 · slowMo: 500 · --workers=1 with debug · step over / step in
$ npx playwright test --headed --workers=1   # visible browser, one at a time
$ npx playwright test --debug                # open Inspector, step through
$ PWDEBUG=1 npx playwright test              # env-var equivalent of --debug
$ npx playwright test --headed               # with launchOptions.slowMo set in the config: visual replay for screencasts
--debug opens the Playwright Inspector window alongside the test - you see the spec
on the left, the browser on the right, and a step-over button at the top. Drop
await page.pause() anywhere in your spec to set a breakpoint without restarting the
run. There is no --slowmo flag on npx playwright test; slow motion is the slowMo launch option
(use: { launchOptions: { slowMo: 500 } } in playwright.config.ts), which inserts a half-second
delay between every action - useful for recording screencasts, not for actual debugging (it just
hides race conditions).
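Slow motion is a browser launch option, so one way to switch it on for a single run is to read it from the environment in the config. A minimal sketch follows; SLOW_MO is a hypothetical variable name chosen for this example, not something Playwright reads on its own.

```typescript
// playwright.config.ts (sketch)
import { defineConfig } from '@playwright/test';

// SLOW_MO is a hypothetical env var for this example:
//   SLOW_MO=500 npx playwright test --headed
// Anything unset or non-numeric falls back to 0 (no delay).
const slowMo = Number(process.env.SLOW_MO) || 0;

export default defineConfig({
  use: {
    // Milliseconds Playwright waits between actions; 0 disables the delay.
    launchOptions: { slowMo },
  },
});
```

Keeping the default at 0 means CI and normal local runs are unaffected; the delay only appears when someone opts in for a screencast.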
Exercises
Inspector tour. Run any TTACart spec with --debug and use the Pick Locator button to inspect the cart's add-to-cart button. Compare to what you'd write by hand.
Pause mid-flight. Drop page.pause() just before the assertion, inspect the DOM, then resume.
slowMo vs awaits. Run a spec with slowMo: 1000 set under launchOptions and notice that explicit waits in the spec disappear into the noise. Explain why slowMo masks races.
Single worker rule. Try --debug with --workers=4. What does Playwright do? Why?
5. Sharding + parallelism
Two knobs control how much of the suite runs at once. --workers=N sets parallelism inside one run: N worker processes on a single machine. --shard=k/n sets cross-machine / cross-CI-job parallelism.
--workers is local parallelism inside one CLI invocation. Use it on your laptop, use
it on a single CI runner. --shard=1/4 is for splitting the suite across four
separate CI jobs - each job runs a quarter of the specs in parallel. You collect the four
blob reports and run merge-reports to get a single combined HTML output.
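To make the shard-then-merge flow concrete, the sketch below prints the commands for an n-way split. shardCommands is a hypothetical helper written for this page, and ./blob-reports is assumed to be the folder where you collected every shard's blob report before merging.

```typescript
// Hypothetical helper: builds the CLI invocation for each shard of an n-way split.
function shardCommands(total: number): string[] {
  if (!Number.isInteger(total) || total < 1) {
    throw new RangeError('total must be a positive integer');
  }
  const commands: string[] = [];
  for (let index = 1; index <= total; index++) {
    // Shards are 1-based: --shard=1/4 through --shard=4/4.
    commands.push(`npx playwright test --shard=${index}/${total} --reporter=blob`);
  }
  return commands;
}

// Run once, after every shard has finished and its blob report has been
// copied into a single folder (assumed here to be ./blob-reports).
const mergeCommand = 'npx playwright merge-reports --reporter=html ./blob-reports';

console.log([...shardCommands(4), mergeCommand].join('\n'));
```

Each shard command can still take --workers=N, since the two knobs are independent: sharding decides which specs a job gets, workers decide how many of them that job runs at once.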
TTA framework note. Our published
Advance Playwright framework uses 4 shards by default
in the GitHub Actions workflow. See its CI section for a concrete YAML.
Exercises
Worker scale test. Run the TTACart suite with workers 1, 2, 4, 8 and plot total runtime. Where does it stop scaling?
Shard split. Split your suite into 4 shards locally (in 4 terminals). Note that the spec assignment is deterministic.
Merge reports. Produce 4 blob reports from 4 shard runs and merge them. Open the combined HTML output and confirm all specs are present.
Worker leak. Add a spec that allocates 10 MB of memory and never releases it. Run with --workers=8. Watch RAM in top. What do you see?
6. Filters - --grep, --project, --reporter
Run only the specs you care about. --grep filters by name, --project filters by browser / device, --reporter picks the output format.
$ npx playwright test --grep @smoke          # only specs tagged @smoke
$ npx playwright test --grep-invert @flaky   # skip flaky specs
$ npx playwright test --project=chromium     # one browser only
$ npx playwright test --reporter=html,list   # html + live terminal
$ npx playwright test --list                 # print what would run, run nothing
Tag specs with strings in the title - test('add to cart @smoke', ...) - then
--grep '@smoke' selects them. --grep-invert is its mirror - useful for
"run everything except known-flakes on a hotfix branch". --project matches the
name field from playwright.config.ts's projects array, so
you can ship a green chromium pipeline while iterating on a flaky firefox case.
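The tag convention works because --grep and --grep-invert take regular expressions that are matched against test titles. The sketch below imitates that selection over a list of hypothetical titles. It illustrates the idea only and is not Playwright's implementation.

```typescript
// Illustration only: mimics how --grep / --grep-invert narrow a list of titles.
function selectTitles(titles: string[], grep?: RegExp, grepInvert?: RegExp): string[] {
  return titles.filter((title) => {
    if (grep && !grep.test(title)) return false;            // must match --grep
    if (grepInvert && grepInvert.test(title)) return false; // must not match --grep-invert
    return true;
  });
}

// Hypothetical spec titles with tags embedded in the title string.
const titles = [
  'add to cart @smoke',
  'checkout with saved card @smoke @flaky',
  'update quantity',
];

console.log(selectTitles(titles, /@smoke/));            // like --grep @smoke
console.log(selectTitles(titles, undefined, /@flaky/)); // like --grep-invert @flaky
console.log(selectTitles(titles, /@smoke/, /@flaky/));  // both flags combined
```

Because the pattern is a regular expression, an alternation such as "@smoke|@visual" selects specs carrying either tag in one run.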
Exercises
Smoke set. Tag 3 TTACart specs as @smoke. Wire a CI job that only runs --grep @smoke on PRs.
Invert grep. Tag 2 specs as @flaky. Run --grep-invert @flaky on main, the full suite on a nightly cron.
Project matrix. Define 3 projects (chromium, firefox, webkit) and run only the project that's currently failing.
Reporter combo. Try --reporter=html,json,line. What gets written where?
7. UI mode - npx playwright test --ui
A long-running watcher that re-runs specs on file change, lets you time-travel through actions, edit + re-run inline, and pin watch on a single test. The replacement for "save, switch terminal, type, hit up-arrow".
--ui · watch mode · time-travel debugging · pin tests · DOM snapshots in panel · live locator picker · tag-by-tag filter
$ npx playwright test --ui                                  # open UI runner
$ npx playwright test --ui-host=0.0.0.0 --ui-port=8081      # expose UI on the LAN
$ npx playwright test --ui --project=chromium               # UI scoped to one project
UI mode is the right default for local dev. You launch it once at the start of your morning,
change a spec, hit Cmd-S, and Playwright re-runs only the affected test. Failed runs show the
same trace viewer panels you'd get from show-trace - same time-travel, same DOM
snapshots, same network panel - inline. Use the Pick Locator button to grab a selector from any
DOM snapshot frame.
Exercises
Watch loop. Open UI mode, edit one assertion in a TTACart spec, watch it re-run live.
Pin a test. Pin one slow spec, edit only it, ignore the rest of the suite.
Locator picker. Use the inline Pick Locator button on a TTACart cart row to grab the best locator. Paste it back into your spec.
Tag filter. Filter to @smoke-tagged specs in the UI sidebar. Verify only those re-run on file change.
Remote UI. Run UI mode in Docker with --ui-host=0.0.0.0 and access from your host. When would you ever do this?
8. Updating snapshots - visual regression
Playwright captures screenshots and DOM snapshots; expect(page).toHaveScreenshot() compares against a baseline. When the UI legitimately changes, you regenerate the baselines with --update-snapshots.
$ npx playwright test --update-snapshots                  # regenerate all baselines
$ npx playwright test --update-snapshots --grep @visual   # scoped regen
$ npx playwright test -u                                  # short form
Visual regression is the most commonly misused Playwright feature. Two rules: (1) update snapshots
only when the UI change is intentional - and review the resulting diff in the PR like
any other code change. (2) Set a maxDiffPixels tolerance so that a 1-pixel anti-aliasing
difference doesn't fail your CI. Anti-aliasing and font rendering differ between operating systems,
so visual specs that pass on a dev Mac may fail on a Linux CI runner.
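A tolerance can be set once for the whole suite instead of on every assertion. The sketch below shows where it lives in the config; the value 50 is an arbitrary placeholder, not a recommendation.

```typescript
// playwright.config.ts (sketch)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  expect: {
    toHaveScreenshot: {
      // Absolute number of pixels allowed to differ before the assertion fails.
      // 50 is a placeholder: pick a budget from the diffs you actually observe.
      maxDiffPixels: 50,
      // maxDiffPixelRatio (a value between 0 and 1) is the relative alternative.
    },
  },
});
```

A single assertion can still override the suite-wide value, for example expect(page).toHaveScreenshot({ maxDiffPixels: 0 }) on a component that must match exactly.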
Exercises
Capture a baseline. Add an expect(page).toHaveScreenshot() on the TTACart cart drawer. Run once to generate, commit, run again to verify.
Force a diff. Tweak the cart row colour in DevTools, run the spec, see the diff in the report.
Regenerate scoped. Use --update-snapshots --grep @visual to regen only a subset. Verify other baselines are untouched.
Cross-OS diff. Capture baselines on macOS, run the spec on Linux (via Docker). Where do you need to bump maxDiffPixelRatio?
9. Reports - html, json, allure-playwright
Playwright ships HTML, JSON, JUnit, line, list, dot, and blob reporters out of the box. Plus community add-ons (allure-playwright, ortoni-report). Pick the right reporter for the right consumer.
$ npx playwright test --reporter=html                           # HTML report at playwright-report/
$ npx playwright test --reporter=json                           # machine-readable run summary
$ npx playwright test --reporter=junit                          # for Jenkins and other JUnit XML consumers
$ npx playwright test --reporter=blob                           # shard input for merge-reports
$ npx playwright merge-reports --reporter=html ./blob-reports   # merge shards into HTML
html for humans, json for dashboards, junit for CI systems that parse JUnit XML (Jenkins with the
JUnit plugin does; GitHub Actions does not render it on its own, so use Playwright's built-in
github reporter for inline annotations or a publishing action), blob for sharded runs that
need merging. list / line / dot are terminal-only and useful in CI logs. Custom reporters
live in our own framework - see the
Advance framework page for the
CustomTTAReporter.ts walkthrough.
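As an example of the "json for dashboards" case, here is a minimal sketch that turns a JSON report into a one-line summary. It assumes the report carries a top-level stats object with expected, unexpected, flaky and skipped counts, which is the shape recent Playwright versions emit; check a real results file from your version before relying on it. The sample input is hand-written.

```typescript
// Only the fields this sketch reads; a real report contains much more.
interface ReportStats {
  expected: number;   // tests whose outcome matched expectations (normally: passed)
  unexpected: number; // tests that failed
  flaky: number;      // failed at first, passed on a retry
  skipped: number;
}

function summarize(reportJson: string): string {
  const report = JSON.parse(reportJson) as { stats?: Partial<ReportStats> };
  const stats = report.stats ?? {};
  // Missing counters are treated as zero rather than crashing the notifier.
  const passed = stats.expected ?? 0;
  const failed = stats.unexpected ?? 0;
  return `${passed} passed, ${failed} failed`;
}

// Hand-written sample standing in for a real results file.
const sample = JSON.stringify({
  stats: { expected: 7, unexpected: 1, flaky: 1, skipped: 2 },
});

console.log(summarize(sample)); // prints "7 passed, 1 failed"
```

The JSON reporter writes to standard output unless an output file is configured, so redirecting the run into a file and passing its contents to summarize is enough for a Slack notification step.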
Exercises
Two reporters at once. Run with --reporter=html,junit and inspect both outputs.
JSON consumer. Pipe --reporter=json into a small script that prints "N passed, M failed" - useful for Slack notifications.
Allure setup. Wire allure-playwright, run a TTACart suite, generate the Allure HTML report.
Custom reporter outline. Sketch a 30-line reporter that prints only failure URLs from a TTACart run. Look at the Reporter interface in the Playwright types.
10. CI - Docker, GitHub Actions, Jenkins
Running the same CLI inside CI as you do locally. Three patterns: GitHub Actions matrix + sharding, Jenkins pipeline + parallel stages, and Docker image for any-runner reproducibility.
$ docker run --rm -it mcr.microsoft.com/playwright:v1.45.0-jammy bash   # official image with browsers preinstalled
$ CI=1 npx playwright test                                              # sets the CI env var that CI-conditional config (retries, forbidOnly, workers) keys off
$ npx playwright test --shard=${{ matrix.shard }}/4                     # inside a GitHub Actions matrix
The official mcr.microsoft.com/playwright image ships every browser preinstalled and
pinned to a Playwright version. Match the image tag to your @playwright/test
version - mismatches cause "executable doesn't exist at ..." errors at runtime. In GitHub Actions,
a 4-shard matrix runs 4 parallel jobs that each upload a blob report artefact, then
a final merge-reports job stitches them together.
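What CI=1 changes depends largely on how the config branches on it. The sketch below mirrors the pattern the scaffolded config uses, plus a blob reporter on CI for the merge step; the specific values are illustrative assumptions, not requirements.

```typescript
// playwright.config.ts (sketch)
import { defineConfig } from '@playwright/test';

// Most CI providers set CI automatically; locally you can simulate it with CI=1.
const onCI = !!process.env.CI;

export default defineConfig({
  // Fail the build if a stray test.only was committed.
  forbidOnly: onCI,
  // Retry on CI to absorb infrastructure noise; fail fast locally.
  retries: onCI ? 2 : 0,
  // One worker per CI job keeps shards predictable; undefined means Playwright's default.
  workers: onCI ? 1 : undefined,
  // Blob output feeds merge-reports on CI; plain HTML is handier on a laptop.
  reporter: onCI ? 'blob' : 'html',
});
```

With this in place the CLI call stays identical in both environments (npx playwright test, plus --shard on CI), and the environment variable carries the behavioural difference.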
TTA framework. Our
Advance Playwright framework ships a full GitHub
Actions workflow with 4-shard parallelism, browser caching, and merged HTML reports as artefacts.
Exercises
4-shard pipeline. Write a GitHub Actions workflow that runs your TTACart suite as 4 shards in parallel, merges the blob reports, and uploads the HTML as an artefact.
Browser cache key. Cache ~/.cache/ms-playwright across runs keyed on Playwright version. Measure the time saved.
Docker run. Run the entire suite inside the official Docker image on your laptop. Time the first run vs. the second.
Jenkins parallel. Sketch a Jenkinsfile with 4 parallel stages, each running one shard.
CI=1 effects. Run locally with CI=1 npx playwright test and read the docs for what changes. Why is retries different on CI?
Two mini-projects (scope only)
The lecture batch ships two CLI-focused projects. The directions are summarised here; the actual code lives in your own repo - this is the "what should I build" spec, not a copy of the upstream project.
Project 1
VWO-style login flow
Use codegen to record a login attempt against a public VWO sample form, refactor into a LoginPage POM with explicit waits, add a @smoke tag, run under --ui mode, capture a trace on failure, ship a green run inside the official Playwright Docker image.
Project 2
TTA Bank end-to-end flow
Build a bank-style transfer journey on top of our TTACart sandbox - login, transfer, statement assertion. Add 4 specs, shard across --shard=1/4 through --shard=4/4, merge the blob reports, generate an HTML report, wire a GitHub Actions matrix that runs the same shards on every push.
Diagrams - command tree and CI pipeline
Two mermaid diagrams. First: the npx playwright command tree, grouped by use case. Second: a sharded CI pipeline that uses 6 of the commands above.
npx playwright - command tree
Every CLI verb you'll touch, grouped by phase. Green = run-time, amber = author-time, violet = debug-time.
flowchart TB
ROOT[npx playwright] --> SETUP[setup]
ROOT --> AUTHOR[author]
ROOT --> RUN[run]
ROOT --> DEBUG[debug]
SETUP --> S1[npm init playwright]
SETUP --> S2[install]
SETUP --> S3[install --with-deps]
AUTHOR --> A1[codegen url]
AUTHOR --> A2[codegen --device]
AUTHOR --> A3[codegen --save-storage]
RUN --> R1[test]
RUN --> R2[test --workers N]
RUN --> R3[test --shard k/n]
RUN --> R4[test --grep tag]
RUN --> R5[test --project name]
RUN --> R6[test --reporter list]
RUN --> R7[merge-reports]
DEBUG --> D1[test --headed]
DEBUG --> D2[test --debug]
DEBUG --> D3[test --ui]
DEBUG --> D4[show-trace zip]
DEBUG --> D5[show-report]
classDef setup fill:#fef3c7,stroke:#f59e0b,color:#111
classDef author fill:#fde68a,stroke:#d97706,color:#111
classDef run fill:#d1fae5,stroke:#16a34a,color:#111
classDef debug fill:#ede9fe,stroke:#8b5cf6,color:#111
class S1,S2,S3 setup
class A1,A2,A3 author
class R1,R2,R3,R4,R5,R6,R7 run
class D1,D2,D3,D4,D5 debug
Sharded CI pipeline
One push triggers four parallel shard jobs; each writes a blob report; a final job merges + publishes an HTML artefact. This is the shape our V1 framework ships.
flowchart LR
PUSH[git push] --> CI[GitHub Actions]
CI --> M[matrix.shard=1..4]
M --> J1[Job 1 test --shard=1/4 --reporter=blob]
M --> J2[Job 2 test --shard=2/4 --reporter=blob]
M --> J3[Job 3 test --shard=3/4 --reporter=blob]
M --> J4[Job 4 test --shard=4/4 --reporter=blob]
J1 --> UP[upload artefact]
J2 --> UP
J3 --> UP
J4 --> UP
UP --> MERGE[merge-reports --reporter=html]
MERGE --> ART[final HTML artefact]
ART --> PR[PR comment + link]
classDef ci fill:#d1fae5,stroke:#16a34a,color:#111
classDef art fill:#fef9c3,stroke:#f59e0b,color:#111
class J1,J2,J3,J4,MERGE ci
class UP,ART,PR art
Next step. Open the
Advance Playwright framework doc to see all ten command
clusters wired into one folder-by-folder reference project, with a real GitHub Actions sharded
workflow on top.