Concept reference · The npx playwright command surface

Playwright CLI - the day-to-day commands

The npx playwright CLI used to live as a separate playwright-cli package but is now bundled into the regular playwright npm package. It is the surface you'll touch every day - install browsers, generate code, run tests, replay traces, open reports, ship to CI. This page walks through ten clusters of commands you should have in your hands, with TTA exercises for each. Every snippet targets our hosted practice apps.

Draft - private preview - not yet in the main sidebar. This page accompanies the Lecture: Playwright CLI session. All commands are from the public Playwright CLI surface; no project-specific code is copied from any upstream repository.

1. First-day CLI - init, install, test, show-report

Four commands that get a brand-new project from zero to a green test against TTACart. Memorise these; you'll type them every time you bootstrap a new repo.

npm init playwright@latest · npx playwright install · npx playwright test · npx playwright show-report · --with-deps · --list
$ npm init playwright@latest    # scaffold + ts config + sample spec
$ npx playwright install --with-deps    # browsers + OS deps
$ npx playwright test    # run every spec, headless
$ npx playwright show-report    # open the HTML report from the last run

npm init playwright@latest walks you through a wizard - TS vs JS, the name of the tests folder, GitHub Actions yes/no, and whether to install the browsers right away. The output is a complete repo: tests/, playwright.config.ts, package.json, and a .gitignore that excludes node_modules and test-results. The --with-deps flag on install also installs OS-level libraries (helpful on Linux CI runners that don't ship the system libraries the browsers need by default).

Exercises

  1. Bootstrap. Run the wizard in an empty directory, accept the defaults, then point the example spec at https://app.thetestingacademy.com/playwright/ttacart/ and get it green.
  2. Inspect dependencies. Run npx playwright install --dry-run --with-deps and list everything it would touch.
  3. Browser binary location. Find where Playwright cached the browser binaries on your OS (Linux / macOS / Windows). Why does this matter for CI cache keys?
  4. show-report freshness. Run two suites, run show-report twice. Which run is being displayed? How does Playwright pick?

2. Codegen - record selectors + actions

A browser opens, you click around, Playwright writes the spec for you. Best for two things: learning the locator API (you see what selector Playwright would pick), and bootstrapping a draft spec against a brand-new page.

codegen <url> · --target=typescript · --output=tests/draft.spec.ts · --device='iPhone 14' · --save-storage=auth.json · --load-storage=auth.json · getByRole / getByLabel preferred
$ npx playwright codegen https://app.thetestingacademy.com/playwright/ttacart/    # record + emit typescript
$ npx playwright codegen --target=python https://...    # switch language
$ npx playwright codegen --device='iPhone 14' https://...    # mobile emulation
$ npx playwright codegen --save-storage=auth.json https://...    # capture cookies + localStorage

Codegen prefers user-facing locators: it prioritises role, text, and test-id locators (getByRole, getByLabel, getByText, getByTestId) and falls back to CSS only as a last resort. Treat the generated file as a draft - rename it, extract page objects, dedupe waits, fold into a real spec. It is not production code, it is a transcript of what you did.

TTA tip. Run codegen against TTACart at our public URL, not localhost. The locators codegen picks against our real DOM will translate cleanly to your test - the same DOM CI sees.

Exercises

  1. Record a login. Run npx playwright codegen https://app.thetestingacademy.com/playwright/ttacart/ to capture the TTACart login flow. Refactor the recording into a LoginPage POM.
  2. Mobile codegen. Re-record the same flow with --device='iPhone 14'. What changed in the locators?
  3. Auth storage. Use --save-storage=auth.json to capture a logged-in session, then write a fixture that uses storageState: 'auth.json'.
  4. Target language switch. Record one TTACart flow in Python, port it manually back to TypeScript. Note where Playwright's API differs across languages.
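The --save-storage flag above writes cookies and localStorage to a file, and Playwright can load that file back through the storageState option. A minimal sketch of the config side, assuming the file is called auth.json and sits next to playwright.config.ts (both are assumptions; adjust to your repo):

```typescript
// playwright.config.ts - minimal sketch, not a complete config.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Start every test with the cookies + localStorage that
    // `codegen --save-storage=auth.json` captured.
    // 'auth.json' is an assumed path relative to this config file.
    storageState: 'auth.json',
  },
});
```

Because auth.json holds a live session, it is usually safer to add it to .gitignore than to commit it.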

3. Trace viewer - npx playwright show-trace

A standalone DevTools-like viewer for a Playwright trace.zip. Time-travel through every action, see the DOM at each step, replay network calls, watch the console. The single most useful tool for debugging flakes.

show-trace trace.zip · trace.playwright.dev · on-first-retry · retain-on-failure · action timeline · network panel · DOM snapshot
$ npx playwright show-trace test-results/.../trace.zip    # open viewer in a new window
$ npx playwright show-trace --port=8765 trace.zip    # pin the port
$ npx playwright show-trace    # drop a zip onto the page

Configure trace capture in playwright.config.ts via use: { trace: 'on-first-retry' }. on-first-retry is the value the scaffolded config ships with (the built-in default is 'off'). It captures a trace only when a failed test is retried, so it needs retries set above 0. That keeps your test-results folder small but still ships you the data you need when something goes wrong. The viewer also runs at trace.playwright.dev: send a teammate the trace.zip over Slack and they can open it there in the browser.
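A minimal sketch of that trace setting in context. The retry counts are illustrative assumptions; the point is that 'on-first-retry' records nothing unless retries is greater than 0.

```typescript
// playwright.config.ts - minimal sketch, not a complete config.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // 'on-first-retry' only records when a failed test is retried,
  // so at least one retry must be allowed. The values are examples.
  retries: process.env.CI ? 2 : 0,
  use: {
    // Other valid values include 'on', 'off' and 'retain-on-failure'.
    trace: 'on-first-retry',
  },
});
```

With retries at 0 locally, pass --trace=on on the command line when you want a trace for a single run.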

Exercises

  1. Capture + replay. Force a TTACart spec to fail (rename a button). Run with trace: 'on', open the trace, walk through the action timeline.
  2. Network panel. Find an XHR in the trace where the response was 4xx. Annotate the panel.
  3. DOM diff. Step from one action to the next and use the DOM snapshot panel to spot what changed. Where does the SKU price update?
  4. Trace + retain-on-failure. Configure retain-on-failure, run a green suite, run a red suite. Compare the size of test-results after each.

4. Headed + debug mode

Run with a visible browser, step through with the Playwright Inspector, drop a page.pause() to break right before the spot you care about. The fastest local debugging loop in browser automation.

--headed · --debug · page.pause() · PWDEBUG=1 · launchOptions.slowMo · --workers=1 with debug · step over / step in
$ npx playwright test --headed --workers=1    # visible browser, one at a time
$ npx playwright test --debug    # open Inspector, step through
$ PWDEBUG=1 npx playwright test    # env-var equivalent of --debug
$ npx playwright test --headed    # with launchOptions.slowMo set in the config: visual replay for screencasts

--debug opens the Playwright Inspector window alongside the test - you see the spec on the left, the browser on the right, and a step-over button at the top. Drop await page.pause() anywhere in your spec to break at that exact point on the next headed run. slowMo is not a flag of npx playwright test; set it in playwright.config.ts via use: { launchOptions: { slowMo: 500 } }. That value inserts a half-second delay between every action - useful for recording screencasts, not for actual debugging (it just hides race conditions).
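The half-second delay described above can be expressed in playwright.config.ts through the launchOptions.slowMo option. A minimal sketch, assuming you only want it for a local screencast run; the 500 ms value is an example:

```typescript
// playwright.config.ts - minimal sketch for a local screencast run.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Show the browser window, same effect as --headed.
    headless: false,
    launchOptions: {
      // Delay every Playwright operation by 500 ms (example value).
      // Keep this out of CI configs: it hides race conditions.
      slowMo: 500,
    },
  },
});
```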

Exercises

  1. Inspector tour. Run any TTACart spec with --debug and use the Pick Locator button to inspect the cart's add-to-cart button. Compare to what you'd write by hand.
  2. Pause mid-flight. Drop page.pause() just before the assertion, inspect the DOM, then resume.
  3. slowMo vs awaits. Run a spec with launchOptions.slowMo set to 1000 and notice that explicit waits in the spec disappear into the noise. Explain why slowMo masks races.
  4. Single worker rule. Try --debug with --workers=4. What does Playwright do? Why?

5. Sharding + parallelism

Two knobs control how much of the suite runs at once. --workers=N sets how many worker processes run in parallel on one machine. --shard=k/n splits the suite across separate machines or CI jobs.

--workers=4 · --workers=50% · --shard=1/4 · --shard=2/4 · --shard=3/4 · --shard=4/4 · merge-reports
$ npx playwright test --workers=4    # 4 parallel worker processes
$ npx playwright test --workers=50%    # half the logical CPU cores
$ npx playwright test --shard=1/4    # CI job 1 of 4
$ npx playwright merge-reports ./all-blob-reports    # merge sharded blob reports

--workers is local parallelism inside one CLI invocation. Use it on your laptop, use it on a single CI runner. --shard=1/4 is for splitting the suite across four separate CI jobs - each job runs a quarter of the specs in parallel. You collect the four blob reports and run merge-reports to get a single combined HTML output.
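To make the two knobs concrete, here is a small illustrative sketch. Both helpers are hypothetical and are not Playwright internals: resolveWorkers shows one plausible way a value such as 50% maps to a worker count, and shardCommands prints the command each of the n CI jobs would run.

```typescript
// Illustrative helpers only; names and rounding rules are assumptions,
// not Playwright's actual implementation.

/** Turn a --workers value ("4" or "50%") into a worker count. */
function resolveWorkers(value: string, cpuCount: number): number {
  const trimmed = value.trim();
  if (trimmed.endsWith('%')) {
    const percent = Number(trimmed.slice(0, -1));
    if (!Number.isFinite(percent) || percent <= 0) {
      throw new Error(`Invalid workers value: ${value}`);
    }
    // Never go below one worker, even on a single-core machine.
    return Math.max(1, Math.floor((cpuCount * percent) / 100));
  }
  const count = Number(trimmed);
  if (!Number.isInteger(count) || count < 1) {
    throw new Error(`Invalid workers value: ${value}`);
  }
  return count;
}

/** Build one CLI invocation per shard, each writing a blob report. */
function shardCommands(total: number): string[] {
  if (!Number.isInteger(total) || total < 1) {
    throw new Error(`Invalid shard total: ${total}`);
  }
  return Array.from(
    { length: total },
    (_, i) => `npx playwright test --shard=${i + 1}/${total} --reporter=blob`,
  );
}

console.log(resolveWorkers('50%', 8));
console.log(shardCommands(4).join('\n'));
```

Each shard job can still use --workers internally, so the two settings multiply rather than compete.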

TTA framework note. Our published Advance Playwright framework uses 4 shards by default in the GitHub Actions workflow. See its CI section for a concrete YAML.

Exercises

  1. Worker scale test. Run the TTACart suite with workers 1, 2, 4, 8 and plot total runtime. Where does it stop scaling?
  2. Shard split. Split your suite into 4 shards locally (in 4 terminals). Note that the spec assignment is deterministic.
  3. Merge reports. Produce 4 blob reports from 4 shard runs and merge them. Open the combined HTML output and confirm all specs are present.
  4. Worker leak. Add a spec that allocates 10MB of memory and never releases it. Run with --workers=8. Watch RAM in top. What do you see?

6. Filters - --grep, --project, --reporter

Run only the specs you care about. --grep filters by name, --project filters by browser / device, --reporter picks the output format.

--grep @smoke · --grep-invert @flaky · --project=chromium · --project='Mobile Safari' · --reporter=html,list · --reporter=json · --reporter=line
$ npx playwright test --grep @smoke    # only specs tagged @smoke
$ npx playwright test --grep-invert @flaky    # skip flaky specs
$ npx playwright test --project=chromium    # one browser only
$ npx playwright test --reporter=html,list    # html + live terminal
$ npx playwright test --list    # print what would run, run nothing

Tag specs with strings in the title - test('add to cart @smoke', ...) - then --grep '@smoke' selects them. --grep-invert is its mirror - useful for "run everything except known-flakes on a hotfix branch". --project matches the name field from playwright.config.ts's projects array, so you can ship a green chromium pipeline while iterating on a flaky firefox case.
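The value passed to --grep is treated as a regular expression matched against test titles, and --grep-invert keeps everything that does not match. The sketch below mimics that selection on a plain list of titles. The selectByGrep helper and the sample titles are hypothetical, and this is an illustration of the idea rather than Playwright's own matching code.

```typescript
// Illustrative only: a stand-in for how title-based filtering behaves.

/** Keep titles that match `grep`, or those that do not when `invert` is true. */
function selectByGrep(titles: string[], grep: RegExp, invert = false): string[] {
  // Use a non-global RegExp so .test() carries no state between calls.
  return titles.filter((title) => grep.test(title) !== invert);
}

// Hypothetical test titles with tags embedded in the title string.
const titles = ['add to cart @smoke', 'checkout with coupon @flaky', 'login'];

console.log(selectByGrep(titles, /@smoke/));       // like --grep @smoke
console.log(selectByGrep(titles, /@flaky/, true)); // like --grep-invert @flaky
```

Recent Playwright versions also accept tags through a details object, as in test('add to cart', { tag: '@smoke' }, ...), which keeps the title clean while still matching --grep @smoke.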

Exercises

  1. Smoke set. Tag 3 TTACart specs as @smoke. Wire a CI job that only runs --grep @smoke on PRs.
  2. Invert grep. Tag 2 specs as @flaky. Run --grep-invert @flaky on main, the full suite on a nightly cron.
  3. Project matrix. Define 3 projects (chromium, firefox, webkit) and run only the project that's currently failing.
  4. Reporter combo. Try --reporter=html,json,line. What gets written where?
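Since --project matches the name field of the projects array, it helps to see what such an array can look like. A minimal sketch: the project names are a free choice, and the device presets used here are assumptions you can swap for any entry in Playwright's devices registry.

```typescript
// playwright.config.ts - minimal sketch of a project matrix.
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    // `name` is what --project=<name> matches on the command line.
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
    // Names with spaces need quoting: --project='Mobile Safari'
    { name: 'Mobile Safari', use: { ...devices['iPhone 14'] } },
  ],
});
```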

7. UI mode - npx playwright test --ui

A long-running watcher that re-runs specs on file change, lets you time-travel through actions, edit + re-run inline, and pin watch on a single test. The replacement for "save, switch terminal, type, hit up-arrow".

--ui · watch mode · time-travel debugging · pin tests · DOM snapshots in panel · live locator picker · tag-by-tag filter
$ npx playwright test --ui    # open UI runner
$ npx playwright test --ui-host=0.0.0.0 --ui-port=8081    # expose UI on the LAN
$ npx playwright test --ui --project=chromium    # UI scoped to one project

UI mode is the right default for local dev. You launch it once at the start of your morning, turn on watch for the tests you care about, change a spec, hit Cmd-S (Ctrl-S on Windows and Linux), and Playwright re-runs only the watched tests affected by the change. Failed runs show the same trace viewer panels you'd get from show-trace - same time-travel, same DOM snapshots, same network panel - inline. Use the Pick Locator button to grab a selector from any DOM snapshot frame.

Exercises

  1. Watch loop. Open UI mode, edit one assertion in a TTACart spec, watch it re-run live.
  2. Pin a test. Pin one slow spec, edit only it, ignore the rest of the suite.
  3. Locator picker. Use the inline Pick Locator button on a TTACart cart row to grab the best locator. Paste it back into your spec.
  4. Tag filter. Filter to @smoke-tagged specs in the UI sidebar. Verify only those re-run on file change.
  5. Remote UI. Run UI mode in Docker with --ui-host=0.0.0.0 and access from your host. When would you ever do this?

8. Updating snapshots - visual regression

Playwright captures screenshots and DOM snapshots; expect(page).toHaveScreenshot() compares against a baseline. When the UI legitimately changes, you regenerate the baselines with --update-snapshots.

expect.toHaveScreenshot · --update-snapshots · -u · maxDiffPixels · maxDiffPixelRatio · snapshotPathTemplate · testInfo.snapshotPath
$ npx playwright test --update-snapshots    # regenerate all baselines
$ npx playwright test --update-snapshots --grep @visual    # scoped regen
$ npx playwright test -u    # short form

Visual regression is the most commonly misused Playwright feature. Two rules: (1) update snapshots only when the UI change is intentional - and review the resulting diff in the PR like any other code change. (2) Set a small maxDiffPixels budget so that a 1-pixel anti-aliasing difference doesn't fail your CI. Rendering differs between operating systems, browsers, and hardware, so visual specs that pass on a dev Mac may fail on a Linux CI runner.
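The tolerance from rule (2) can be set once for the whole suite instead of per assertion. A minimal sketch; the budget of 50 pixels is a made-up example, not a recommended value, and should be tuned against your own diffs.

```typescript
// playwright.config.ts - minimal sketch of a suite-wide screenshot tolerance.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  expect: {
    toHaveScreenshot: {
      // Allow up to 50 differing pixels before the assertion fails.
      // 50 is an example value. Use maxDiffPixelRatio instead if you
      // prefer a budget relative to the image size.
      maxDiffPixels: 50,
    },
  },
});
```

A single assertion can still override the budget, for example expect(page).toHaveScreenshot({ maxDiffPixels: 0 }) for a region that must match exactly.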

Exercises

  1. Capture a baseline. Add an expect(page).toHaveScreenshot() on the TTACart cart drawer. Run once to generate, commit, run again to verify.
  2. Force a diff. Tweak the cart row colour in DevTools, run the spec, see the diff in the report.
  3. Regenerate scoped. Use --update-snapshots --grep @visual to regen only a subset. Verify other baselines are untouched.
  4. Cross-OS diff. Capture baselines on macOS, run the spec on Linux (via Docker). Where do you need to bump maxDiffPixelRatio?

9. Reports - html, json, allure-playwright

Playwright ships HTML, JSON, JUnit, line, list, dot, and blob reporters out of the box. Plus community add-ons (allure-playwright, ortoni-report). Pick the right reporter for the right consumer.

--reporter=html · --reporter=list · --reporter=line · --reporter=json · --reporter=junit · --reporter=blob · --reporter=allure-playwright · merge-reports
$ npx playwright test --reporter=html    # HTML report at playwright-report/
$ npx playwright test --reporter=json    # machine-readable run summary
$ npx playwright test --reporter=junit    # for Jenkins / GitHub Actions
$ npx playwright test --reporter=blob    # shard input for merge-reports
$ npx playwright merge-reports --reporter=html ./blob-reports    # merge shards into HTML

html for humans, json for dashboards, junit for CI systems that ingest JUnit XML (Jenkins reads it through its JUnit plugin; GitHub Actions does not render it on its own, so pair it with a test-report action or use Playwright's built-in github reporter for inline annotations), blob for sharded runs that need merging. list / line / dot are terminal-only and useful in CI logs. Custom reporters live in our own framework - see the Advance framework page for the CustomTTAReporter.ts walkthrough.
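Reporters can also be combined in playwright.config.ts rather than on the command line, which is where per-reporter options go. A minimal sketch; the JUnit output path is an assumption chosen for illustration.

```typescript
// playwright.config.ts - minimal sketch of several reporters at once.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [
    // Live progress in the terminal.
    ['list'],
    // HTML report for humans; 'never' stops it opening a browser on failure.
    ['html', { open: 'never' }],
    // JUnit XML for CI tools; the output path is an example.
    ['junit', { outputFile: 'test-results/junit.xml' }],
  ],
});
```

A --reporter flag on the command line overrides this array for that run.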

Exercises

  1. Two reporters at once. Run with --reporter=html,junit and inspect both outputs.
  2. JSON consumer. Pipe --reporter=json into a small script that prints "N passed, M failed" - useful for Slack notifications.
  3. Allure setup. Wire allure-playwright, run a TTACart suite, generate the Allure HTML report.
  4. Custom reporter outline. Sketch a 30-line reporter that prints only failure URLs from a TTACart run. Look at the Reporter interface in the Playwright types.
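For the "N passed, M failed" style of summary, the JSON reporter's output can be reduced to one line with a few lines of code. The sketch below assumes the report has a top-level stats block with expected, unexpected, flaky and skipped counts; check that shape against the report your Playwright version writes. The function names are hypothetical.

```typescript
// Minimal sketch. The `stats` shape is an assumption about the JSON report.
interface JsonReportStats {
  expected: number;   // tests that passed as expected
  unexpected: number; // tests that failed
  flaky: number;      // tests that passed only on a retry
  skipped: number;
}

/** Format the counters as a one-line summary. */
function summarise(stats: JsonReportStats): string {
  return `${stats.expected} passed, ${stats.unexpected} failed, ` +
    `${stats.flaky} flaky, ${stats.skipped} skipped`;
}

/** Parse the raw JSON report text and summarise its stats block. */
function summariseReport(reportJson: string): string {
  const report = JSON.parse(reportJson) as { stats?: JsonReportStats };
  if (!report.stats) {
    throw new Error('No "stats" block found in the JSON report');
  }
  return summarise(report.stats);
}

// Hand-written sample input, not real run output.
const sample = JSON.stringify({
  stats: { expected: 10, unexpected: 2, flaky: 1, skipped: 0 },
});
console.log(summariseReport(sample));
```

The JSON reporter writes to stdout by default, so npx playwright test --reporter=json > results.json gives you a file to feed into summariseReport.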

10. CI integration - GitHub Actions, Jenkins, Docker

Running the same CLI inside CI as you do locally. Three patterns: GitHub Actions matrix + sharding, Jenkins pipeline + parallel stages, and Docker image for any-runner reproducibility.

mcr.microsoft.com/playwright · actions/setup-node · npx playwright install --with-deps · matrix.shard · artifact upload · cache .playwright-browsers · CI=true env
$ docker run --rm -it mcr.microsoft.com/playwright:v1.45.0-jammy bash    # official image with browsers preinstalled
$ CI=1 npx playwright test    # enables the CI-conditional settings in the scaffolded config - forbidOnly, retries, workers
$ npx playwright test --shard=${{matrix.shard}}/4    # inside a GitHub Actions matrix

The official mcr.microsoft.com/playwright image ships every browser preinstalled and pinned to a Playwright version. Match the image tag to your @playwright/test version - mismatches cause "executable doesn't exist at ..." errors at runtime. In GitHub Actions, a 4-shard matrix runs 4 parallel jobs that each upload a blob reporter artefact, then a final merge-reports job stitches them together.
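On the config side, a sharded pipeline mostly comes down to a handful of settings that branch on the CI environment variable. A minimal sketch along the lines of the scaffolded config, with the blob reporter switched on for CI so the merge-reports job has something to stitch together; the retry and worker counts are example values.

```typescript
// playwright.config.ts - minimal sketch of CI-conditional settings.
import { defineConfig } from '@playwright/test';

const isCI = !!process.env.CI;

export default defineConfig({
  // Fail the build if a stray test.only was committed.
  forbidOnly: isCI,
  // Retry on CI only; example counts.
  retries: isCI ? 2 : 0,
  // One worker per CI job keeps shard runs predictable; locally use the default.
  workers: isCI ? 1 : undefined,
  // Blob output for merging on CI, HTML for local runs.
  reporter: isCI ? 'blob' : 'html',
});
```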

TTA framework. Our Advance Playwright framework ships a full GitHub Actions workflow with 4-shard parallelism, browser caching, and merged HTML reports as artefacts.

Exercises

  1. 4-shard pipeline. Write a GitHub Actions workflow that runs your TTACart suite as 4 shards in parallel, merges the blob reports, and uploads the HTML as an artefact.
  2. Browser cache key. Cache ~/.cache/ms-playwright across runs keyed on Playwright version. Measure the time saved.
  3. Docker run. Run the entire suite inside the official Docker image on your laptop. Time the first run vs. the second.
  4. Jenkins parallel. Sketch a Jenkinsfile with 4 parallel stages, each running one shard.
  5. CI=1 effects. Run locally with CI=1 npx playwright test and read the docs for what changes. Why is retries different on CI?

Two mini-projects (scope only)

The lecture batch ships two CLI-focused projects. The directions are summarised here; the actual code lives in your own repo - this is the "what should I build" spec, not a copy of the upstream project.

Project 1

VWO-style login flow

Use codegen to record a login attempt against a public VWO sample form, refactor into a LoginPage POM with explicit waits, add a @smoke tag, run under --ui mode, capture a trace on failure, ship a green run inside the official Playwright Docker image.

Project 2

TTA Bank end-to-end flow

Build a bank-style transfer journey on top of our TTACart sandbox - login, transfer, statement assertion. Add 4 specs, shard across --shard=1/4 through --shard=4/4, merge the blob reports, generate an HTML report, wire a GitHub Actions matrix that runs the same shards on every push.

Diagrams - command tree and CI pipeline

Two mermaid diagrams. First: the npx playwright command tree, grouped by use case. Second: a sharded CI pipeline that uses 6 of the commands above.

npx playwright - command tree

Every CLI verb you'll touch, grouped by phase. Green = run-time, amber = setup and author-time, violet = debug-time.

flowchart TB
  ROOT[npx playwright] --> SETUP[setup]
  ROOT --> AUTHOR[author]
  ROOT --> RUN[run]
  ROOT --> DEBUG[debug]

  SETUP --> S1[init]
  SETUP --> S2[install]
  SETUP --> S3[install --with-deps]

  AUTHOR --> A1[codegen url]
  AUTHOR --> A2[codegen --device]
  AUTHOR --> A3[codegen --save-storage]

  RUN --> R1[test]
  RUN --> R2[test --workers N]
  RUN --> R3[test --shard k/n]
  RUN --> R4[test --grep tag]
  RUN --> R5[test --project name]
  RUN --> R6[test --reporter list]
  RUN --> R7[merge-reports]

  DEBUG --> D1[test --headed]
  DEBUG --> D2[test --debug]
  DEBUG --> D3[test --ui]
  DEBUG --> D4[show-trace zip]
  DEBUG --> D5[show-report]

  classDef setup fill:#fef3c7,stroke:#f59e0b,color:#111
  classDef author fill:#fde68a,stroke:#d97706,color:#111
  classDef run fill:#d1fae5,stroke:#16a34a,color:#111
  classDef debug fill:#ede9fe,stroke:#8b5cf6,color:#111
  class S1,S2,S3 setup
  class A1,A2,A3 author
  class R1,R2,R3,R4,R5,R6,R7 run
  class D1,D2,D3,D4,D5 debug
            

Sharded CI pipeline

One push triggers four parallel shard jobs; each writes a blob report; a final job merges + publishes an HTML artefact. This is the shape our V1 framework ships.

flowchart LR
  PUSH[git push] --> CI[GitHub Actions]
  CI --> M[matrix.shard=1..4]

  M --> J1["Job 1<br/>test --shard=1/4<br/>--reporter=blob"]
  M --> J2["Job 2<br/>test --shard=2/4<br/>--reporter=blob"]
  M --> J3["Job 3<br/>test --shard=3/4<br/>--reporter=blob"]
  M --> J4["Job 4<br/>test --shard=4/4<br/>--reporter=blob"]

  J1 --> UP[upload artefact]
  J2 --> UP
  J3 --> UP
  J4 --> UP

  UP --> MERGE["merge-reports<br/>--reporter=html"]
  MERGE --> ART[final HTML artefact]
  ART --> PR[PR comment + link]

  classDef ci fill:#d1fae5,stroke:#16a34a,color:#111
  classDef art fill:#fef9c3,stroke:#f59e0b,color:#111
  class J1,J2,J3,J4,MERGE ci
  class UP,ART,PR art

Next step. Open the Advance Playwright framework doc to see all ten command clusters wired into one folder-by-folder reference project, with a real GitHub Actions sharded workflow on top.