Playwright sharding - parallel containers and blob merge

How --shard=N/M works under the hood, why "1 container = 1 shard" is the pattern that scales, the math of total time vs container count, the blob reporter format, and the merge step that stitches a single HTML report together. Worked example: TTACart 12-test suite sharded 4 ways in GitHub Actions.

  • 4x speedup ceiling - 12 tests, 4 shards: in the ideal case, ~25% of single-runner time.
  • 5 diagrams - 3 mermaid + 2 hand-styled inline SVGs.
  • 6 drills - from a local 2-shard run through to a full matrix workflow.
  • blob reporter - compact intermediate format: one file per shard, merged at the end.

01 --shard=N/M mechanics

Playwright's sharding is deterministic and stateless. You pass --shard=N/M where N is the shard you are running (1-indexed) and M is the total number of shards. Playwright sorts every test file by path, partitions the list into M buckets, then runs only bucket N in this process.

  • npx playwright test --shard=1/4 - first quarter of tests
  • npx playwright test --shard=2/4 - second quarter
  • npx playwright test --shard=3/4 - third
  • npx playwright test --shard=4/4 - fourth
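
The partitioning described above can be sketched in a few lines. This is a simplified illustration, not Playwright's actual implementation: the helper name shardFiles, the spec file names, and the rule of handing leftover files to the earliest shards are all assumptions made for the example.

```typescript
// Simplified model of file-level sharding.
// shardFiles is a hypothetical helper, not part of Playwright.
function shardFiles(files: string[], current: number, total: number): string[] {
  if (!Number.isInteger(current) || !Number.isInteger(total) ||
      total < 1 || current < 1 || current > total) {
    throw new RangeError(`invalid shard ${current}/${total}`);
  }
  // Sorting first is what makes the split independent of the machine it runs on.
  const sorted = [...files].sort();
  const base = Math.floor(sorted.length / total); // minimum files per shard
  const extra = sorted.length % total;            // leftovers go to the first shards (assumption)
  const index = current - 1;                      // --shard is 1-indexed
  const start = index * base + Math.min(index, extra);
  const size = base + (index < extra ? 1 : 0);
  return sorted.slice(start, start + size);
}

// 12 hypothetical spec files, zero-padded so lexical order matches numeric order.
const specs = Array.from({ length: 12 }, (_, i) =>
  `tests/spec-${String(i + 1).padStart(2, "0")}.spec.ts`);

// Shard 2 of 4 gets files 4-6, no matter where this runs.
console.log(shardFiles(specs, 2, 4));
```

Because the function depends only on the file list and the two numbers, any runner that checks out the same commit computes the same bucket without talking to the others.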

Two key properties

  • Deterministic - shard 2 of 4 always picks the same tests, regardless of which machine runs it. This is what makes the pattern safe in CI.
  • No cross-shard communication - shards do not share state, do not coordinate. They produce a partial report. Merging is a separate step at the end.
flowchart TD
  A[12 test files sorted by path] --> B{shard=N/M flag}
  B -->|--shard=1/4| S1[Files 1-3]
  B -->|--shard=2/4| S2[Files 4-6]
  B -->|--shard=3/4| S3[Files 7-9]
  B -->|--shard=4/4| S4[Files 10-12]
  S1 --> R1[blob-report-1.zip]
  S2 --> R2[blob-report-2.zip]
  S3 --> R3[blob-report-3.zip]
  S4 --> R4[blob-report-4.zip]
  R1 --> M[Merge step]
  R2 --> M
  R3 --> M
  R4 --> M
  M --> H[playwright-report - html]
              
Sharding splits the test list deterministically; merge stitches the four partial reports back together.

Granularity: by default, sharding splits by test file, not by individual test() calls (enabling fullyParallel switches the split to individual tests). If one file has 30 tests and another has 1, the file-level split is uneven. Spread tests evenly across files for best balance.
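
When files are hard to rebalance by hand, the fullyParallel option is the usual alternative: Playwright's sharding documentation describes it as moving the split from whole files to individual tests. A minimal config sketch, assuming a standard playwright.config.ts and tests that do not rely on sharing a worker with the other tests in their file:

```typescript
// playwright.config.ts (sketch)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Tests inside one file may now run in different workers,
  // and shards can be balanced per test instead of per file.
  fullyParallel: true,
});
```

The trade-off is that tests which depend on order or shared state within a file need to be made independent first.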

02 1 container = 1 shard pattern

The pattern that scales is: run each shard in its own container, on its own CI runner. Four shards means four parallel containers, each running ~25% of the suite. The benefits compound:

  • Isolation - one shard crashing the browser does not bring down the others.
  • Near-linear scaling - add a fifth shard on a fifth runner and each shard's share drops from 25% to 20% of the suite, so wall-clock time falls by about 20%.
  • Cheap parallelism on CI - GitHub Actions, GitLab, and CircleCI bill by runner time, not by elapsed wall-clock time. 4 shards x 10 mins costs roughly the same as 1 shard x 40 mins (plus per-shard setup overhead), but the user waits 10 minutes instead of 40.

Anti-pattern: multiple shards on one container

Running --shard=1/4 and --shard=2/4 simultaneously on the same container does not help. Playwright already parallelises across CPU cores within a single run (controlled by workers in the config). Two shards on one box compete for the same cores. Spread to two boxes instead.

03 Total time vs N-containers math

The wall-clock math is simple but the pull-time overhead matters. For a 12-test suite where each test averages 60 seconds:

| Shards | Tests / shard | Per-shard time | Wall-clock | Pull cost | Net win |
|---|---|---|---|---|---|
| 1 | 12 | ~12 min | ~12 min | ~40s (1 pull) | baseline |
| 2 | 6 | ~6 min | ~6 min | ~40s x 2 in parallel | ~6 min saved |
| 4 | 3 | ~3 min | ~3 min | ~40s x 4 in parallel | ~9 min saved |
| 8 | 1-2 | ~1-2 min | ~2 min | ~40s x 8 in parallel | ~10 min saved |
| 16 | 0-1 | 0-60s | ~1.5 min | ~40s x 16 | marginal - pull dominates |
[Figure: timeline comparison. Before: single runner, 12 tests sequential, ~12 minutes. After: 4 shards in parallel (3 tests each) finishing at ~3 min, then merge + HTML finishing at ~4 min. The sharded test phase takes 75% less time on 4x the runners.]
Wall-clock collapse from ~12 min to ~4 min when you split a 12-test suite into 4 parallel shards + a merge.

Diminishing returns kick in fast. Beyond 8 shards the pull cost dominates and the per-shard time is too short to amortise it. The sweet spot for most suites is 4-8.
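
The trade-off can be put into a rough model. The sketch below reuses the section's assumptions (12 tests, 60 s per test, ~40 s image pull per container) and, unlike the table's Wall-clock column, adds the pull to every row. The function estimate and its field names are hypothetical, and the model ignores install time, queueing, and the merge job.

```typescript
// Rough wall-clock and runner-time model for a sharded run.
interface Estimate {
  shards: number;
  wallClockSec: number; // pull + the busiest shard's test time
  runnerSec: number;    // summed runtime across all containers (what you pay for)
}

function estimate(shards: number, tests = 12, perTestSec = 60, pullSec = 40): Estimate {
  // The shard with the most tests sets the pace for the whole run.
  const busiestShardTests = Math.ceil(tests / shards);
  const wallClockSec = pullSec + busiestShardTests * perTestSec;
  // Every container pays the pull once; the test time itself is just redistributed.
  const runnerSec = shards * pullSec + tests * perTestSec;
  return { shards, wallClockSec, runnerSec };
}

for (const n of [1, 2, 4, 8, 16]) {
  console.log(estimate(n));
}
```

Under these assumptions, going from 8 to 16 shards cuts wall-clock time by 60 s (160 s to 100 s) while adding 320 s of runner time (1040 s to 1360 s), which is the diminishing return described above.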

04 Blob reporter format

The HTML reporter is for humans. The blob reporter is for machines. It is Playwright's compact intermediate format: a zip containing every test result, attachment, trace, and screenshot the shard produced. The merge tool reads N blob zips and writes one HTML report.

run with blob reporter
npx playwright test --shard=1/4 --reporter=blob

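# Default output: one zip per shard under blob-report/, e.g. blob-report/report-1.zip
# (the exact file name varies by Playwright version).
# Recent Playwright versions let you pin the name with the PLAYWRIGHT_BLOB_OUTPUT_NAME env var.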

Each blob zip contains:

  • A JSONL stream of test events (start, end, attachments).
  • The trace zips, video webms, and screenshot PNGs referenced by those events.
  • A manifest with the shard number and total.

The blob format is generally compatible across Playwright patch releases. If shard 1 ran on v1.49.0 and shard 2 on v1.49.1, the merge should still work, but pin every shard and the merge job to the same Playwright version for safety.
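
Instead of passing --reporter=blob on every command line, the reporter can be selected in the config. A minimal sketch, assuming a standard playwright.config.ts and that the CI environment variable is set on the runners (GitHub Actions sets it):

```typescript
// playwright.config.ts (sketch)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Blob output on CI so the shards can be merged later;
  // the regular HTML report for local runs.
  reporter: process.env.CI ? 'blob' : 'html',
});
```

With this in place the CI command shortens to npx playwright test --shard=1/4, and local runs keep producing a browsable report.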

05 Merge - npx playwright merge-reports

The merge runs after all shards complete. It accepts a directory full of blob zips and produces any reporter output you ask for - HTML, JSON, JUnit XML, GitHub Actions annotations. Most teams produce HTML for humans and JUnit XML for CI ingestion.

stateDiagram-v2
  [*] --> Shard1Running
  [*] --> Shard2Running
  [*] --> Shard3Running
  [*] --> Shard4Running
  Shard1Running --> Shard1Blob: writes blob-report-1.zip
  Shard2Running --> Shard2Blob: writes blob-report-2.zip
  Shard3Running --> Shard3Blob: writes blob-report-3.zip
  Shard4Running --> Shard4Blob: writes blob-report-4.zip
  Shard1Blob --> Upload: upload-artifact action
  Shard2Blob --> Upload
  Shard3Blob --> Upload
  Shard4Blob --> Upload
  Upload --> MergeJob: download-artifact (all)
  MergeJob --> HtmlReport: merge-reports --reporter=html
  HtmlReport --> [*]
              
Lifecycle of blob files - written by shards, uploaded as artifacts, downloaded and merged in a final job.
merge command
# All blob zips collected under ./all-blob-reports/
npx playwright merge-reports ./all-blob-reports \
  --reporter=html

# Multi-output: html + junit + GitHub annotations
npx playwright merge-reports ./all-blob-reports \
  --reporter=html,junit,github

The merged HTML has every test in one place, marks reruns and retries correctly, and shows attachments inline.
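
When the merged output needs reporter options, such as a fixed JUnit file path, merge-reports can also read them from a config file. The sketch below is an assumption-laden example: the file name merge.config.ts, the tests directory, and the results/junit.xml path are placeholders to adapt to your repository.

```typescript
// merge.config.ts (hypothetical file name)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Assumed location of the specs; lets the merged report resolve test file paths.
  testDir: 'tests',
  reporter: [
    // Write the HTML report without trying to open a browser on the CI runner.
    ['html', { open: 'never' }],
    // Write JUnit XML to a file for CI ingestion (path is a placeholder).
    ['junit', { outputFile: 'results/junit.xml' }],
  ],
});
```

It would be invoked as npx playwright merge-reports --config=merge.config.ts ./all-blob-reports.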

06 GitHub Actions matrix fanout

Matrix is GitHub's way of saying "run this same job once per element in a list". Combined with sharding, it gives you the 1-container-1-shard pattern for free.

flowchart LR
  PUSH[push / pull_request] --> WF[workflow file]
  WF --> JOB[job: test]
  JOB --> M[strategy.matrix.shard]
  M --> S1[runner 1 - shard 1/4]
  M --> S2[runner 2 - shard 2/4]
  M --> S3[runner 3 - shard 3/4]
  M --> S4[runner 4 - shard 4/4]
  S1 --> A1[upload blob-1]
  S2 --> A2[upload blob-2]
  S3 --> A3[upload blob-3]
  S4 --> A4[upload blob-4]
  A1 --> MERGE[merge-reports job]
  A2 --> MERGE
  A3 --> MERGE
  A4 --> MERGE
  MERGE --> PR[playwright-report]
              
Matrix fans out one job into N runners. Each writes a blob, the final job merges.
.github/workflows/playwright.yml
name: Playwright
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-22.04
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci
      - run: npx playwright install --with-deps chromium
      - run: "npx playwright test --shard=${{ matrix.shard }}/4 --reporter=blob"
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: "blob-report-${{ matrix.shard }}"
          path: blob-report
          retention-days: 3

  merge:
    needs: test
    if: always()
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci
      - uses: actions/download-artifact@v4
        with:
          path: all-blob-reports
          pattern: blob-report-*
          merge-multiple: true
      - run: npx playwright merge-reports --reporter=html ./all-blob-reports
      - uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: playwright-report

07 TTACart 4-way shard demo

A real shape: the TTACart suite has 12 tests across 12 files. We shard 4 ways. Three tests land in each shard. Each runs ~3 minutes. Wall-clock drops from 12 minutes to a little under 4.

12 TTACart specs in file-path sort order, with shard assignment and per-shard runtime:

  • shard 1/4 (~3 min): cart-add.spec.ts, cart-clear.spec.ts, cart-remove.spec.ts
  • shard 2/4 (~3.1 min): checkout-coupon.spec.ts, checkout-pay.spec.ts, checkout-shipping.spec.ts
  • shard 3/4 (~2.9 min): home-cards.spec.ts, login.spec.ts, logout.spec.ts
  • shard 4/4 (~3.2 min): products-detail.spec.ts, products-filter.spec.ts, products-list.spec.ts

The split is even because 1 test = 1 file. Imbalance grows when files have variable test counts.
Shard distribution for a 12-spec TTACart suite, 4-way shard. Wall-clock falls from ~12 min to ~3.2 min + ~30s merge.

08 Fail-fast trade-offs

GitHub matrix has a fail-fast flag. When true (the default), the moment one shard fails, GitHub cancels every other shard in the matrix. When false, every shard runs to completion. Both make sense in different contexts.

| Setting | Behaviour | Use when |
|---|---|---|
| fail-fast: true | Cancel siblings on first failure | Pull request gates - want fast feedback that "something is broken" |
| fail-fast: false | All shards run, every failure surfaces | Nightly suites - want the full picture of what is broken |
| continue-on-error: true | Individual shard failures do not fail the workflow | Soft-failing flaky shards while you triage |

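Common mistake: setting both fail-fast: false AND continue-on-error: true on the shard job. Every shard runs to completion, but the workflow reports green even when tests fail, because continue-on-error stops any failure from escalating. Pick one strategy and stick with it, and reserve continue-on-error for deliberate, temporary soft-fails.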
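
The interaction between the two flags is easier to see in a toy model. The function below is a simplified sketch of what you observe in the Actions UI, not GitHub's scheduler: it assumes shards finish in index order and that fail-fast cancels every shard that has not finished yet. All names are hypothetical.

```typescript
type ShardResult = "pass" | "fail";
type Observed = ShardResult | "cancelled";

interface Flags {
  failFast: boolean;        // models strategy.fail-fast
  continueOnError: boolean; // models continue-on-error on the shard job
}

function runMatrix(
  results: ShardResult[],
  flags: Flags,
): { observed: Observed[]; workflow: "green" | "red" } {
  const observed: Observed[] = [];
  let cancelRest = false;
  let escalated = false;
  for (const result of results) {
    if (cancelRest) {
      observed.push("cancelled");
      continue;
    }
    observed.push(result);
    // A failure only counts against the workflow if continue-on-error is off.
    if (result === "fail" && !flags.continueOnError) {
      escalated = true;
      cancelRest = flags.failFast; // fail-fast cancels the remaining shards
    }
  }
  return { observed, workflow: escalated ? "red" : "green" };
}

const shards: ShardResult[] = ["pass", "fail", "pass", "fail"];
console.log(runMatrix(shards, { failFast: true, continueOnError: false }));
console.log(runMatrix(shards, { failFast: false, continueOnError: false }));
console.log(runMatrix(shards, { failFast: false, continueOnError: true }));
```

In this model the third combination is the only one where two shards fail and the workflow still ends green, which is the situation the warning above describes.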

09 Aggregation pitfalls

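  • Artifact name collisions - if every shard uploads under the same artifact name (e.g. blob-report), the uploads collide: upload-artifact@v4 rejects the second upload with a name conflict, and older versions merged uploads into one artifact, overwriting same-named files. Always include ${{ matrix.shard }} in the artifact name.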
  • Run name confusion - both shard runs and the merge job appear in the Actions UI. Set name: on the merge job so you can find the merged report quickly.
  • Lock-file drift - if shard 1 installs from a different lockfile than shard 2 (e.g. cache miss), the dep versions can diverge. Always npm ci, never npm install, in CI.
  • Trace size - trace zips can run from a few MB to tens of MB per test. Even at 4 shards x 20 tests x 5 MB you ship ~400 MB of artifacts. Set retention-days low (3-7) or use trace: 'retain-on-failure' instead of 'on'.
  • Load-order assumptions - sharding splits on sorted file paths, so renaming a file rebalances the shards. If you depend on a specific test running in shard 3, you are doing it wrong - shards are interchangeable by design.
  • Browser install per shard - npx playwright install downloads ~200 MB per shard if not cached. Cache the ~/.cache/ms-playwright directory across shards.
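
The trace-size arithmetic from the list above is worth running with your own numbers before choosing a trace mode. A minimal sketch, assuming every kept trace is roughly the same size. The function traceArtifactMb and the figure of 4 failing tests are hypothetical.

```typescript
type TraceMode = "on" | "retain-on-failure";

// Estimates the total trace artifact size for one sharded run, in MB.
function traceArtifactMb(
  shards: number,
  testsPerShard: number,
  mbPerTrace: number,
  mode: TraceMode,
  failedTests = 0, // only used for "retain-on-failure"
): number {
  const totalTests = shards * testsPerShard;
  // "on" keeps a trace for every test; "retain-on-failure" keeps only failing ones.
  const keptTraces = mode === "on" ? totalTests : Math.min(failedTests, totalTests);
  return keptTraces * mbPerTrace;
}

console.log(traceArtifactMb(4, 20, 5, "on"));                   // 400
console.log(traceArtifactMb(4, 20, 5, "retain-on-failure", 4)); // 20
```

With 4 failing tests out of 80, keeping traces only on failure shrinks the upload from ~400 MB to ~20 MB under these assumptions.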