Playwright sharding - parallel containers and blob merge

How --shard=N/M works under the hood, why "1 container = 1 shard" is the pattern that scales, the math of total time vs container count, the blob reporter format, and the merge step that stitches a single HTML report together. Worked example: TTACart 12-test suite sharded 4 ways in GitHub Actions.

Speedup ceiling

12 tests, 4 shards - in the ideal case, ~25% of single-runner time.

Diagrams

3 mermaid + 2 hand-styled inline SVGs.

Drills

Local 2-shard run through to a full matrix workflow.

blob

Reporter

Compact intermediate format - one file per shard, merged at the end.

01 - `--shard=N/M` mechanics

Playwright's sharding is deterministic and stateless. You pass --shard=N/M where N is the shard you are running (1-indexed) and M is the total number of shards. Every process builds the same ordered list of tests, partitions it into M groups, then runs only group N. By default the unit of partitioning is the test file; with fullyParallel: true it is the individual test.

npx playwright test --shard=1/4   # first quarter of tests
npx playwright test --shard=2/4   # second quarter
npx playwright test --shard=3/4   # third
npx playwright test --shard=4/4   # fourth

Two key properties

Deterministic - shard 2 of 4 always picks the same tests, regardless of which machine runs it. This is what makes the pattern safe in CI.
No cross-shard communication - shards do not share state, do not coordinate. They produce a partial report. Merging is a separate step at the end.

flowchart TD
  A[12 test files sorted by path] --> B{shard=N/M flag}
  B -->|--shard=1/4| S1[Files 1-3]
  B -->|--shard=2/4| S2[Files 4-6]
  B -->|--shard=3/4| S3[Files 7-9]
  B -->|--shard=4/4| S4[Files 10-12]
  S1 --> R1[blob-report-1.zip]
  S2 --> R2[blob-report-2.zip]
  S3 --> R3[blob-report-3.zip]
  S4 --> R4[blob-report-4.zip]
  R1 --> M[Merge step]
  R2 --> M
  R3 --> M
  R4 --> M
  M --> H[playwright-report - html]

Sharding splits the test list deterministically; merge stitches the four partial reports back together.

Granularity: by default sharding splits by test file, not by individual test() calls. If one file has 30 tests and another has 1, the split is uneven. Either spread tests evenly across files, or set fullyParallel: true so Playwright distributes individual tests across shards.
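
The partitioning described above can be sketched as a small pure function. This is a simplified model of file-level sharding, not Playwright's internal code: the function name selectShard and the spec file names are hypothetical, and the real runner may group and order tests differently.

```typescript
// Simplified model of --shard=N/M at file granularity (an assumption, not Playwright's source).
function selectShard(files: readonly string[], current: number, total: number): string[] {
  const isWhole = (n: number): boolean => Math.floor(n) === n;
  if (!isWhole(current) || !isWhole(total) || total < 1 || current < 1 || current > total) {
    throw new RangeError(`invalid shard ${current}/${total}`);
  }
  // Sort a copy so every machine sees the same order regardless of input order.
  const sorted = files.slice().sort();
  // Contiguous slices whose sizes differ by at most one file.
  const start = Math.floor(((current - 1) * sorted.length) / total);
  const end = Math.floor((current * sorted.length) / total);
  return sorted.slice(start, end);
}

// Twelve hypothetical spec files: tests/spec-01.spec.ts ... tests/spec-12.spec.ts
const specFiles: string[] = [];
for (let i = 1; i <= 12; i++) {
  specFiles.push(`tests/spec-${i < 10 ? "0" + i : String(i)}.spec.ts`);
}

const shard2 = selectShard(specFiles, 2, 4);
console.log(shard2); // files 4-6 of the sorted list
```

Because the function depends only on its inputs, two runners given the same file list and the same N/M always select the same files, which is the property that makes sharding safe without any coordination.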

02 - 1 container = 1 shard pattern

The pattern that scales is: run each shard in its own container, on its own CI runner. Four shards means four parallel containers, each running ~25% of the suite. The benefits compound:

Isolation - one shard crashing the browser does not bring down the others.
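Near-linear scaling - going from four shards to five cuts each shard's share from 25% to 20% of the suite, so wall-clock time drops by up to ~20%, provided the tests divide evenly (12 tests across 5 shards still leaves 3 in the largest shard).
Parallelism at no extra test-time cost - GitHub Actions, GitLab, and CircleCI meter usage by runner time, not by how many runners are busy at once. 4 shards x 10 mins consumes the same 40 runner-minutes as 1 shard x 40 mins, but the user waits 10 minutes instead of 40. Per-shard setup (checkout, npm ci, browser install) is repeated on every runner, so the real bill is slightly higher.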

Anti-pattern: multiple shards on one container

Running --shard=1/4 and --shard=2/4 simultaneously on the same container does not help. Playwright already parallelises across CPU cores within a single run (controlled by workers in the config). Two shards on one box compete for the same cores. Spread to two boxes instead.

03 - Total time vs N-containers math

The wall-clock math is simple but the pull-time overhead matters. For a 12-test suite where each test averages 60 seconds:

Shards	Tests / shard	Per-shard time	Wall-clock	Pull cost	Net win
1	12	~12 min	~12 min	~40s (1 pull)	baseline
2	6	~6 min	~6 min	~40s x 2 in parallel	~6 min saved
4	3	~3 min	~3 min	~40s x 4 in parallel	~9 min saved
8	1-2	~1-2 min	~2 min	~40s x 8 in parallel	~10 min saved
16	0-1	0-60s	~1.5 min	~40s x 16	marginal - pull dominates

Wall-clock collapse from ~12 min to ~4 min when you split a 12-test suite into 4 parallel shards + a merge.

Diminishing returns kick in fast. Beyond 8 shards the pull cost dominates and the per-shard time is too short to amortise it. The sweet spot for most suites is 4-8.
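
The arithmetic behind the table can be written as a small model. This is a rough sketch under stated assumptions: every test takes the same 60 seconds, each container pays a ~40 second pull in parallel, and the merge job adds ~30 seconds whenever there is more than one shard. The names SuiteModel and wallClockSeconds are hypothetical.

```typescript
interface SuiteModel {
  tests: number;          // total tests in the suite
  secondsPerTest: number; // assumed uniform test duration
  pullSeconds: number;    // per-container pull cost, paid in parallel
  mergeSeconds: number;   // merge job, only needed when shards > 1
}

// Wall-clock = pull + slowest shard + merge. The slowest shard holds ceil(tests / shards) tests.
function wallClockSeconds(model: SuiteModel, shards: number): number {
  const slowestShard = Math.ceil(model.tests / shards) * model.secondsPerTest;
  const merge = shards > 1 ? model.mergeSeconds : 0;
  return model.pullSeconds + slowestShard + merge;
}

const ttacart: SuiteModel = { tests: 12, secondsPerTest: 60, pullSeconds: 40, mergeSeconds: 30 };

for (const shards of [1, 2, 4, 8, 16]) {
  const seconds = wallClockSeconds(ttacart, shards);
  console.log(`${shards} shard(s): ${(seconds / 60).toFixed(1)} min wall-clock`);
}
```

For 4 shards the model gives 40 + 180 + 30 = 250 seconds, a little over 4 minutes. Past 12 shards the slowest shard cannot shrink below one test, so the fixed pull and merge costs are all that is left to pay, which is the diminishing-returns effect described above.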

04 - Blob reporter format

The HTML reporter is for humans. The blob reporter is for machines. It is Playwright's compact intermediate format - a zip containing every test result, attachment, trace, and screenshot the shard produced. The merge tool reads N blob zips and writes one HTML.

run with blob reporter

npx playwright test --shard=1/4 --reporter=blob

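# Default output for shard 1: blob-report/report-1.zip
# (recent versions may add a hash of any CLI filters to the name).
# Override the file name with the PLAYWRIGHT_BLOB_OUTPUT_NAME env var,
# or the directory with PLAYWRIGHT_BLOB_OUTPUT_DIR.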

Each blob zip contains:

A JSONL stream of test events (start, end, attachments).
The trace zips, video webms, and screenshot PNGs referenced by those events.
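Metadata identifying the shard (its number and the total).

Treat the blob format as internal to Playwright rather than a stable interchange format. A patch-level difference between shards (shard 1 on v1.49.0, shard 2 on v1.49.1) will usually merge, but the safe rule is to run every shard and the merge job on exactly the same Playwright version, which a committed lockfile plus npm ci gives you.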
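
To make the "JSONL stream of events" idea concrete, here is a minimal sketch that tallies events in such a stream. The record shape ({ method, params }) and the event names in the sample are illustrative assumptions: the real schema is internal to Playwright and can change between versions, so use this only for poking around, not as a parser to depend on.

```typescript
// Assumed shape of one line in report.jsonl (illustrative, not a documented contract).
interface BlobEvent {
  method: string;
  params?: unknown;
}

// Count how many times each event type appears in a JSONL string.
function countEvents(jsonl: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // tolerate blank lines and a trailing newline
    const event = JSON.parse(trimmed) as BlobEvent;
    counts[event.method] = (counts[event.method] || 0) + 1;
  }
  return counts;
}

// Hand-written sample standing in for an extracted report.jsonl.
const sampleJsonl = [
  '{"method":"onTestBegin","params":{"testId":"a"}}',
  '{"method":"onTestEnd","params":{"testId":"a","status":"passed"}}',
  '{"method":"onTestBegin","params":{"testId":"b"}}',
  '{"method":"onTestEnd","params":{"testId":"b","status":"failed"}}',
  "",
].join("\n");

const eventCounts = countEvents(sampleJsonl);
console.log(eventCounts);
```

One JSON object per line is what lets the merge tool stream several large reports without loading each one as a single document.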

05 - Merge - `npx playwright merge-reports`

The merge runs after all shards complete. It accepts a directory full of blob zips and produces any reporter output you ask for - HTML, JSON, JUnit XML, GitHub Actions annotations. Most teams produce HTML for humans and JUnit XML for CI ingestion.

stateDiagram-v2
  [*] --> Shard1Running
  [*] --> Shard2Running
  [*] --> Shard3Running
  [*] --> Shard4Running
  Shard1Running --> Shard1Blob: writes blob-report-1.zip
  Shard2Running --> Shard2Blob: writes blob-report-2.zip
  Shard3Running --> Shard3Blob: writes blob-report-3.zip
  Shard4Running --> Shard4Blob: writes blob-report-4.zip
  Shard1Blob --> Upload: upload-artifact action
  Shard2Blob --> Upload
  Shard3Blob --> Upload
  Shard4Blob --> Upload
  Upload --> MergeJob: download-artifact (all)
  MergeJob --> HtmlReport: merge-reports --reporter=html
  HtmlReport --> [*]

Lifecycle of blob files - written by shards, uploaded as artifacts, downloaded and merged in a final job.

merge command

# All blob zips collected under ./all-blob-reports/
npx playwright merge-reports ./all-blob-reports \
  --reporter=html

# Multi-output: html + junit + GitHub annotations
npx playwright merge-reports ./all-blob-reports \
  --reporter=html,junit,github

The merged HTML has every test in one place, marks reruns and retries correctly, and shows attachments inline.
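
When the list of merge outputs grows, the reporters can live in a small config file passed to merge-reports with --config instead of a long --reporter flag. The sketch below assumes a file named merge.config.ts (a hypothetical name) and an output path of results/junit.xml chosen for illustration.

```typescript
// merge.config.ts - used only by the merge step, not by the test runs.
export default {
  // Assumed to match the testDir of the shard runs so file paths resolve consistently.
  testDir: "./tests",
  reporter: [
    ["html", { open: "never" }],                    // human-readable report
    ["junit", { outputFile: "results/junit.xml" }], // machine-readable report for CI ingestion
  ],
};
```

It would then be invoked as npx playwright merge-reports --config=merge.config.ts ./all-blob-reports. Keeping the merge reporters in their own file means the shard jobs only ever need the blob reporter.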

06 - GitHub Actions matrix fanout

Matrix is GitHub's way of saying "run this same job once per element in a list". Combined with sharding, it gives you the 1-container-1-shard pattern for free.

flowchart LR
  PUSH[push / pull_request] --> WF[workflow file]
  WF --> JOB[job: test]
  JOB --> M[strategy.matrix.shard]
  M --> S1[runner 1 - shard 1/4]
  M --> S2[runner 2 - shard 2/4]
  M --> S3[runner 3 - shard 3/4]
  M --> S4[runner 4 - shard 4/4]
  S1 --> A1[upload blob-1]
  S2 --> A2[upload blob-2]
  S3 --> A3[upload blob-3]
  S4 --> A4[upload blob-4]
  A1 --> MERGE[merge-reports job]
  A2 --> MERGE
  A3 --> MERGE
  A4 --> MERGE
  MERGE --> PR[playwright-report]

Matrix fans out one job into N runners. Each writes a blob, the final job merges.

.github/workflows/playwright.yml

name: Playwright
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-22.04
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci
      - run: npx playwright install --with-deps chromium
      - run: "npx playwright test --shard=${{ matrix.shard }}/4 --reporter=blob"
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: "blob-report-${{ matrix.shard }}"
          path: blob-report
          retention-days: 3

  merge:
    needs: test
    if: always()
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci
      - uses: actions/download-artifact@v4
        with:
          path: all-blob-reports
          pattern: blob-report-*
          merge-multiple: true
      - run: npx playwright merge-reports --reporter=html ./all-blob-reports
      - uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: playwright-report
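
What the matrix does to the test job can be sketched as a plain expansion: one job definition in, N concrete jobs out. This is an illustrative model, not GitHub's implementation; the ShardJob type and expandShardMatrix function are hypothetical names, and the job label follows the test (1), test (2) pattern shown in the Actions UI.

```typescript
interface ShardJob {
  jobName: string;      // label shown in the Actions UI
  command: string;      // what the runner executes
  artifactName: string; // unique per shard so uploads do not clash
}

// Expand "shard: [1..total]" into one concrete job per shard.
function expandShardMatrix(total: number): ShardJob[] {
  const jobs: ShardJob[] = [];
  for (let shard = 1; shard <= total; shard++) {
    jobs.push({
      jobName: `test (${shard})`,
      command: `npx playwright test --shard=${shard}/${total} --reporter=blob`,
      artifactName: `blob-report-${shard}`,
    });
  }
  return jobs;
}

const matrixJobs = expandShardMatrix(4);
for (const job of matrixJobs) {
  console.log(`${job.jobName}: ${job.command} -> ${job.artifactName}`);
}
```

The only value that differs between jobs is the shard index, which is why the workflow hard-codes the total (/4) and must be edited in two places (the matrix list and the flag) when the shard count changes.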

07 - TTACart 4-way shard demo

A real shape: the TTACart suite has 12 tests across 12 files. We shard 4 ways. Three tests land in each shard. Each runs ~3 minutes. Wall-clock drops from 12 minutes to a little under 4.

Shard distribution for a 12-spec TTACart suite, 4-way shard. Wall-clock falls from ~12 min to ~3.2 min + ~30s merge.

08 - Fail-fast trade-offs

GitHub matrix has a fail-fast flag. When true (the default), the moment one shard fails, GitHub cancels every other shard in the matrix. When false, every shard runs to completion. Both make sense in different contexts.

Setting	Behaviour	Use when
`fail-fast: true`	Cancel siblings on first failure	Pull request gates - want fast feedback that "something is broken"
`fail-fast: false`	All shards run, every failure surfaces	Nightly suites - want the full picture of what is broken
`continue-on-error: true`	Individual shard failures do not fail the workflow	Soft-failing flaky shards while you triage

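Common mistake: adding continue-on-error: true to the shard job and then forgetting about it. The workflow now reports green even when tests fail, because a failed shard no longer fails the run and no longer triggers fail-fast cancellation. fail-fast: false on its own is safe: failures still turn the workflow red. Use continue-on-error only as a temporary measure while you triage.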
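
The interaction between the two flags is easier to see in a toy model. The sketch below is a deliberate simplification: it treats the order of the results array as the order in which shards finish, whereas real shards run concurrently, and the names simulateMatrix, MatrixSettings, and Outcome are hypothetical.

```typescript
type ShardResult = "pass" | "fail";

interface MatrixSettings {
  failFast: boolean;        // strategy.fail-fast
  continueOnError: boolean; // continue-on-error on the shard job
}

interface Outcome {
  conclusions: string[];  // per-shard conclusion, in finish order
  workflowGreen: boolean; // whether the workflow as a whole reports success
}

function simulateMatrix(results: ShardResult[], settings: MatrixSettings): Outcome {
  const conclusions: string[] = [];
  let cancelled = false;
  let workflowGreen = true;
  for (const result of results) {
    if (cancelled) {
      conclusions.push("cancelled");
      continue;
    }
    if (result === "pass") {
      conclusions.push("success");
      continue;
    }
    conclusions.push("failure");
    // A tolerated failure neither fails the workflow nor cancels sibling shards.
    if (!settings.continueOnError) {
      workflowGreen = false;
      if (settings.failFast) cancelled = true;
    }
  }
  return { conclusions, workflowGreen };
}

const run: ShardResult[] = ["pass", "fail", "pass", "pass"];
console.log(simulateMatrix(run, { failFast: true, continueOnError: false }));
console.log(simulateMatrix(run, { failFast: false, continueOnError: false }));
console.log(simulateMatrix(run, { failFast: false, continueOnError: true }));
```

Only the third configuration ends green despite a failing shard, which is the silent-pass trap described above.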

09 - Aggregation pitfalls

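Artifact name collisions - if every shard uploads under the same name, such as blob-report, the uploads clash: upload-artifact@v4 rejects a second upload to an existing artifact name, and older versions merged the uploads, overwriting files with the same path. Always include ${{ matrix.shard }} in the artifact name.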
Run name confusion - both shard runs and the merge job appear in the Actions UI. Set name: on the merge job so you can find the merged report quickly.
Lock-file drift - if shard 1 installs from a different lockfile than shard 2 (e.g. cache miss), the dep versions can diverge. Always npm ci, never npm install, in CI.
Trace size - trace zips can be tens of MB per test. With 4 shards x 20 tests x 5 MB you ship ~400 MB of artifacts. Set retention-days low (3-7) or use trace: 'retain-on-failure' instead of 'on'.
Assuming shard membership - sharding splits on the ordered test list, so adding or renaming a file rebalances the shards. If you depend on a specific test running in shard 3, you are doing it wrong - shards are interchangeable by design.
Browser install per shard - npx playwright install downloads ~200 MB per shard if not cached. Cache the ~/.cache/ms-playwright directory across shards.
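
The trace-size arithmetic above is worth parameterising so you can plug in your own numbers. This is a back-of-envelope sketch: the function name estimateTraceMb is hypothetical, and the 5% failure rate in the second call is an assumed figure for illustration, not a measurement.

```typescript
type TraceMode = "on" | "retain-on-failure";

// Estimate total trace artifact size in MB across all shards.
function estimateTraceMb(
  shards: number,
  testsPerShard: number,
  mbPerTrace: number,
  mode: TraceMode,
  failureRate: number, // fraction of tests that fail, 0..1 (only used for retain-on-failure)
): number {
  const totalTests = shards * testsPerShard;
  // With trace "on" every test keeps its trace; with retain-on-failure only failing tests do.
  const keptTraces = mode === "on" ? totalTests : Math.round(totalTests * failureRate);
  return keptTraces * mbPerTrace;
}

const alwaysOn = estimateTraceMb(4, 20, 5, "on", 0);
const onFailure = estimateTraceMb(4, 20, 5, "retain-on-failure", 0.05);
console.log(`trace: 'on' -> ~${alwaysOn} MB, 'retain-on-failure' -> ~${onFailure} MB`);
```

With the figures from the pitfall (4 shards x 20 tests x 5 MB) the always-on case is 400 MB per run, while keeping traces only for an assumed 5% of failing tests is about 20 MB.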

Drill 1 - Run a 2-shard split locally

In a TTACart test project with 8+ tests, run --shard=1/2 in one terminal and --shard=2/2 in another. Confirm the tests are partitioned (each terminal runs ~half the suite). Diff the test file list each terminal touched.

Hint

Watch the "running" log lines. The path list should not overlap between terminals.

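Drill 2 - Add the blob reporter

Pass --reporter=blob to both shard runs, giving each its own output directory (for example PLAYWRIGHT_BLOB_OUTPUT_DIR=blob-report-1 and PLAYWRIGHT_BLOB_OUTPUT_DIR=blob-report-2), because a run may clear the blob output directory when it starts. Confirm a zip file appears in each directory. Open one with unzip -l and inspect the structure (JSONL + attachments).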

Hint

Each blob includes a manifest telling the merger which shard produced it. No tooling needed - the merge handles it.

Drill 3 - Merge the two blobs

Move both blob zips into ./all-blob-reports/. Run npx playwright merge-reports --reporter=html ./all-blob-reports. Open playwright-report/index.html and confirm every test from both shards is listed.

Hint

If the merge complains about "no blob reports", you probably copied directories instead of files. The zips must live directly under the input directory.

Drill 4 - Wire up the GH Actions matrix

Push the workflow YAML from section 6 to a fresh repo. Trigger it on a push. Confirm four parallel runners spin up, each uploads a blob, and a fifth job merges everything into a single artifact.

Hint

You can watch the matrix expand in the Actions UI. Each shard appears as test (1), test (2), etc.

Drill 5 - Force-fail one shard, observe behaviour

Add a deliberately failing assertion to a test in shard 2. Run the workflow with fail-fast: true and then fail-fast: false. Note the difference - the first cancels the other shards, the second lets them finish.

Hint

Cancel signal looks like "The job was cancelled because of a failure in another matrix job."

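Drill 6 - Cache the browser install across shards

Add an actions/cache@v4 step keyed on the Playwright version. Verify the second run hits the cache and skips the browser download, so the install step drops well below the ~40 seconds of a cold install. The OS packages installed by --with-deps are not part of the cache, so that portion of the step still runs.

Hint

Cache path: ~/.cache/ms-playwright. Cache key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}. Hashing the lockfile is a proxy for the Playwright version: the key also changes whenever any other dependency changes.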

Solution snippets

Local 4-shard run + merge

local-shard.sh

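# Run in four terminals in parallel. Each shard writes to its own directory
# because a run may clear the blob output directory when it starts.
# PLAYWRIGHT_BLOB_OUTPUT_DIR is honoured by recent Playwright versions.
PLAYWRIGHT_BLOB_OUTPUT_DIR=blob-report-1 npx playwright test --shard=1/4 --reporter=blob
PLAYWRIGHT_BLOB_OUTPUT_DIR=blob-report-2 npx playwright test --shard=2/4 --reporter=blob
PLAYWRIGHT_BLOB_OUTPUT_DIR=blob-report-3 npx playwright test --shard=3/4 --reporter=blob
PLAYWRIGHT_BLOB_OUTPUT_DIR=blob-report-4 npx playwright test --shard=4/4 --reporter=blob

# Collect all blob zips into one directory
mkdir -p all-blob-reports
cp blob-report-*/*.zip all-blob-reports/

# Merge
npx playwright merge-reports ./all-blob-reports --reporter=html
npx playwright show-report   # or, on macOS: open playwright-report/index.html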

Inspect a blob zip

inspect-blob.sh

unzip -l blob-report/report-1.zip

# Output (excerpt):
#  report.jsonl   (event stream)
#  resources/.zip   (trace per test)
#  resources/.png   (screenshot)
#  resources/.webm  (video)

GitHub Actions matrix - full workflow

.github/workflows/playwright.yml

name: Playwright
on: [push, pull_request]
jobs:
  test:
    name: "Shard ${{ matrix.shard }} / 4"
    runs-on: ubuntu-22.04
    strategy:
      fail-fast: false
      matrix: { shard: [1, 2, 3, 4] }
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: 'npm' }
      - run: npm ci
      - name: Cache browsers
        uses: actions/cache@v4
        with:
          path: ~/.cache/ms-playwright
          key: "pw-${{ runner.os }}-${{ hashFiles('package-lock.json') }}"
      - run: npx playwright install --with-deps chromium
      - run: "npx playwright test --shard=${{ matrix.shard }}/4 --reporter=blob"
      - if: always()
        uses: actions/upload-artifact@v4
        with:
          name: "blob-report-${{ matrix.shard }}"
          path: blob-report
          retention-days: 3

  merge:
    name: Merge reports
    needs: test
    if: always()
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: 'npm' }
      - run: npm ci
      - uses: actions/download-artifact@v4
        with:
          path: all-blob-reports
          pattern: blob-report-*
          merge-multiple: true
      - run: npx playwright merge-reports --reporter=html ./all-blob-reports
      - uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: playwright-report
          retention-days: 14

Config aware of CI sharding

playwright.config.ts

import { defineConfig } from "@playwright/test";

export default defineConfig({
  testDir: "./tests",
  fullyParallel: true,
  // Reduce workers in CI - each shard already runs on its own runner
  workers: process.env.CI ? 1 : undefined,
  retries: process.env.CI ? 2 : 0,
  reporter: process.env.CI
    ? [["blob"], ["github"]]
    : [["list"], ["html", { open: "never" }]],
  use: {
    baseURL: "https://app.thetestingacademy.com/playwright/ttacart/",
    trace: "retain-on-failure",
    screenshot: "only-on-failure",
    video: "retain-on-failure",
  },
});

TTACart 4-way shard - one of the 12 specs

tests/products-list.spec.ts

import { test, expect } from "@playwright/test";

test.describe("product list", () => {
  test.beforeEach(async ({ page }) => {
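    // Relative path with no leading slash, so it resolves under the baseURL path
    // (.../playwright/ttacart/). "/products" would resolve against the site root.
    await page.goto("products");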
  });

  test("shows at least 8 products", async ({ page }) => {
    const rows = page.getByTestId("product-row");
    await expect.poll(() => rows.count()).toBeGreaterThanOrEqual(8);
  });

  test("clicking a product opens detail page", async ({ page }) => {
    await page.getByTestId("product-row").first().click();
    await expect(page.getByTestId("product-detail")).toBeVisible();
  });
});

Java equivalent - JUnit 5 + Maven Surefire

Playwright for Java does not ship a built-in shard flag the way the TS test runner does. Teams sharding Java suites tend to split by JUnit 5 tags or by Maven Surefire includes:

pom.xml (excerpt)

<plugin>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <includes>
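      <!-- Pass a bare shard index in the CI step, e.g. -Dshard=1 (not N/M:
           a slash would break the path pattern). Matches classes named Shard1*.java. -->
      <include>**/Shard${shard}*.java</include>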
    </includes>
  </configuration>
</plugin>

For most Java teams, the cleanest path is to keep the Playwright TS runner for sharding and call Java services from there, or to use a third-party JUnit 5 shard plugin.
