Playwright sharding - parallel containers and blob merge
How --shard=N/M works under the hood, why "1 container = 1 shard" is the pattern that scales,
the math of total time vs container count, the blob reporter format, and the merge step that stitches a
single HTML report together. Worked example: TTACart 12-test suite sharded 4 ways in GitHub Actions.
Speedup ceiling: 4x. 12 tests, 4 shards - in the ideal case, ~25% of single-runner time.
Diagrams: 5. 3 mermaid + 2 hand-styled inline SVGs.
Drills: 6. Local 2-shard run through to a full matrix workflow.
Reporter: blob. Compact intermediate format - one file per shard, merged at the end.
01 --shard=N/M mechanics
Playwright's sharding is deterministic and stateless. You pass --shard=N/M where N is the
shard you are running (1-indexed) and M is the total number of shards. Playwright sorts every test file
by path, partitions the list into M buckets, then runs only bucket N in this process.
npx playwright test --shard=1/4   # first quarter of tests
npx playwright test --shard=2/4   # second quarter
npx playwright test --shard=3/4   # third
npx playwright test --shard=4/4   # fourth
Two key properties
Deterministic - shard 2 of 4 always picks the same tests, regardless of which machine runs it. This is what makes the pattern safe in CI.
No cross-shard communication - shards do not share state, do not coordinate. They produce a partial report. Merging is a separate step at the end.
flowchart TD
A[12 test files sorted by path] --> B{shard=N/M flag}
B -->|--shard=1/4| S1[Files 1-3]
B -->|--shard=2/4| S2[Files 4-6]
B -->|--shard=3/4| S3[Files 7-9]
B -->|--shard=4/4| S4[Files 10-12]
S1 --> R1[blob-report-1.zip]
S2 --> R2[blob-report-2.zip]
S3 --> R3[blob-report-3.zip]
S4 --> R4[blob-report-4.zip]
R1 --> M[Merge step]
R2 --> M
R3 --> M
R4 --> M
M --> H[playwright-report - html]
Sharding splits the test list deterministically; merge stitches the four partial reports back together.
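The split shown in the diagram can be sketched in a few lines. This is a minimal illustration of "sort, then cut into M contiguous buckets", not Playwright's internal implementation; `shardSlice` and the file names are hypothetical.

```typescript
// Illustrative only: `shardSlice` is a hypothetical helper, not Playwright's internal code.
function shardSlice(files: string[], current: number, total: number): string[] {
  if (!Number.isInteger(current) || !Number.isInteger(total) || total < 1 || current < 1 || current > total) {
    throw new RangeError(`invalid shard ${current}/${total}`);
  }
  // Sort by path so every machine sees the same order.
  const sorted = [...files].sort();
  // Spread any remainder over the first shards so bucket sizes differ by at most one.
  const base = Math.floor(sorted.length / total);
  const extra = sorted.length % total;
  const index = current - 1; // --shard is 1-indexed
  const from = base * index + Math.min(extra, index);
  const to = from + base + (index < extra ? 1 : 0);
  return sorted.slice(from, to);
}

// Hypothetical file names, zero-padded so lexical order matches numeric order.
const specFiles: string[] = Array.from(
  { length: 12 },
  (_, i) => `tests/spec-${i + 1 < 10 ? '0' : ''}${i + 1}.spec.ts`,
);

console.log(shardSlice(specFiles, 2, 4));
// → [ 'tests/spec-04.spec.ts', 'tests/spec-05.spec.ts', 'tests/spec-06.spec.ts' ]
```

Because the function depends only on the file list and the two numbers, any machine given the same inputs selects the same files, which is the determinism property described above.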
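Granularity: by default, sharding assigns whole test files, not individual test() calls. If one file has 30 tests and another has 1, the split is uneven. Spread tests evenly across files for best balance, or set fullyParallel: true in the config so Playwright can distribute individual tests across shards.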
02 1 container = 1 shard pattern
The pattern that scales is: run each shard in its own container, on its own CI runner. Four shards means
four parallel containers, each running ~25% of the suite. The benefits compound:
Isolation - one shard crashing the browser does not bring down the others.
Linear scaling - add a fifth shard, get a fifth runner, see ~20% faster wall-clock time.
Parallelism at roughly the same cost - GitHub Actions, GitLab, and CircleCI typically bill by runner-minute rather than by concurrency. 4 shards x 10 mins costs about the same as 1 shard x 40 mins (plus a little per-shard setup overhead), but the user waits 10 minutes instead of 40.
Anti-pattern: multiple shards on one container
Running --shard=1/4 and --shard=2/4 simultaneously on the same container does
not help. Playwright already parallelises across CPU cores within a single run (controlled by
workers in the config). Two shards on one box compete for the same cores. Spread to two
boxes instead.
03 Total time vs N-containers math
The wall-clock math is simple, but the pull-time overhead matters. For a 12-test suite where each test
averages 60 seconds and each shard runs its tests one at a time:
| Shards | Tests / shard | Per-shard time | Wall-clock | Pull cost | Net win |
|---|---|---|---|---|---|
| 1 | 12 | ~12 min | ~12 min | ~40s (1 pull) | baseline |
| 2 | 6 | ~6 min | ~6 min | ~40s x 2 in parallel | ~6 min saved |
| 4 | 3 | ~3 min | ~3 min | ~40s x 4 in parallel | ~9 min saved |
| 8 | 1-2 | ~1-2 min | ~2 min | ~40s x 8 in parallel | ~10 min saved |
| 16 | 0-1 | 0-60s | ~1.5 min | ~40s x 16 | marginal - pull dominates |
Wall-clock collapse from ~12 min to ~4 min when you split a 12-test suite into 4 parallel shards + a merge.
Diminishing returns kick in fast. Beyond 8 shards the pull cost dominates and the per-shard time is too short to amortise it. The sweet spot for most suites is 4-8.
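The diminishing returns can be reproduced with a small model. This is a rough sketch under stated assumptions: each shard runs its tests one at a time, every runner pays the same ~40s pull in parallel, and a ~30s merge job is added whenever there is more than one shard. `SuiteModel` and `estimateWallClockSeconds` are hypothetical names, and because pull and merge are folded into the total, the figures come out a little higher than a pure per-shard time.

```typescript
// All names and numbers here are illustrative assumptions, not measurements.
interface SuiteModel {
  tests: number;          // total tests in the suite
  secondsPerTest: number; // average duration of one test
  pullSeconds: number;    // per-runner setup (image pull, install), paid in parallel
  mergeSeconds: number;   // final merge job, only needed when shards > 1
}

function estimateWallClockSeconds(model: SuiteModel, shards: number): number {
  if (!Number.isInteger(shards) || shards < 1) {
    throw new RangeError('shards must be a positive integer');
  }
  // The slowest shard sets the pace: it runs ceil(tests / shards) tests back to back.
  const slowestShard = Math.ceil(model.tests / shards) * model.secondsPerTest;
  const merge = shards > 1 ? model.mergeSeconds : 0;
  return model.pullSeconds + slowestShard + merge;
}

const ttaCart: SuiteModel = { tests: 12, secondsPerTest: 60, pullSeconds: 40, mergeSeconds: 30 };

for (const shards of [1, 2, 4, 8, 16]) {
  console.log(`${shards} shard(s): ~${estimateWallClockSeconds(ttaCart, shards)}s`);
}
// 1 shard(s): ~760s
// 2 shard(s): ~430s
// 4 shard(s): ~250s
// 8 shard(s): ~190s
// 16 shard(s): ~130s
```

In this model, going from 1 to 4 shards saves about 510 seconds, while each further doubling of the runner count saves only another 60 seconds.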
04 Blob reporter format
The HTML reporter is for humans. The blob reporter is for machines. It is Playwright's compact
intermediate format - a zip containing every test result, attachment, trace, and screenshot the shard
produced. The merge tool reads N blob zips and writes one HTML.
run with blob reporter
npx playwright test --shard=1/4 --reporter=blob
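# Default output: a zip under blob-report/ (the file name includes the shard number).
# Override the directory or file name with the PLAYWRIGHT_BLOB_OUTPUT_DIR / PLAYWRIGHT_BLOB_OUTPUT_NAME env vars.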
Each blob zip contains:
A JSONL stream of test events (start, end, attachments).
The trace zips, video webms, and screenshot PNGs referenced by those events.
A manifest with the shard number and total.
The blob format is stable across Playwright patch releases. If shard 1 ran on v1.49.0 and shard 2 on v1.49.1, the merge still works - but pin every shard and the merge job to the same major.minor for safety.
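Instead of passing --reporter=blob on every command, the reporter can be selected in the config. A minimal sketch, assuming a standard playwright.config.ts and a CI environment variable that the CI provider sets (GitHub Actions does); adjust the condition if your pipeline signals CI differently.

```typescript
// playwright.config.ts (sketch): blob output on CI shards, HTML for local runs.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Assumption: the CI system exports CI=true; locally the variable is unset.
  reporter: process.env.CI ? 'blob' : 'html',
});
```

With this in place, each CI shard only needs `npx playwright test --shard=N/M`, and local runs keep producing the HTML report directly.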
05 Merge - npx playwright merge-reports
The merge runs after all shards complete. It accepts a directory full of blob zips and produces any
reporter output you ask for - HTML, JSON, JUnit XML, GitHub Actions annotations. Most teams produce HTML
for humans and JUnit XML for CI ingestion.
Lifecycle of blob files - written by shards, uploaded as artifacts, downloaded and merged in a final job.
merge command
# All blob zips collected under ./all-blob-reports/
npx playwright merge-reports ./all-blob-reports \
--reporter=html
# Multi-output: html + junit + GitHub annotations
npx playwright merge-reports ./all-blob-reports \
--reporter=html,junit,github
The merged HTML has every test in one place, marks reruns and retries correctly, and shows attachments inline.
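When the merge needs per-reporter options (for example a fixed JUnit output path), merge-reports also accepts a config file through --config. A minimal sketch; the file name merge.config.ts and the results/junit.xml path are illustrative choices, not values from this course.

```typescript
// merge.config.ts (sketch): reporters used by the merge step only.
export default {
  reporter: [
    // Do not try to open a browser on a CI runner.
    ['html', { open: 'never' }],
    // Hypothetical output path; point it at wherever CI ingests JUnit XML.
    ['junit', { outputFile: 'results/junit.xml' }],
  ],
};
```

It would be invoked as `npx playwright merge-reports --config=merge.config.ts ./all-blob-reports`.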
06 GitHub Actions matrix fanout
Matrix is GitHub's way of saying "run this same job once per element in a list". Combined with sharding,
it gives you the 1-container-1-shard pattern for free.
A real shape: the TTACart suite has 12 tests across 12 files. We shard 4 ways. Three tests land in each
shard. Each runs ~3 minutes. Wall-clock drops from 12 minutes to a little under 4.
Shard distribution for a 12-spec TTACart suite, 4-way shard. Wall-clock falls from ~12 min to ~3.2 min + ~30s merge.
08 Fail-fast trade-offs
GitHub matrix has a fail-fast flag. When true (the default), the moment one shard fails,
GitHub cancels every other shard in the matrix. When false, every shard runs to completion. Both make
sense in different contexts.
| Setting | Behaviour | Use when |
|---|---|---|
| fail-fast: true | Cancel siblings on first failure | Pull request gates - want fast feedback that "something is broken" |
| fail-fast: false | All shards run, every failure surfaces | Nightly suites - want the full picture of what is broken |
| continue-on-error: true | Individual shard failures do not fail the workflow | Soft-failing flaky shards while you triage |
Common mistake: setting both fail-fast: false AND continue-on-error: true. The workflow now reports green even when tests fail, because nothing escalates. Pick one strategy and stick with it.
09 Aggregation pitfalls
Artifact name collisions - if every shard uploads an artifact named blob-report, the uploads collide: depending on the actions/upload-artifact version, the later upload either fails or clobbers the earlier one. Always include ${{ matrix.shard }} in the artifact name.
Run name confusion - both shard runs and the merge job appear in the Actions UI. Set name: on the merge job so you can find the merged report quickly.
Lock-file drift - if shard 1 installs from a different lockfile than shard 2 (e.g. cache miss), the dep versions can diverge. Always npm ci, never npm install, in CI.
Trace size - trace zips can be tens of MB per test. With 4 shards x 20 tests x 5 MB you ship ~400 MB of artifacts. Set retention-days low (3-7) or use trace: 'retain-on-failure' instead of 'on'.
File-order assumptions - sharding splits on sorted file paths, so renaming or adding a file rebalances the shards. If you depend on a specific test running in shard 3, you are doing it wrong - shards are interchangeable by design.
Browser install per shard - npx playwright install downloads ~200 MB per shard if not cached. Cache the ~/.cache/ms-playwright directory across shards.
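The trace-size pitfall above is usually addressed in the config rather than in the workflow. A minimal sketch, assuming a standard playwright.config.ts; only the trace setting is shown, and the rest of your config stays as it is.

```typescript
// playwright.config.ts (sketch): keep traces only for failing tests to limit artifact size.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // 'on' records and keeps a trace for every test; 'retain-on-failure' discards traces of passing tests.
    trace: 'retain-on-failure',
  },
});
```

Passing tests then contribute no trace zips to the blob, so artifact size scales with the number of failures rather than with the size of the suite.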
Drill 1 - Run a 2-shard split locally
In a TTACart test project with 8+ tests, run --shard=1/2 in one terminal and
--shard=2/2 in another. Confirm the tests are partitioned (each terminal runs ~half the
suite). Diff the test file list each terminal touched.
Hint
Watch the "running" log lines. The path list should not overlap between terminals.
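Drill 2 - Add the blob reporter
Pass --reporter=blob to both shard runs. The blob reporter clears its output directory before writing, so give
each run its own directory (for example PLAYWRIGHT_BLOB_OUTPUT_DIR=blob-report-1 and blob-report-2). Confirm each
directory contains one zip file. Open one with unzip -l and inspect the structure (JSONL + attachments).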
Hint
Each blob includes a manifest telling the merger which shard produced it. No tooling needed - the merge handles it.
Drill 3 - Merge the two blobs
Move both blob zips into ./all-blob-reports/. Run
npx playwright merge-reports --reporter=html ./all-blob-reports. Open playwright-report/index.html
and confirm every test from both shards is listed.
Hint
If the merge complains about "no blob reports", you probably copied directories instead of files. The zips must live directly under the input directory.
Drill 4 - Wire up the GH Actions matrix
Push the workflow YAML from section 6 to a fresh repo. Trigger it on a push. Confirm four parallel runners
spin up, each uploads a blob artifact, and a fifth job downloads them and merges everything into a single report artifact.
Hint
You can watch the matrix expand in the Actions UI. Each shard appears as test (1), test (2), etc.
Drill 5 - Force-fail one shard, observe behaviour
Add a deliberately failing assertion to a test in shard 2. Run the workflow with fail-fast: true
and then fail-fast: false. Note the difference - the first cancels the other shards, the second
lets them finish.
Hint
Cancel signal looks like "The job was cancelled because of a failure in another matrix job."
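Drill 6 - Cache the browser install across shards
Add an actions/cache@v4 step keyed on the Playwright version. Verify the second run hits the
cache and the browser download is skipped, so the install --with-deps step drops from ~40 seconds to a few
seconds. OS packages pulled in by --with-deps live outside that cache and may still take some time.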
# Run in four terminals in parallel.
# Each shard gets its own output directory because the blob reporter clears its directory before writing.
PLAYWRIGHT_BLOB_OUTPUT_DIR=blob-report-1 npx playwright test --shard=1/4 --reporter=blob
PLAYWRIGHT_BLOB_OUTPUT_DIR=blob-report-2 npx playwright test --shard=2/4 --reporter=blob
PLAYWRIGHT_BLOB_OUTPUT_DIR=blob-report-3 npx playwright test --shard=3/4 --reporter=blob
PLAYWRIGHT_BLOB_OUTPUT_DIR=blob-report-4 npx playwright test --shard=4/4 --reporter=blob
# Collect all blob zips into one directory
mkdir -p all-blob-reports
cp blob-report-*/*.zip all-blob-reports/
# Merge
npx playwright merge-reports ./all-blob-reports --reporter=html
# Open the merged report (works on any OS)
npx playwright show-report
Playwright for Java does not ship a built-in shard flag the way the TS test runner does. Teams sharding Java suites tend to split by JUnit 5 tags or by Maven Surefire includes:
pom.xml (excerpt)
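<plugin>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <includes>
      <!-- Pass a single shard index, e.g. -Dshard=1, in the CI step (not N/M: a slash would break the file pattern). Assumes test classes are named Shard1*.java, Shard2*.java, ... -->
      <include>**/Shard${shard}*.java</include>
    </includes>
  </configuration>
</plugin>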
For most Java teams, the cleanest path is to keep the Playwright TS runner for sharding and call Java services from there, or to use a third-party JUnit 5 shard plugin.