Playwright MCP - the testing tools layer for AI clients
The Model Context Protocol (MCP) is a small JSON-RPC dialect that lets an AI client (Claude
Desktop, Cursor, Cline, Continue) discover and call tools running outside the model. A
Playwright MCP server exposes browser-driving tools - navigate,
click, fill, screenshot, extract - and the
client chats with it during a conversation. The model never imports @playwright/test;
it just calls tools and reads results.
Draft - private preview - not yet in the main sidebar. This page maps to the
Lecture: Playwright MCP session. References to MCP point at the public protocol spec at
modelcontextprotocol.io. No code is copied from any third-party repository.
1. What is MCP - the protocol
A standardised way for an AI client to ask "what tools do you have?" and then call them. Think of it as the USB-C of LLM tool use - one shape, any host, any device.
JSON-RPC 2.0 · tools · resources · prompts · sampling · progress events · server-defined schema · client-defined model
MCP servers offer three primitives. Tools are functions the client can call (screenshot,
run_spec). Resources are readable blobs the client can fetch (a trace.zip,
a generated HTML report). Prompts are reusable templates the server suggests to the client
(e.g. "summarise this failing trace"). On top of that the server can request sampling - ask the
client's model for a completion via sampling/createMessage - so it can use an LLM without bundling
its own model or API key. That is how MCP servers stay model-agnostic.
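To make the three primitives concrete, here is a minimal sketch of how a Playwright server might describe them as plain data. The field names (name, description, inputSchema, uri, mimeType, arguments) follow the public MCP spec; the screenshot and run_spec tools, the fullPage argument, the tta-pw:// URIs and the missingRequiredArgs helper are illustrative assumptions, not any existing server's API.

```typescript
// Sketch: the three MCP server primitives as plain data. Field names follow the
// public MCP spec; the TTA tool, resource and prompt values are illustrative.
interface ToolDef {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

interface ResourceDef {
  uri: string;
  name: string;
  mimeType?: string;
}

interface PromptDef {
  name: string;
  description?: string;
  arguments?: { name: string; description?: string; required?: boolean }[];
}

// Tools: functions the client can call.
const tools: ToolDef[] = [
  {
    name: "screenshot",
    description: "Capture a PNG of the current page.",
    inputSchema: { type: "object", properties: { fullPage: { type: "boolean" } } },
  },
  {
    name: "run_spec",
    description: "Run one Playwright spec file and report pass/fail.",
    inputSchema: {
      type: "object",
      properties: { path: { type: "string", description: "Spec path relative to the repo root" } },
      required: ["path"],
    },
  },
];

// Resources: readable artefacts addressed by URI (hypothetical tta-pw:// scheme).
const resources: ResourceDef[] = [
  { uri: "tta-pw://runs/2026-05-21-181203/trace.zip", name: "Trace", mimeType: "application/zip" },
  { uri: "tta-pw://runs/2026-05-21-181203/report.html", name: "HTML report", mimeType: "text/html" },
];

// Prompts: reusable templates the server suggests to the client.
const prompts: PromptDef[] = [
  {
    name: "summarise_trace",
    description: "Summarise this failing trace.",
    arguments: [{ name: "traceUri", required: true }],
  },
];

// Hypothetical helper: lets a server reject a tools/call early when a
// required argument from inputSchema is missing.
function missingRequiredArgs(tool: ToolDef, args: Record<string, unknown>): string[] {
  return (tool.inputSchema.required ?? []).filter((key) => !(key in args));
}
```

The inputSchema is ordinary JSON Schema, which is why the client can validate or auto-complete arguments before the model's call ever reaches the server.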
Why it matters for testing: before MCP, every AI agent needed bespoke glue to talk to Playwright.
With MCP, you write the Playwright server once, and any MCP-aware client - Claude Desktop, Cursor,
VS Code with Cline - can use it without changes. That's the same value proposition we got from
LSP for editors, but for AI agents.
Exercises
Read the spec. Open modelcontextprotocol.io/spec and identify the three primitives. Sketch which tools, resources, and prompts a Playwright server would expose.
Pick the boundary. For our TTACart test suite, list which features belong as tools (functions) vs. resources (readable artefacts). Where would logs/heals.jsonl live?
JSON-RPC trace. Capture a real client-server exchange (tools/list, tools/call) and annotate every field.
2. Playwright as an MCP server
A small Node process that imports @playwright/test internally and exposes its actions as MCP tools. The AI client sends a JSON-RPC request; the server runs the action and returns the result.
The server boots a real Chromium / Firefox / WebKit (headless or headed), holds one or more
BrowserContext instances, and routes tool calls into the Playwright API. The client doesn't see the
DOM directly - it asks the server for a page snapshot (usually the accessibility tree)
and uses that as context for its next decision. That snapshot is also how the model selects
elements - by an ARIA ref the server assigned, not by CSS.
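The snapshot-and-ref idea can be sketched without a browser. A minimal sketch, assuming a hypothetical simplified AxNode shape and an e1, e2, ... ref format; real Playwright MCP servers produce richer snapshots, use their own ref scheme, and map each ref to a live element rather than to a plain object.

```typescript
// Hypothetical simplified accessibility node; real snapshots carry more fields.
interface AxNode {
  role: string;
  name?: string;
  children?: AxNode[];
}

interface Snapshot {
  text: string;              // what the model reads
  refs: Map<string, AxNode>; // what the server keeps to resolve refs later
}

// Walks the tree depth-first, gives each node a short ref ("e1", "e2", ...),
// and renders an indented outline the model can quote refs from.
function buildSnapshot(root: AxNode): Snapshot {
  const refs = new Map<string, AxNode>();
  const lines: string[] = [];
  let counter = 0;

  const visit = (node: AxNode, depth: number): void => {
    const ref = `e${++counter}`;
    refs.set(ref, node);
    const label = node.name ? ` "${node.name}"` : "";
    lines.push(`${"  ".repeat(depth)}- ${node.role}${label} [ref=${ref}]`);
    for (const child of node.children ?? []) visit(child, depth + 1);
  };

  visit(root, 0);
  return { text: lines.join("\n"), refs };
}

// Resolving a ref the model sends back: an unknown ref is an error, never a guess.
function resolveRef(snapshot: Snapshot, ref: string): AxNode {
  const node = snapshot.refs.get(ref);
  if (!node) throw new Error(`Unknown ref ${ref}; take a fresh snapshot`);
  return node;
}
```

The model only ever echoes back a token like e3; the server owns the mapping, so the model cannot smuggle in an arbitrary selector, and a stale ref fails loudly instead of clicking the wrong element.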
A second class of tools wraps your existing npx playwright commands -
run_spec(path), show_report(), show_trace(zip). The model
kicks off a real test run as if it were the developer at the terminal, then reads the result back
through a resources/read call.
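illustrative tool registration
// Fresh TTA snippet, not copied from upstream. Assumes an ESM project with @modelcontextprotocol/sdk, zod and playwright installed.
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { chromium } from 'playwright';
import { z } from 'zod';

const page = await (await chromium.launch()).newPage();
const server = new McpServer({ name: 'tta-playwright', version: '0.1.0' });

server.tool('browser_click', 'Click an element by ARIA ref returned from browser_snapshot.',
  { ref: z.string() }, // zod shape; the SDK publishes it as the tool's JSON Schema
  async ({ ref }) => {
    // Assumes browser_snapshot stamped each element with an aria-ref attribute.
    await page.locator(`[aria-ref="${ref}"]`).click();
    return { content: [{ type: 'text', text: JSON.stringify({ ok: true, url: page.url() }) }] };
  });

await server.connect(new StdioServerTransport());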
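The second class of tools, run_spec and friends, mostly comes down to validating what the model sent and building a command line. A minimal sketch with a hypothetical planRunSpec helper; the flags (--reporter, --trace, --output) are standard playwright test options, while the path rules are assumptions you would tune per repo.

```typescript
// Hypothetical run_spec planner: validate the path the model sent, then build an
// argv for a child process. The CLI flags are standard `playwright test` options.
interface RunSpecPlan {
  command: string;
  args: string[];
}

function planRunSpec(specPath: string, outputDir: string): RunSpecPlan {
  const escapesRepo = specPath.startsWith("/") || specPath.split("/").includes("..");
  const looksLikeFlag = specPath.startsWith("-"); // "-x.spec.ts" would be parsed as an option
  if (escapesRepo || looksLikeFlag || !/^[\w.\/-]+\.spec\.ts$/.test(specPath)) {
    throw new Error(`Refusing spec path: ${specPath}`);
  }
  return {
    command: "npx",
    args: ["playwright", "test", specPath, "--reporter=line", "--trace=on", `--output=${outputDir}`],
  };
}
```

Passing plan.command and plan.args to Node's child_process.spawn without a shell keeps the model's input out of shell parsing entirely; the server can then forward each stdout line as a progress notification.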
Exercises
List 12 tools. Draft the schema for 12 Playwright tools you would expose - 6 page-action tools and 6 test-runner tools. Specify inputSchema for each.
ARIA ref instead of selector. Explain why returning {ref: "node-2841"} is safer than returning a CSS selector for the model to use.
Context lifetime. Decide: one BrowserContext per conversation, or per tool call. Justify the trade-off.
Spec runner. Wire a run_spec(path) tool that shells out to npx playwright test and streams stdout via MCP progress events.
3. AI clients consuming MCP
Claude Desktop, Cursor, Cline (VS Code), Continue, Zed AI, Windsurf. All of them speak the same protocol: configure the server once in JSON, and the model gets the new tools in the next chat.
Claude Desktop · Cursor · Cline (VS Code) · Continue · Zed · Windsurf · CLI MCP shells
Each client has a config file - typically JSON - listing the MCP servers it should start at
launch. The client calls tools/list on connect, attaches the tool schemas to its
system prompt, and from then on the model can call any tool by name. The client itself handles
permission UI ("Allow Playwright server to call browser_click?") and rate-limiting.
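For Claude Desktop the file is claude_desktop_config.json, and each server sits under an mcpServers key with a command, optional args and optional env. A minimal sketch, written in TypeScript so it can be checked, that builds that object for Microsoft's @playwright/mcp package; the addServer helper is hypothetical, and other clients keep their config in different locations and may use a slightly different shape, so check each client's docs.

```typescript
// Shape of the "mcpServers" block in Claude Desktop's claude_desktop_config.json.
interface McpServerEntry {
  command: string;              // executable the client spawns (stdio transport)
  args?: string[];              // argv passed to it
  env?: Record<string, string>; // extra environment variables
}

interface ClientConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper: returns a new config with one more server, refusing duplicates.
function addServer(config: ClientConfig, name: string, entry: McpServerEntry): ClientConfig {
  if (name in config.mcpServers) throw new Error(`Server "${name}" is already registered`);
  return { ...config, mcpServers: { ...config.mcpServers, [name]: entry } };
}

const config = addServer({ mcpServers: {} }, "playwright", {
  command: "npx",
  args: ["@playwright/mcp@latest"],
});

// Paste this into the client's config file, then restart the client.
const configJson = JSON.stringify(config, null, 2);
```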
Exercises
Install + connect. Install a Playwright MCP server, register it in Claude Desktop, and have the model generate tta-cart-login.spec.ts purely by prompting.
Permission flow. Approve the first tool call, deny the second, observe how the model adapts.
Cross-client. Same MCP server, run it once with Cursor and once with Claude Desktop. Compare the conversation transcripts.
Tools list inspection. Use the client's "show tools" UI and verify the schema matches what your server exports.
4. MCP tools - listing, calling, reading resources
Four core methods: tools/list returns the tool schemas, tools/call invokes one tool, and resources/list + resources/read enumerate and fetch readable artefacts.
tools/list is what the client calls on connect - the server returns an array of
{name, description, inputSchema} objects. tools/call takes
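{name, arguments} and returns a result made of typed content items (text, images, embedded
resources). If the request carries a progressToken, a long-running tool (a 90-second test
run, say) can emit notifications/progress so the client UI can render a progress indicator. The
client can also cancel mid-flight via notifications/cancelled.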
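Here is a minimal sketch of the four messages in one long-running call. The method names and fields (_meta.progressToken, progress, total, requestId, reason, content, isError) follow the MCP spec; the helper names, run_spec and the token strings are illustrative assumptions.

```typescript
// JSON-RPC 2.0 shapes for one long-running tools/call, following the MCP spec's
// method names. run_spec and the token/reason strings are illustrative.
type RequestId = string | number;

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: RequestId;
  method: string;
  params: Record<string, unknown>;
}

// Notifications have no id, so the receiver never replies to them.
interface JsonRpcNotification {
  jsonrpc: "2.0";
  method: string;
  params: Record<string, unknown>;
}

// Client -> server. Attaching _meta.progressToken opts in to progress updates.
function callTool(id: RequestId, name: string, args: Record<string, unknown>, progressToken?: string): JsonRpcRequest {
  const params: Record<string, unknown> = { name, arguments: args };
  if (progressToken !== undefined) params._meta = { progressToken };
  return { jsonrpc: "2.0", id, method: "tools/call", params };
}

// Server -> client, repeated while the run is in flight; `progress` must increase each time.
function progressUpdate(progressToken: string, progress: number, total?: number): JsonRpcNotification {
  const params: Record<string, unknown> = { progressToken, progress };
  if (total !== undefined) params.total = total;
  return { jsonrpc: "2.0", method: "notifications/progress", params };
}

// Client -> server: stop working on request `requestId`; no response is expected.
function cancelRequest(requestId: RequestId, reason: string): JsonRpcNotification {
  return { jsonrpc: "2.0", method: "notifications/cancelled", params: { requestId, reason } };
}

// Server -> client: the final tools/call result, reusing the request's id.
function toolResult(id: RequestId, text: string, isError = false) {
  return { jsonrpc: "2.0" as const, id, result: { content: [{ type: "text" as const, text }], isError } };
}
```

Because the cancellation names the original request id, the server can map it to the child process it spawned for that call and kill it.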
resources/read is how you ship artefacts back. A test run produces a trace.zip + an HTML
report; both are exposed as resources with URIs like
tta-pw://runs/2026-05-21-181203/trace.zip. The client can render the HTML inline or
offer a "save" button.
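A minimal sketch of building and parsing such URIs, assuming a hypothetical tta-pw://runs/&lt;runId&gt;/&lt;artefact&gt; layout with a UTC YYYY-MM-DD-HHMMSS run id as in the example above; the artefact file names and helper names are illustrative.

```typescript
// Hypothetical URI scheme: tta-pw://runs/<runId>/<artefact>, where runId is a
// UTC timestamp like 2026-05-21-181203 (YYYY-MM-DD-HHMMSS).
const ARTEFACTS = ["trace.zip", "report.html"] as const;
type Artefact = (typeof ARTEFACTS)[number];

function runIdFor(date: Date): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  const day = `${date.getUTCFullYear()}-${pad(date.getUTCMonth() + 1)}-${pad(date.getUTCDate())}`;
  const time = `${pad(date.getUTCHours())}${pad(date.getUTCMinutes())}${pad(date.getUTCSeconds())}`;
  return `${day}-${time}`;
}

function resourceUri(runId: string, artefact: Artefact): string {
  return `tta-pw://runs/${runId}/${artefact}`;
}

const URI_PATTERN = /^tta-pw:\/\/runs\/(\d{4}-\d{2}-\d{2}-\d{6})\/([\w.-]+)$/;

// Parse on resources/read: anything outside the known shape is rejected
// instead of being turned into a filesystem path.
function parseResourceUri(uri: string): { runId: string; artefact: Artefact } {
  const match = URI_PATTERN.exec(uri);
  if (!match || !(ARTEFACTS as readonly string[]).includes(match[2])) {
    throw new Error(`Unknown tta-pw resource: ${uri}`);
  }
  return { runId: match[1], artefact: match[2] as Artefact };
}
```

Second-resolution timestamps collide if two runs start in the same second, so a shared server would need a suffix or counter in the run id.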
Exercises
Cancel a long run. Trigger a run_spec call, then cancel it 5 seconds in. Verify the test process actually stops.
Progress events. Emit notifications/progress every 5 seconds during a TTA suite run. Render them in a custom client UI.
Resource URI scheme. Design a URI scheme for: traces, reports, screenshots, generated specs. Avoid collisions across runs.
Prompt template. Expose a summarise_trace prompt template that the client can apply when reading a failing trace.
5. Stdio vs SSE transport
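MCP messages travel over one of two transports. Stdio is for local servers spawned by the client. HTTP with Server-Sent Events (SSE) is for remote / hosted servers; the 2025-03-26 spec revision replaces the original HTTP+SSE transport with Streamable HTTP, which still uses SSE to stream server messages.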
stdio: local + simple | SSE: remote + scalable
stdio framing: line-delimited JSON | SSE framing: text/event-stream
stdio auth: process boundary | SSE auth: bearer / OAuth
Stdio is the default. The client spawns the server as a child process and talks JSON-RPC over
stdin / stdout. Zero networking, zero auth, simplest possible setup. This is what you'll use on
your dev laptop most of the time, including for a Playwright server.
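In the stdio transport each JSON-RPC message is one line of JSON terminated by a newline, and messages must not contain embedded newlines; stdout carries only MCP messages, so server logs belong on stderr. A minimal sketch of both sides of that framing; the class and function names are illustrative.

```typescript
// Reader side: accumulates raw stdout chunks and returns one parsed JSON-RPC
// message per complete line. Chunks can split a message anywhere, so any
// partial trailing line stays buffered until its newline arrives.
class LineDelimitedReader {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const messages: unknown[] = [];
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      const line = this.buffer.slice(0, newline).replace(/\r$/, "");
      this.buffer = this.buffer.slice(newline + 1);
      if (line.trim() !== "") messages.push(JSON.parse(line));
      newline = this.buffer.indexOf("\n");
    }
    return messages;
  }
}

// Writer side: one message per line. JSON.stringify without an indent argument
// escapes newlines inside strings, so the only raw "\n" is the terminator.
function frame(message: unknown): string {
  return JSON.stringify(message) + "\n";
}
```

This is also why a stray console.log in a stdio server breaks the session: the client tries to parse the log line as JSON-RPC.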
SSE matters when the server is shared - a hosted Playwright MCP that ten developers point
at, or a centralised Playwright runner inside a corporate network. The client sends its requests
as HTTP POSTs, and the server streams messages back via text/event-stream. Auth + TLS
become your problem; on the upside, the heavy browser process lives on a beefy CI machine, not
on the developer's laptop.
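On the wire, an SSE stream is plain text: fields like event: and data:, with a blank line ending each event. A minimal sketch of a parser for a fully buffered stream (real clients parse incrementally and track the last event id across events). In the older HTTP+SSE transport the server's first event is named endpoint and carries the URL the client should POST to; the /messages?sessionId=abc value used in the check is a made-up example.

```typescript
// Minimal text/event-stream parser for a fully buffered stream. Events end at a
// blank line; "data:" lines are joined with "\n"; lines starting with ":" are comments.
interface SseEvent {
  event: string; // "message" unless the server sent an event: field
  data: string;
  id?: string;
}

function parseSse(stream: string): SseEvent[] {
  const blocks = stream.replace(/\r\n?/g, "\n").split("\n\n");
  blocks.pop(); // the last piece has no terminating blank line yet, so it is not dispatched
  const events: SseEvent[] = [];
  for (const block of blocks) {
    let event = "message";
    let id: string | undefined;
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // skip empty lines and comments
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one leading space is dropped
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
      else if (field === "id") id = value;
    }
    if (data.length > 0) {
      const joined = data.join("\n");
      events.push(id === undefined ? { event, data: joined } : { event, data: joined, id });
    }
  }
  return events;
}
```

Each "message" event's data is one JSON-RPC message, so after this step the client handles it exactly as it would a stdio line.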
Exercises
Start in stdio. Run your TTA Playwright server in stdio mode + connect with Claude Desktop. Watch the JSON-RPC messages.
Switch to SSE. Move the same server behind an SSE endpoint. Add a bearer token and reconnect.
Measure latency. Same browser action over stdio vs. SSE. How much overhead does the network add?
Auth design. Sketch how you'd let an SSE Playwright server be shared safely across 3 teams in the same org.
6. Local vs remote MCP server - security boundaries
A local stdio server can read anything the user can read. A remote SSE server reads only what its account allows. Plan the boundary before you connect.
local: user-level access · remote: service account · tool allow-list · filesystem sandbox · network policy · audit log · rate limit
Local stdio. The server runs as you, with your credentials, on your machine. It can read ~/.aws/credentials if you let it. Treat it like a CLI tool you installed - same trust model.
Remote SSE. The server runs elsewhere as a service account, bounded by whatever IAM permissions that account has. Add bearer auth, TLS, rate limits, audit logs.
Tool allow-list. Even local servers should let you disable specific tools per workspace. "On the production support workspace, no browser_evaluate tool" is a sensible rule.
Filesystem reach. If the server can read or write files, scope it to a project root and reject everything else. Path traversal is the classic attack on file-reading tools, MCP or otherwise.
Network reach. A Playwright server can navigate anywhere. Allow-list domains for production-impacting flows.
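The three reach rules above can be sketched as small guard functions a server calls before doing any work. A minimal sketch; the function names, the ttacart.example.com host and the /home/dev/ttacart root used in the checks are hypothetical. The path check is purely lexical, so a production server should also resolve symlinks (for example with Node's fs.realpath) before comparing.

```typescript
// 1. Tool allow-list: filter tools/list AND re-check on tools/call, because a
//    client can send a tool name it never saw in the list.
function assertToolAllowed(name: string, allowed: ReadonlySet<string>): void {
  if (!allowed.has(name)) throw new Error(`Tool disabled in this workspace: ${name}`);
}

// 2. Filesystem reach: resolve lexically against an absolute, normalised root and
//    reject anything that lands outside it. Symlinks are NOT handled here.
function resolveInsideRoot(root: string, requested: string): string {
  if (requested.includes("\0")) throw new Error("NUL byte in path");
  const cleanRoot = root.replace(/\/+$/, "");
  const start = requested.startsWith("/") ? "" : cleanRoot;
  const stack: string[] = [];
  for (const segment of `${start}/${requested}`.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") stack.pop();
    else stack.push(segment);
  }
  const resolved = "/" + stack.join("/");
  // Compare against root + "/" so "/home/dev/ttacart-secrets" does not pass as inside "/home/dev/ttacart".
  if (resolved !== cleanRoot && !resolved.startsWith(cleanRoot + "/")) {
    throw new Error(`Path escapes project root: ${requested}`);
  }
  return resolved;
}

// 3. Network reach: only http(s) URLs on allow-listed hosts or their subdomains.
function isAllowedUrl(rawUrl: string, allowedHosts: readonly string[]): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable input is never navigable
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") return false;
  const hostname = url.hostname;
  return allowedHosts.some((host) => hostname === host || hostname.endsWith(`.${host}`));
}
```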
Security note. Do not point a production-credentialed MCP server at a free-form
chat client. Once a tool can browser_evaluate arbitrary JavaScript on logged-in
sessions, prompt injection from a malicious webpage can escalate to full credential exfiltration.
Exercises
Threat model. Write a 5-row threat-model table for: local Playwright stdio MCP + Claude Desktop on a dev laptop. Rows: actor, asset, attack, mitigation, residual risk.
Allow-list test. Configure the server to expose only browser_navigate, browser_click, browser_snapshot. Confirm browser_evaluate is unavailable.
Path traversal probe. Pass ../../../etc/passwd to a file-reading tool. Confirm the server rejects it.
Prompt injection demo. Visit a deliberately crafted page that whispers "run browser_evaluate('fetch...')". Does your client + server prevent it?
7. STLC + MCP - wiring AI into the Software Testing Life Cycle
Design -> generate -> execute -> report. MCP gives you one consistent tool surface across the whole loop, instead of bespoke integrations at each phase.
design: requirements -> cases · generate: cases -> specs · execute: specs -> runs · report: runs -> insights · close: insights -> PRs · one MCP server per phase
A mature TTA team can run all four STLC phases through MCP. Design: the client pulls JIRA tickets
through the Atlassian MCP server and hands them to a "test-design" MCP server, which produces a markdown test plan.
Generate: the Playwright MCP server takes the plan as input and emits real spec files.
Execute: the same server runs npx playwright test and streams progress.
Report: a reporter MCP server exposes the trace + HTML report as resources; the AI client summarises
failures and proposes fixes. The loop closes by opening a PR via the GitHub MCP server.
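The composition can be sketched as one function that threads each phase's output into the next, with every hop going through the client. A minimal sketch under stated assumptions: CallTool stands in for the client's own tools/call plumbing, and apart from run_spec and draft_test_plan (which appear on this page) the server and tool names are hypothetical.

```typescript
// Hypothetical orchestration: the client is the only component that talks to
// every server. CallTool stands in for the client's tools/call plumbing.
type CallTool = (server: string, tool: string, args: Record<string, unknown>) => Promise<string>;

async function runStlc(callTool: CallTool, ticketKey: string): Promise<string> {
  // Design: fetch the ticket, then turn it into a markdown test plan.
  const ticket = await callTool("atlassian", "get_issue", { key: ticketKey });
  const plan = await callTool("test-design", "draft_test_plan", { ticket });
  // Generate: the Playwright server writes a spec file and returns its path.
  const specPath = await callTool("playwright", "generate_spec", { plan });
  // Execute: run it; the result is a run id the reporter understands.
  const runId = await callTool("playwright", "run_spec", { path: specPath });
  // Report: summarise the trace and HTML report for that run.
  const summary = await callTool("reporter", "summarise_run", { runId });
  // Close: open a PR with the new spec and the summary as its body.
  return callTool("github", "open_pr", { files: [specPath], body: summary });
}
```

Because every hop passes through one callTool function, swapping a server (a different reporter, say) changes one line, and the client can apply permissions and cost tracking in a single place.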
The win is composability. You're not picking one AI tool - you're picking one protocol and
assembling tools per phase. The same tool surface our existing TTA
AI Chat demo and functions/api/chat.js proxy expose can be re-wrapped as MCP
tools when we ship the full STLC story.
Exercises
Map the STLC. Take one TTACart epic and walk it through the 4 STLC phases, listing the MCP server you'd use at each phase.
Design MCP. Sketch a "test-design" MCP server with 3 tools: read_ticket, extract_acceptance_criteria, draft_test_plan.
Cross-server flow. One conversation that touches: design MCP -> Playwright MCP -> GitHub MCP. Trace the data flow.
Cost ledger. Estimate token cost per phase for a typical 5-spec epic. Where would you cache?
Failure replay. A test fails in production. The reporter MCP returns the trace. The AI summarises the failure and the Playwright MCP regenerates the spec. Walk through the dialog.
Common MCP servers + clients
Reference table you can hand to a new joiner. Not exhaustive - the ecosystem moves fast.
How the client learns which servers to start at launch.
Diagrams - MCP message flow and STLC pipeline
Two mermaid diagrams. First: the client-server-tool sequence for one Playwright action. Second: how MCP servers compose across the full STLC.
MCP client-server message flow
A typical "click the green button" exchange. The model never imports Playwright; it just calls a tool by name and reads the snapshot back.
sequenceDiagram
autonumber
participant U as User
participant C as MCP Client (Claude Desktop)
participant M as LLM
participant S as Playwright MCP Server
participant B as Browser
U->>C: "Add the highlighted SKU to the cart"
C->>M: prompt + tools/list summary
M-->>C: call browser_snapshot
C->>S: tools/call browser_snapshot
S->>B: page.accessibility.snapshot()
B-->>S: ARIA tree with refs
S-->>C: snapshot result
C->>M: snapshot result attached
M-->>C: call browser_click ref=node-2841
C->>S: tools/call browser_click {ref:"node-2841"}
S->>B: locator(`[aria-ref="node-2841"]`).click()
B-->>S: ok
S-->>C: { ok: true, url: "/cart" }
C-->>U: "Done. Cart now has 1 item."
STLC composition - multiple MCP servers in one conversation
The AI client orchestrates four MCP servers across one feature. Each server is a small process; the client is the only thing that knows about all of them at once.
flowchart LR
subgraph CLIENT[MCP Client]
M[LLM]
end
M -- tools/call --> A[Atlassian MCP read JIRA ticket]
M -- tools/call --> P[Playwright MCP generate + run spec]
M -- tools/call --> R[Reporter MCP read trace + HTML report]
M -- tools/call --> G[GitHub MCP open PR with the new spec]
A -- ticket text --> M
P -- run id + status --> M
R -- summary + failures --> M
G -- pr url --> M
classDef srv fill:#d1fae5,stroke:#16a34a,color:#111
classDef llm fill:#ede9fe,stroke:#8b5cf6,color:#111
class A,P,R,G srv
class M llm
Next step. Configure your local Claude Desktop with a Playwright MCP server, then
walk through the exercises in section 7 (STLC + MCP) against the
TTACart demo. The
Framework + AI page covers the cost and privacy
guards you should apply to every MCP server you connect.