Playwright MCP - the testing tools layer for AI clients

The Model Context Protocol (MCP) is a small JSON-RPC dialect that lets an AI client (Claude Desktop, Cursor, Cline, Continue) discover and call tools running outside the model. A Playwright MCP server exposes browser-driving tools - navigate, click, fill, screenshot, extract - and the client calls them over the course of a conversation. The model never imports @playwright/test; it only requests tool calls and reads the results.

Draft - private preview - not yet in the main sidebar. This page maps to the Lecture: Playwright MCP session. References to MCP point at the public protocol spec at modelcontextprotocol.io. No code is copied from any third-party repository.

1. What is MCP - the protocol

A standardised way for an AI client to ask "what tools do you have?" and then call them. Think of it as the USB-C of LLM tool use - one shape, any host, any device.

JSON-RPC 2.0 | tools | resources | prompts | sampling | progress events | server-defined schema | client-defined model

MCP has three primitives. Tools are functions the client can call (screenshot, run_spec). Resources are readable blobs the client can fetch (a trace.zip, a generated HTML report). Prompts are reusable templates the server suggests to the client (e.g. "summarise this failing trace"). On top of that the server can sample - ask the client's model a follow-up question - which is how MCP servers stay model-agnostic.

Why it matters for testing: before MCP, every AI agent needed bespoke glue to talk to Playwright. With MCP, you write the Playwright server once, and any MCP-aware client - Claude Desktop, Cursor, VS Code with Cline - can use it without changes. That's the same value proposition we got from LSP for editors, but for AI agents.

Exercises

  1. Read the spec. Open modelcontextprotocol.io/spec and identify the three primitives. Sketch which tools, resources, and prompts a Playwright server would expose.
  2. Pick the boundary. For our TTACart test suite, list which features belong as tools (functions) vs. resources (readable artefacts). Where would logs/heals.jsonl live?
  3. JSON-RPC trace. Capture a real client-server exchange (tools/list, tools/call) and annotate every field.

2. Playwright as an MCP server

A small Node process that imports @playwright/test internally and exposes its actions as MCP tools. The AI client sends JSON-RPC, the server runs the action, returns the result.

browser_navigate | browser_click | browser_fill | browser_screenshot | browser_snapshot (ARIA) | browser_evaluate | run_spec | read_trace

The server boots a real Chromium / Firefox / WebKit (headless or headed), holds one or more BrowserContext instances, and routes tool calls into the Playwright API. The client doesn't see DOM directly - it asks the server for a page snapshot (usually the accessibility tree) and uses that as context for its next decision. That snapshot is also how the model selects elements - by an ARIA ref the server assigned, not by CSS.
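
The snapshot-and-ref idea can be pictured with a small sketch. It is only an illustration under stated assumptions, not how any particular Playwright MCP server works: the node shape ({ role, name, children }), the node-N ref format, and the helper names assignRefs and renderSnapshot are all hypothetical.

```javascript
// Hypothetical sketch: walk an accessibility-tree-like object, give every node
// a server-side ref, and keep a lookup table so a later browser_click can
// resolve the ref without the model ever handling a CSS selector.
function assignRefs(root, startAt = 1) {
  const byRef = new Map();
  let counter = startAt;

  function visit(node) {
    const ref = `node-${counter++}`; // refs are assigned in pre-order
    byRef.set(ref, node);
    return {
      ref,
      role: node.role,
      name: node.name,
      children: (node.children || []).map(visit),
    };
  }

  return { snapshot: visit(root), byRef };
}

// Render the snapshot as indented text, roughly what the model would read.
function renderSnapshot(node, depth = 0) {
  const line = `${'  '.repeat(depth)}- ${node.role} "${node.name}" [ref=${node.ref}]`;
  return [line, ...node.children.map((c) => renderSnapshot(c, depth + 1))].join('\n');
}

// Made-up tree standing in for a real accessibility snapshot.
const sampleTree = {
  role: 'main',
  name: 'Cart',
  children: [
    { role: 'button', name: 'Add to cart' },
    { role: 'link', name: 'Checkout' },
  ],
};

const { snapshot: refSnapshot, byRef: refTable } = assignRefs(sampleTree);
console.log(renderSnapshot(refSnapshot));
```

Because the refs live only in the server's lookup table, the model can pick an element but cannot smuggle an arbitrary selector into the page.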

A second class of tools wraps your existing npx playwright commands - run_spec(path), show_report(), show_trace(zip). The model kicks off a real test run as if it were the developer at the terminal, then reads the result back through a resources/read call.

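illustrative tool registration
// Fresh TTA snippet, not copied from upstream. Pseudo-MCP server: `server` and
// `page` are assumed to be in scope; `server.tool` is a hypothetical helper.
server.tool('browser_click', {
  description: 'Click an element by ARIA ref returned from browser_snapshot.',
  inputSchema: {
    type: 'object',
    properties: { ref: { type: 'string' } },
    required: ['ref'],
  },
}, async ({ ref }) => {
  // Reject anything that is not a plain ref before building a selector from it.
  if (!/^[\w-]+$/.test(ref)) throw new Error(`Invalid ref: ${ref}`);
  // Assumes browser_snapshot stamped this ref onto the element as an attribute.
  const locator = page.locator(`[aria-ref="${ref}"]`);
  await locator.click();
  return { ok: true, url: page.url() }; // simplified result shape
});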

Exercises

  1. List 12 tools. Draft the schema for 12 Playwright tools you would expose - 6 page-action tools and 6 test-runner tools. Specify inputSchema for each.
  2. ARIA ref instead of selector. Explain why returning {ref: "node-2841"} is safer than returning a CSS selector for the model to use.
  3. Context lifetime. Decide: one BrowserContext per conversation, or per tool call. Justify the trade-off.
  4. Spec runner. Wire a run_spec(path) tool that shells out to npx playwright test and streams stdout via MCP progress events.

3. AI clients consuming MCP

Claude Desktop, Cursor, Cline (VS Code), Continue, Zed AI, Windsurf. They all follow the same pattern - register the server once in a JSON config, and the model sees the new tools in the next chat.

Claude Desktop | Cursor | Cline (VS Code) | Continue | Zed | Windsurf | CLI MCP shells

Each client has a config file - typically JSON - listing the MCP servers it should start at launch. The client reads tools/list on connect, attaches the tool schemas to its system prompt, and from then on the model can call any tool by name. The client itself handles permission UI ("Allow Playwright server to call browser_click?") and rate-limiting.

claude_desktop_config.json (illustrative TTA shape)
{
  "mcpServers": {
    "tta-playwright": {
      "command": "npx",
      "args": ["-y", "@your-org/tta-playwright-mcp"],
      "env": { "TTA_HEADLESS": "true" }
    }
  }
}

Exercises

  1. Install + connect. Install a Playwright MCP server, register it in Claude Desktop, and have the model generate tta-cart-login.spec.ts purely by prompting.
  2. Permission flow. Approve the first tool call, deny the second, observe how the model adapts.
  3. Cross-client. Same MCP server, run it once with Cursor and once with Claude Desktop. Compare the conversation transcripts.
  4. Tools list inspection. Use the client's "show tools" UI and verify the schema matches what your server exports.

4. MCP tools - listing, calling, reading resources

Four core methods. tools/list returns the schema. tools/call invokes one. resources/list + resources/read stream readable artefacts back.

tools/list | tools/call | resources/list | resources/read | prompts/list | progress notifications | cancellation

tools/list is what the client calls on connect - the server returns an array of {name, description, inputSchema} objects. tools/call takes {name, arguments} and returns a typed result. Long-running tools (a 90-second test run, say) emit notifications/progress so the client UI can render a spinner. The client can also cancel mid-flight via notifications/cancelled.
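
To make the request and response shapes concrete, here is a minimal sketch of a dispatcher for tools/list and tools/call. It is an illustration, not a real server: the toolRegistry map, the stubbed browser_navigate handler, and the handleRequest name are assumptions, and no browser is driven. The result is wrapped in a content array, which is how the MCP spec shapes tool results; the error codes are the standard JSON-RPC 2.0 ones.

```javascript
// Hypothetical in-memory tool registry; a real server would call Playwright here.
const toolRegistry = new Map([
  ['browser_navigate', {
    description: 'Navigate the page to a URL.',
    inputSchema: {
      type: 'object',
      properties: { url: { type: 'string' } },
      required: ['url'],
    },
    // Stub handler: reports what it would do instead of driving a browser.
    run: ({ url }) => `navigated to ${url}`,
  }],
]);

// Turn one JSON-RPC 2.0 request object into one response object.
function handleRequest(request) {
  const { id, method, params = {} } = request;
  const ok = (result) => ({ jsonrpc: '2.0', id, result });
  const fail = (code, message) => ({ jsonrpc: '2.0', id, error: { code, message } });

  if (method === 'tools/list') {
    const tools = [...toolRegistry].map(([name, tool]) => ({
      name,
      description: tool.description,
      inputSchema: tool.inputSchema,
    }));
    return ok({ tools });
  }

  if (method === 'tools/call') {
    const tool = toolRegistry.get(params.name);
    if (!tool) return fail(-32602, `Unknown tool: ${params.name}`); // invalid params
    const text = tool.run(params.arguments || {});
    return ok({ content: [{ type: 'text', text }], isError: false });
  }

  return fail(-32601, `Method not found: ${method}`);
}

const listResponse = handleRequest({ jsonrpc: '2.0', id: 1, method: 'tools/list' });
const callResponse = handleRequest({
  jsonrpc: '2.0',
  id: 2,
  method: 'tools/call',
  params: { name: 'browser_navigate', arguments: { url: 'https://example.com/cart' } },
});

console.log(JSON.stringify(listResponse, null, 2));
console.log(JSON.stringify(callResponse, null, 2));
```

Progress and cancellation are left out of this sketch; both are notifications (no id, no response), so they would sit alongside this request handler rather than inside it.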

resources/read is how you ship artefacts back. A test run produces a trace.zip + an HTML report; both are exposed as resources with URIs like tta-pw://runs/2026-05-21-181203/trace.zip. The client can render the HTML inline or offer a "save" button.
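
A resource URI scheme like the one above is easier to keep collision-free if building and parsing go through one pair of helpers. The sketch below assumes the tta-pw://runs/<run id>/<artefact> layout from the example URI; the helper names and the allowed character sets are illustrative choices, not part of MCP.

```javascript
// Hypothetical helpers for the tta-pw:// scheme. The artefact name must start
// with a letter or digit, so "." and ".." can never appear as a path segment.
const RESOURCE_URI = /^tta-pw:\/\/runs\/([A-Za-z0-9-]+)\/([A-Za-z0-9][A-Za-z0-9._-]*)$/;

function buildResourceUri(runId, artefact) {
  const uri = `tta-pw://runs/${runId}/${artefact}`;
  if (!RESOURCE_URI.test(uri)) {
    throw new Error(`Invalid resource parts: ${runId}, ${artefact}`);
  }
  return uri;
}

// Returns { runId, artefact } or null when the URI is not one of ours.
function parseResourceUri(uri) {
  const match = RESOURCE_URI.exec(uri);
  return match ? { runId: match[1], artefact: match[2] } : null;
}

const traceUri = buildResourceUri('2026-05-21-181203', 'trace.zip');
console.log(traceUri);
console.log(parseResourceUri(traceUri));
console.log(parseResourceUri('tta-pw://runs/../secrets'));
```

Keeping the run id in the path is what avoids collisions across runs: two runs can both produce a trace.zip without sharing a URI.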

Exercises

  1. Cancel a long run. Trigger a run_spec call, then cancel it 5 seconds in. Verify the test process actually stops.
  2. Progress events. Emit notifications/progress every 5 seconds during a TTA suite run. Render them in a custom client UI.
  3. Resource URI scheme. Design a URI scheme for: traces, reports, screenshots, generated specs. Avoid collisions across runs.
  4. Prompt template. Expose a summarise_trace prompt template that the client can apply when reading a failing trace.

5. Stdio vs SSE transport

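MCP messages travel over one of two transports. Stdio is for local servers spawned by the client. Server-Sent Events (SSE) over HTTP is for remote / hosted servers. Newer revisions of the MCP spec fold SSE into a Streamable HTTP transport; the local-versus-remote split described here is unchanged.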

stdio: local + simple | SSE: remote + scalable | stdio framing: line-delimited JSON | SSE framing: text/event-stream | stdio auth: process boundary | SSE auth: bearer / OAuth

Stdio is the default. The client spawns the server as a child process and talks JSON-RPC over stdin / stdout. Zero networking, zero auth, simplest possible setup. This is what you'll use on your dev laptop most of the time, including for a Playwright server.
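
Line-delimited framing means each JSON-RPC message is one line of JSON terminated by a newline, and a reader has to cope with chunks that end mid-message. The sketch below shows that buffering with plain strings; the function names are hypothetical, and a real server would feed it chunks from process.stdin rather than the hand-sliced string used here.

```javascript
// Encode one message as a single line. JSON.stringify escapes any newline
// inside string values, so the trailing "\n" is the only raw newline.
function encodeLine(message) {
  return JSON.stringify(message) + '\n';
}

// Hypothetical decoder: accumulates raw chunks and returns the messages that
// are complete so far, keeping any trailing partial line for the next chunk.
function createLineDecoder() {
  let buffer = '';
  return function pushChunk(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // the last piece is an incomplete line (or '')
    return lines.filter((line) => line.trim() !== '').map((line) => JSON.parse(line));
  };
}

const wire =
  encodeLine({ jsonrpc: '2.0', id: 1, method: 'tools/list' }) +
  encodeLine({
    jsonrpc: '2.0',
    id: 2,
    method: 'tools/call',
    params: { name: 'browser_snapshot', arguments: {} },
  });

// Simulate the stream arriving in two chunks that split the first message.
const pushChunk = createLineDecoder();
const firstBatch = pushChunk(wire.slice(0, 30));
const secondBatch = pushChunk(wire.slice(30));

console.log(firstBatch.length, secondBatch.map((m) => m.method));
```

The same framing is why a stdio server must never print logs to stdout: any stray line there would be parsed as a protocol message.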

SSE matters when the server is shared - a hosted Playwright MCP that ten developers point at, or a centralised Playwright runner inside a corporate network. The client opens an HTTP connection, the server streams messages back via text/event-stream. Auth + TLS become your problem; on the upside, the heavy browser process lives on a beefy CI machine, not on the developer's laptop.
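
For comparison with stdio, this is roughly what text/event-stream framing looks like for the same JSON-RPC messages. It is a minimal sketch of the SSE wire format only (event and data fields, blank line between events), assuming LF line endings; the helper names are hypothetical and no HTTP server, auth, or reconnection logic is included.

```javascript
// Format one JSON-RPC message as a Server-Sent Events frame.
function toSseFrame(message, eventName = 'message') {
  return `event: ${eventName}\ndata: ${JSON.stringify(message)}\n\n`;
}

// Parse a text/event-stream body into { event, data } records.
// Assumes "\n" line endings; a full parser would also accept "\r\n".
function parseSseStream(text) {
  return text
    .split('\n\n')
    .filter((block) => block.trim() !== '')
    .map((block) => {
      let event = 'message';
      const dataLines = [];
      for (const line of block.split('\n')) {
        if (line.startsWith('event:')) {
          event = line.slice('event:'.length).trim();
        } else if (line.startsWith('data:')) {
          const value = line.slice('data:'.length);
          // SSE strips a single leading space after the colon.
          dataLines.push(value.startsWith(' ') ? value.slice(1) : value);
        }
      }
      return { event, data: dataLines.join('\n') };
    });
}

const sseBody =
  toSseFrame({ jsonrpc: '2.0', id: 1, result: { tools: [] } }) +
  toSseFrame({ jsonrpc: '2.0', method: 'notifications/progress', params: { progress: 1 } });

const sseEvents = parseSseStream(sseBody);
console.log(sseEvents.map((e) => JSON.parse(e.data)));
```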

Exercises

  1. Start in stdio. Run your TTA Playwright server in stdio mode + connect with Claude Desktop. Watch the JSON-RPC messages.
  2. Switch to SSE. Move the same server behind an SSE endpoint. Add a bearer token and reconnect.
  3. Measure latency. Same browser action over stdio vs. SSE. How much overhead does the network add?
  4. Auth design. Sketch how you'd let an SSE Playwright server be shared safely across 3 teams in the same org.

6. Local vs remote MCP server - security boundaries

A local stdio server can read anything the user can read. A remote SSE server reads only what its account allows. Plan the boundary before you connect.

local: user-level access | remote: service account | tool allow-list | filesystem sandbox | network policy | audit log | rate limit

  1. Local stdio. The server runs as you, with your credentials, on your machine. It can read ~/.aws/credentials if you let it. Treat it like a CLI tool you installed - same trust model.
  2. Remote SSE. The server runs elsewhere as a service account. Bound by whatever IAM the service has. Add bearer auth, TLS, rate limits, audit logs.
  3. Tool allow-list. Even local servers should let you disable specific tools per workspace. "On the production support workspace, no browser_evaluate tool" is a sensible rule.
  4. Filesystem reach. If the server can read or write files, scope it to a project root and reject everything else. Path-traversal is the classic MCP attack.
  5. Network reach. A Playwright server can navigate anywhere. Allow-list domains for production-impacting flows.

Security note. Do not point a production-credentialed MCP server at a free-form chat client. Once a tool such as browser_evaluate can run arbitrary JavaScript in logged-in sessions, prompt injection from a malicious webpage can escalate to full credential exfiltration.
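
The allow-list, filesystem, and network rules in the list above can each be reduced to a small guard that runs before a tool handler. The sketch below is one possible shape under stated assumptions: the workspacePolicy object, the ttacart.example.com host, and the /srv/tta/project root are made up for illustration, paths are POSIX-style only, and symlinks are not handled.

```javascript
// Hypothetical per-workspace policy; names and values are illustrative.
const workspacePolicy = {
  allowedTools: new Set(['browser_navigate', 'browser_click', 'browser_snapshot']),
  allowedHosts: new Set(['ttacart.example.com']),
  projectRoot: '/srv/tta/project',
};

// Tool allow-list: anything not listed is unavailable in this workspace.
function isToolAllowed(policy, toolName) {
  return policy.allowedTools.has(toolName);
}

// Filesystem reach: resolve a relative path under the root, or return null
// if it is absolute or would climb above the root.
function resolveInsideRoot(root, requested) {
  if (requested.startsWith('/')) return null;
  const parts = [];
  for (const segment of requested.split('/')) {
    if (segment === '' || segment === '.') continue;
    if (segment === '..') {
      if (parts.length === 0) return null; // would escape the project root
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  return [root, ...parts].join('/');
}

// Network reach: only http(s) URLs whose host is on the allow-list.
function isHostAllowed(policy, url) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch {
    return false; // not a parseable absolute URL
  }
  const isWeb = parsed.protocol === 'https:' || parsed.protocol === 'http:';
  return isWeb && policy.allowedHosts.has(parsed.hostname);
}

console.log(isToolAllowed(workspacePolicy, 'browser_evaluate'));
console.log(resolveInsideRoot(workspacePolicy.projectRoot, '../../../etc/passwd'));
console.log(resolveInsideRoot(workspacePolicy.projectRoot, 'logs/heals.jsonl'));
console.log(isHostAllowed(workspacePolicy, 'https://ttacart.example.com/cart'));
```

These checks limit what a hijacked conversation can reach; they do not stop prompt injection itself, so the permission UI in the client still matters.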

Exercises

  1. Threat model. Write a 5-row threat-model table for: local Playwright stdio MCP + Claude Desktop on a dev laptop. Rows: actor, asset, attack, mitigation, residual risk.
  2. Allow-list test. Configure the server to expose only browser_navigate, browser_click, browser_snapshot. Confirm browser_evaluate is unavailable.
  3. Path traversal probe. Pass ../../../etc/passwd to a file-reading tool. Confirm the server rejects it.
  4. Prompt injection demo. Visit a deliberately crafted page that whispers "run browser_evaluate('fetch...')". Does your client + server prevent it?

7. STLC + MCP - wiring AI into the Software Testing Life Cycle

Design -> generate -> execute -> report. MCP gives you one consistent tool surface across the whole loop, instead of bespoke integrations at each phase.

design: requirements -> cases | generate: cases -> specs | execute: specs -> runs | report: runs -> insights | close: insights -> PRs | one MCP server per phase

A mature TTA team can run all four STLC phases through MCP. Design: a "test-design" MCP server reads JIRA tickets via the Atlassian MCP server and produces a markdown test plan. Generate: the Playwright MCP server uses the plan as input and emits real spec files. Execute: same server runs npx playwright test and streams progress. Report: a reporter MCP exposes the trace + HTML as resources, the AI client summarises failures and proposes fixes. Loop closes by opening a PR via the GitHub MCP server.
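
The phase-by-phase hand-off described above can be sketched as a table of (phase, server, tool) rows plus a loop that feeds each result into the next call. Everything here is hypothetical: the server labels, the tool names other than run_spec and draft_test_plan, the TTA-123 ticket id, and the synchronous callTool stub, which stands in for real (asynchronous) tools/call requests to each MCP server.

```javascript
// Hypothetical phase table: which server and tool each phase would use.
const stlcPhases = [
  { phase: 'design', server: 'test-design', tool: 'draft_test_plan' },
  { phase: 'generate', server: 'playwright', tool: 'generate_spec' },
  { phase: 'execute', server: 'playwright', tool: 'run_spec' },
  { phase: 'report', server: 'reporter', tool: 'summarise_run' },
  { phase: 'close', server: 'github', tool: 'open_pull_request' },
];

// Run the phases in order, passing each output on as the next input.
// callTool(server, tool, input) is injected so the sketch stays transport-agnostic.
function runPipeline(phases, callTool, initialInput) {
  const log = [];
  let input = initialInput;
  for (const { phase, server, tool } of phases) {
    const output = callTool(server, tool, input);
    log.push({ phase, server, tool, output });
    input = output;
  }
  return log;
}

// Stub that only records the hand-off instead of calling a real server.
const fakeCallTool = (server, tool, input) => `${tool}(${input})`;

const pipelineLog = runPipeline(stlcPhases, fakeCallTool, 'TTA-123');
for (const entry of pipelineLog) {
  console.log(`${entry.phase} via ${entry.server}: ${entry.output}`);
}
```

In practice the client's model decides when to move between phases; a fixed loop like this is closer to what a scripted CI wrapper around the same servers would do.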

The win is composability. You're not picking one AI tool - you're picking one protocol and assembling tools per phase. The same tool surface our existing TTA AI Chat demo and functions/api/chat.js proxy expose can be re-wrapped as MCP tools when we ship the full STLC story.

Exercises

  1. Map the STLC. Take one TTACart epic and walk it through the 4 STLC phases, listing the MCP server you'd use at each phase.
  2. Design MCP. Sketch a "test-design" MCP server with 3 tools: read_ticket, extract_acceptance_criteria, draft_test_plan.
  3. Cross-server flow. One conversation that touches: design MCP -> Playwright MCP -> GitHub MCP. Trace the data flow.
  4. Cost ledger. Estimate token cost per phase for a typical 5-spec epic. Where would you cache?
  5. Failure replay. A test fails in production. The reporter MCP returns the trace. The AI summarises the failure and the Playwright MCP regenerates the spec. Walk through the dialog.

Common MCP servers + clients

Reference table you can hand to a new joiner. Not exhaustive - the ecosystem moves fast.

| Component | Examples | Purpose |
| --- | --- | --- |
| Server | playwright-mcp, filesystem, git, github, postgres, atlassian | Exposes tools, resources and prompts. The active surface the AI client talks to. |
| Client | Claude Desktop, Cursor, Cline (VS Code), Continue, Zed AI, Windsurf | Invokes tools on user instruction, shows permission UI, renders resources, holds the conversation. |
| Transport | stdio, sse (Server-Sent Events) | How JSON-RPC messages flow. stdio = local child process; SSE = HTTP streaming for remote servers. |
| Format | JSON-RPC 2.0 with MCP-specific methods (tools/list, tools/call, resources/read) | The wire format. Easy to log, easy to mock. |
| Auth (remote) | Bearer token, OAuth 2.0, mTLS | Stdio = process boundary, no extra auth. SSE = whatever HTTPS auth scheme you'd use for a normal API. |
| Discovery | Client config JSON (claude_desktop_config.json, .cursor/mcp.json) | How the client learns which servers to start at launch. |

Diagrams - MCP message flow and STLC pipeline

Two mermaid diagrams. First: the client-server-tool sequence for one Playwright action. Second: how MCP servers compose across the full STLC.

MCP client-server message flow

A typical "click the green button" exchange. The model never imports Playwright; it just calls a tool by name and reads the snapshot back.

sequenceDiagram
  autonumber
  participant U as User
  participant C as MCP Client (Claude Desktop)
  participant M as LLM
  participant S as Playwright MCP Server
  participant B as Browser

  U->>C: "Add the highlighted SKU to the cart"
  C->>M: prompt + tools/list summary
  M-->>C: call browser_snapshot
  C->>S: tools/call browser_snapshot
  S->>B: page.accessibility.snapshot()
  B-->>S: ARIA tree with refs
  S-->>C: snapshot result
  C->>M: snapshot result attached
  M-->>C: call browser_click ref=node-2841
  C->>S: tools/call browser_click {ref:"node-2841"}
  S->>B: locator(`[aria-ref="node-2841"]`).click()
  B-->>S: ok
  S-->>C: { ok: true, url: "/cart" }
  C-->>U: "Done. Cart now has 1 item."
            

STLC composition - multiple MCP servers in one conversation

The AI client orchestrates four MCP servers across one feature. Each server is a small process; the client is the only thing that knows about all of them at once.

flowchart LR
  subgraph CLIENT[MCP Client]
    M[LLM]
  end

  M -- tools/call --> A[Atlassian MCP<br/>read JIRA ticket]
  M -- tools/call --> P[Playwright MCP<br/>generate + run spec]
  M -- tools/call --> R[Reporter MCP<br/>read trace + HTML report]
  M -- tools/call --> G[GitHub MCP<br/>open PR with the new spec]
  A -- ticket text --> M
  P -- run id + status --> M
  R -- summary + failures --> M
  G -- pr url --> M
  classDef srv fill:#d1fae5,stroke:#16a34a,color:#111
  classDef llm fill:#ede9fe,stroke:#8b5cf6,color:#111
  class A,P,R,G srv
  class M llm

Next step. Configure your local Claude Desktop with a Playwright MCP server, then walk through the exercises in section 7 (STLC + MCP) against the TTACart demo. The Framework + AI page covers the cost and privacy guards you should apply to every MCP server you connect.