V2 is the AI-extended sibling of V1 -
the same 10-box TTACart architecture, plus a new AI layer that handles failure root-cause analysis,
self-healing locators, prompt-to-spec generation, smart test data, and PR review. One multi-model
adapter sits in front of seven providers (DeepSeek default + Anthropic, OpenAI, Gemini, Mistral,
Ollama local, LM Studio local) and runs every cloud call through a serverless proxy with PII
redaction and a hard token budget.
Private preview
V2 is in private preview - not yet wired into the practice sidebar. Pramod will
give the go-ahead before this lands publicly. Treat this page as a working draft, not a launch.
What's new in V2
Four upgrades that add AI assistance to the V1 framework, for both learning and production use.
Box 11
AI layer
A dedicated src/ai/ module that hooks into Observability, Specs and Reports without touching the existing 10 boxes.
Adapter
Multi-model adapter
One AIClient fronts DeepSeek (default) + Anthropic, OpenAI, Gemini, Mistral, Ollama, LM Studio. Swap via env var.
Guards
Privacy + cost
Serverless proxy for cloud keys, PII redactor for DOM snapshots, per-run token budget, and a sha256-keyed response cache.
Branch
ttacart-ai
V2 of the TTACart suite ships on a dedicated ttacart-ai branch of the upstream Advance-Playwright-Framework repo.
Architecture - 11 boxes (V2: V1 + AI)
Same Page Object Model + fixture-based architecture as V1. The new 11th box is an AI layer that reads from box 5 Observability, writes into box 8 Reports, and feeds box 2 Specs. Cloud providers go through a serverless proxy; local providers (Ollama, LM Studio) skip the network entirely.
Framework code (TTACart-AI)
1. Page Objects - src/pages/
Unchanged from V1
LoginPage / InventoryPage / ItemDetailPage / CartPage
CheckoutStepOnePage / CheckoutStepTwoPage / CheckoutCompletePage
- Same locators, same methods
- Wrapped by UtilElementLocator which now has AI self-heal
2. Fixtures + Specs - src/fixtures/ + src/tests/
fixtures/test-base.ts
type Fix = {
  loginPage: LoginPage;
  inventoryPage: InventoryPage;
  cartPage: CartPage;
  checkoutOne: CheckoutStepOnePage;
  checkoutTwo: CheckoutStepTwoPage;
  // V2: an AI client fixture so any spec can call AIClient
  ai: AIClient;
};
tests/checkout.ai.spec.ts
test('@e2e @ai-rca checkout with auto root-cause on fail', async ({
  loginPage, inventoryPage, cartPage, checkoutOne, checkoutTwo, ai,
}) => {
  // ...same flow as V1...
  // AI hook auto-attaches a root-cause summary to the report on failure.
});
3. Utils
generateUser() // V1
generateCard() // V1
DataFactory.smart(prompt) // V2 - AI edge-case generator
e.g. smart('5 emails that fail RFC 5321')
cached by sha256(prompt + model)
4. Test data - src/testdata/
Unchanged + ai-cache/
users.json / products.csv / register.xlsx // V1
src/testdata/ai-cache/{sha256}.json // V2
- AI-generated edge cases are cached here
- committed so CI re-runs are deterministic
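A minimal sketch of how a cache entry under src/testdata/ai-cache/ could be addressed, assuming Node's built-in crypto module. The helper name cachePathFor and the newline-joined key layout are illustrative assumptions; the page only states that the key is a sha256 over the prompt and model.

```typescript
import { createHash } from 'node:crypto';
import { join } from 'node:path';

// Hypothetical helper: maps a (model, prompt) pair to a cache file path.
// Parts are joined with '\n' so ('ab', 'c') and ('a', 'bc') hash differently.
export function cachePathFor(
  model: string,
  prompt: string,
  dir = 'src/testdata/ai-cache',
): string {
  const digest = createHash('sha256')
    .update([model, prompt].join('\n'))
    .digest('hex');
  return join(dir, `${digest}.json`);
}
```

Because the path depends only on the inputs, a committed cache file is found again on every CI re-run without calling a provider.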
11. AI layer - src/ai/
One module, 7 provider drivers behind a common interface.
- DeepSeek (default), Anthropic, OpenAI, Gemini, Mistral
- Ollama local, LM Studio local (NO key, NO network)
- Dispatch by model id prefix
- Hard budget cap + sha256 cache + PII redaction
Edges into the framework
box 5 Observability -> AI (reads trace + screenshots)
AI -> box 8 Reports (writes RCA summary, posts PR comment)
AI -> box 2 Specs (prompt-to-spec generator writes new .spec.ts)
Sub-features
1. Failure RCA
2. Self-healing locators
3. Prompt to spec
4. Smart test data
5. PR reviewer bot
Configuration & infrastructure
6. playwright.config.ts + envs
V2 adds env vars
TTA_AI_MODEL = deepseek-chat // default
TTA_AI_BUDGET = 50000 // tokens / run
TTA_AI_PROXY = /api/ai // serverless proxy
TTA_AI_REDACT = on // PII redactor on/off
DEEPSEEK_API_KEY / ANTHROPIC_API_KEY / OPENAI_API_KEY
GEMINI_API_KEY / MISTRAL_API_KEY // cloud
// Ollama + LM Studio need NO key
7. Test execution + tags (V2)
New tags
@ai-rca - failure RCA hook active for this spec
@ai-heal - self-heal allowed for this spec
@ai-data - DataFactory.smart() used
@ai-gen - generated by prompt-to-spec (review before merge)
8. Reports + artifacts (V2)
HTML + AI insights
playwright-report/index.html
- All V1 sections
- V2: ai-rca.md attached to every failed test
- V2: heals.jsonl summary block at top
- V2: token usage strip (run total + per-spec)
9. CI/CD + Docker + cloud grid (V2)
.github/workflows/ttacart-ai.yml
jobs:
test: runs sharded suite with TTA_AI_MODEL=deepseek-chat
review: runs scripts/ai-pr-review.ts on every PR
posts inline comments via gh pr review
budget: fails the build if total tokens > TTA_AI_BUDGET
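The budget job's pass/fail decision can be sketched as a pure function. This is an assumption-laden illustration: the name budgetExitCode is hypothetical, and how the run's total token count is collected is not specified on this page. Only the TTA_AI_BUDGET variable and its 50000 default come from the configuration above.

```typescript
// Hypothetical gate for the CI budget job.
// Returns the process exit code: 1 when the run overspent, otherwise 0.
export function budgetExitCode(
  totalTokens: number,
  budgetEnv: string | undefined,
): number {
  const parsed = Number(budgetEnv);
  // Fall back to the documented default when the variable is unset or invalid.
  const budget = Number.isFinite(parsed) && parsed > 0 ? parsed : 50_000;
  return totalTokens > budget ? 1 : 0;
}
```

A CI script could pass the result to process.exit so the build fails only when the total is strictly above the cap.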
Cloudflare Pages function.
- Cloud API keys live in env vars on the edge.
- Client code only ever sees /api/ai.
- Local providers bypass the proxy (direct localhost call).
10 boxes from V1 stay verbatim. The new 11th box (green) is the AI layer + its serverless proxy. Each numbered card below the diagram zooms into one of the top 5 AI features.
PR reviewer flow
sequenceDiagram
autonumber
participant Dev
participant GH as GitHub
participant GHA as ttacart-ai.yml (job: review)
participant AI as AIClient (DeepSeek)
participant Bot as gh pr review
Dev->>GH: open / update PR with *.spec.ts changes
GH->>GHA: trigger workflow on pull_request event
GHA->>GHA: gh pr diff -> capture spec hunks only
GHA->>AI: complete({ diff, ruleset, maxTokens: 1500 })
AI-->>GHA: findings JSON (rule, file, line, msg)
GHA->>Bot: gh pr review --comment with inline notes
Bot-->>Dev: review appears + checks status updates
Prompt to spec flow
graph LR
P["CLI: tta-ai gen 'log in standard_user, checkout 1 backpack'"] --> SYS[system prompt + POM index]
SYS --> AI[AIClient.complete - 1200 tokens]
AI --> RAW[draft TypeScript]
RAW --> LINT[eslint + tsc --noEmit]
LINT -- pass --> WRITE[write src/tests/generated/*.spec.ts]
LINT -- fail --> AI
WRITE --> RUN[npx playwright test --list]
classDef io fill:#a5d8ff,stroke:#2563eb,color:#111
classDef ai fill:#ede9fe,stroke:#8b5cf6,color:#111
classDef qa fill:#fef3c7,stroke:#f59e0b,color:#111
classDef ok fill:#d1fae5,stroke:#16a34a,color:#111
class P,RUN io
class SYS,AI,RAW ai
class LINT qa
class WRITE ok
Smart data flow with cache
graph LR
P[DataFactory.smart prompt] --> H[sha256 model + system + prompt]
H --> C{".cache/ai hit?"}
C -- yes --> ROWS[return fixture rows]
C -- no --> AI[AIClient.complete - 800 tokens]
AI --> J[JSON.parse + zod validate]
J -- pass --> WR[write .cache/ai + return]
J -- fail --> AI
WR --> ROWS
classDef io fill:#a5d8ff,stroke:#2563eb,color:#111
classDef cache fill:#fef9c3,stroke:#d97706,color:#111
classDef ai fill:#ede9fe,stroke:#8b5cf6,color:#111
classDef ok fill:#d1fae5,stroke:#16a34a,color:#111
class P,ROWS io
class C,WR cache
class AI,J ai
Top 5 AI features
One card per feature. Each card lists the value prop, which box it plugs into, an integration code sample, the default model + override, and the cost guard that stops runaway spend.
1
Failure RCA - auto root-cause on every failed test
Plugs into Box 5 + 8
Turn a 5 MB trace.zip into a two-sentence root cause attached to the HTML report. CI failures get the cause posted as a PR comment in under 8 seconds.
Where: box 5 Observability -> AI -> box 8 Reports. Hook lives in CustomTTAReporter.onTestEnd.
Failure RCA flow
sequenceDiagram
autonumber
participant T as Failing test
participant R as CustomTTAReporter
participant TS as summariseTrace
participant AI as AIClient (DeepSeek default)
participant H as tta-report/index.html
participant PR as GitHub PR
T--xR: status = 'failed' + trace.zip path
R->>TS: read last 50 events + DOM at failure + network errors
TS-->>R: redacted 50-line summary
R->>AI: complete({ prompt, maxTokens: 300 })
AI-->>R: 2-sentence cause + suggested fix
R->>H: attach RCA panel to failing test row
alt running in CI
R->>PR: gh pr comment with cause + fix
end
src/ai/failure-rca.ts
import { AIClient } from './AIClient';
import { summariseTrace } from './trace-summary';

export async function explainFailure(
  traceZipPath: string,
  opts: { model?: string; maxTokens?: number } = {},
) {
  // last 50 events + DOM at failure + last network errors
  const trace = await summariseTrace(traceZipPath);
  const prompt =
    'A Playwright test failed. Trace summary follows.\n\n' +
    trace +
    '\n\nReturn: 1) a 2-sentence root cause, 2) the single most likely fix.';
  return AIClient.complete({
    model: opts.model ?? process.env.TTA_AI_MODEL ?? 'deepseek-chat',
    prompt,
    maxTokens: opts.maxTokens ?? 300,
  });
}
Cost guard: capped at maxTokens=300 per failure + only fires on status==='failed'. Worst case (every test fails) the budget hard-stop in AIClient still wins.
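The "only fires on status === 'failed'" guard could look roughly like the sketch below on the reporter side. The interfaces are local stand-ins for the fields read from a Playwright test result, tracePathForRca is a hypothetical name, and the assumption that the trace is attached under the name 'trace' should be checked against the Playwright version in use.

```typescript
// Local stand-ins for the fields this sketch reads from a test result.
interface AttachmentLike {
  name: string;
  path?: string;
}
interface ResultLike {
  status: string;
  attachments: AttachmentLike[];
}

// Hypothetical helper: returns the trace.zip path to explain, or undefined
// when no AI call should be made for this test.
export function tracePathForRca(result: ResultLike): string | undefined {
  if (result.status !== 'failed') return undefined; // cost guard: failed tests only
  const trace = result.attachments.find(
    (a) => a.name === 'trace' && a.path !== undefined, // assumed attachment name
  );
  return trace?.path;
}
```

A reporter hook would call explainFailure only when this returns a path, so passing and skipped tests spend no tokens.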
2
Self-healing locators - try the AI fallback before failing
Plugs into Box 3
When click(locator) hits not visible or not found, ask AI for 3 fallback locators using the current DOM (PII-redacted), retry with the highest-confidence one, and log the heal.
Where: box 3 Utils - UtilElementLocator wraps Locator.click / fill / type. Heals are written to logs/heals.jsonl so the team can turn them into a permanent locator fix in a later PR.
Self-healing locator flow
flowchart LR
A[primary locator] --> B{matches in DOM?}
B -- yes --> Z[click / fill / type]
B -- no, after 5s --> RD[PII redactor strips email / phone / postal]
RD --> AI[AIClient.complete - 200 tokens]
AI --> S[3 candidate locators, ranked by stability]
S --> R[retry with #1]
R -- found --> Z
R -- still not found --> LOG[append to logs/heals.jsonl + throw]
classDef good fill:#d1fae5,stroke:#16a34a,color:#111
classDef warn fill:#fef3c7,stroke:#f59e0b,color:#111
classDef ai fill:#ede9fe,stroke:#8b5cf6,color:#111
classDef bad fill:#ffc9c9,stroke:#ef4444,color:#111
class Z good
class RD warn
class AI,S ai
class LOG bad
Cost guard: DOM snapshot capped at 32KB after the PII redactor strips emails / phones / postal codes. Heals are sha256-cached so the second occurrence of the same broken selector reuses the answer for free.
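The retry logic in the flow above can be sketched with the DOM check, the AI call, and the logger injected as plain functions. All names here (resolveWithHeal, Exists, Suggest) are hypothetical and not taken from UtilElementLocator; the real wrapper would work with Playwright locators rather than selector strings.

```typescript
type Exists = (selector: string) => Promise<boolean>;
type Suggest = (brokenSelector: string) => Promise<string[]>;

export interface HealResult {
  selector: string;
  healed: boolean;
}

// Hypothetical sketch: try the primary selector, then the top-ranked AI candidate.
export async function resolveWithHeal(
  primary: string,
  exists: Exists,
  suggest: Suggest,
  log: (entry: object) => void,
): Promise<HealResult> {
  if (await exists(primary)) return { selector: primary, healed: false };

  const [best] = await suggest(primary); // candidates arrive ranked, best first
  if (best !== undefined && (await exists(best))) {
    log({ primary, healedTo: best }); // would be appended to logs/heals.jsonl
    return { selector: best, healed: true };
  }

  log({ primary, healedTo: null });
  throw new Error(`Locator not found and heal failed: ${primary}`);
}
```

Keeping the AI call behind a function parameter means the heal path can be unit-tested with a fake that returns fixed candidates.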
3
Prompt to spec - generate runnable .spec.ts from English
Plugs into Box 2
A junior tester types a sentence. Out comes a Playwright spec that imports the right page object, uses the right fixtures, and runs first try.
Where: box 2 Specs. CLI script scripts/ai-gen-spec.ts writes to src/tests/generated/ and tags with @ai-gen.
src/ai/spec-gen.ts
import { AIClient } from './AIClient';

const SYSTEM = `You write Playwright TypeScript specs.
Target pages live at /playwright/ttacart/*.html.
Use the existing Page Objects in src/pages/ttacart/:
LoginPage, InventoryPage, ItemDetailPage, CartPage,
CheckoutStepOnePage, CheckoutStepTwoPage, CheckoutCompletePage.
Use the test-base fixture (import { test, expect } from '../fixtures/test-base').
No raw page.click outside POMs. No waitForTimeout. Use expect().toHaveX.`;

export async function generateSpec(prompt: string): Promise<string> {
  const out = await AIClient.complete({
    model: process.env.TTA_AI_MODEL ?? 'deepseek-chat',
    system: SYSTEM,
    prompt,
    maxTokens: 1200,
  });
  // complete() resolves to an AIResult; the generated source is in its text field.
  return out.text.trim();
}

// CLI:
// npx tsx scripts/ai-gen-spec.ts \
// "log in as standard_user and complete checkout with 1 backpack"
Cost guard: maxTokens=1200 + every generated spec is force-tagged @ai-gen so PR review can grep them and require human approval before merge.
4
Smart test data - the edge cases Faker won't give you
Plugs into Box 3 + 4
Faker gives you valid names and emails. DataFactory.smart(prompt) gives you 5 emails that should fail RFC 5321, 10 unicode postal codes that break payment forms, or 3 SQL-shaped strings that should be rejected.
Where: box 3 Utils (DataFactory) + box 4 Test data (ai-cache/). Cache key: sha256(prompt + model + schema).
Cost guard: the sha256 cache means a re-run of the same spec is zero tokens. zod schema validation rejects malformed answers so a bad model never burns budget on retries.
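As a rough illustration of the "parse, then validate" step, the sketch below checks a model answer that is supposed to be a JSON array of strings. It is hand-rolled so it has no dependencies; in the framework this role is played by a zod schema, and the name parseStringRows is hypothetical.

```typescript
// Hypothetical stand-in for a zod array-of-strings schema.
// Returns the rows when the answer is well-formed, otherwise null.
export function parseStringRows(raw: string, expected: number): string[] | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // the model did not return JSON at all
  }
  if (!Array.isArray(value) || value.length !== expected) return null;
  return value.every((v) => typeof v === 'string') ? (value as string[]) : null;
}
```

Only answers that pass this check would be written to the cache, so a malformed response is never replayed on later runs.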
5
PR reviewer bot
A GitHub Action reads the diff of every **/*.spec.ts, asks AI to flag waitForTimeout, absolute XPath, missing expect, raw page.click outside POMs, and posts inline review comments via gh pr review.
Cost guard: the diff is capped at 8KB (truncated head + tail) so a huge PR can't blow the budget. The reviewer only runs on changed .spec.ts files, never on the whole repo.
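The "truncated head + tail" cap can be sketched as below. This version counts characters rather than bytes, which matches 8KB only for ASCII diffs; the function name capDiff and the marker text are assumptions for illustration.

```typescript
// Hypothetical sketch: keep the start and end of a diff, drop the middle.
export function capDiff(diff: string, maxChars = 8 * 1024): string {
  if (diff.length <= maxChars) return diff;

  const marker = '\n... [diff truncated] ...\n';
  const keep = Math.max(0, maxChars - marker.length); // room left for real content
  const headLen = Math.ceil(keep / 2);
  const tailLen = keep - headLen;

  return diff.slice(0, headLen) + marker + diff.slice(diff.length - tailLen);
}
```

Keeping both ends preserves the first file header and the most recent hunks, which is usually where the reviewer rules have something to say.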
Multi-model adapter - one interface, seven providers
One TypeScript module, seven provider drivers behind a common interface. Switch model by setting process.env.TTA_AI_MODEL. Default: deepseek-chat. Local providers (Ollama, LM Studio) need no API key - the data never leaves the machine.
Provider | Model id | Auth | Endpoint | Stream? | Where
DeepSeek | deepseek-chat | DEEPSEEK_API_KEY | api.deepseek.com | Yes | Default
Anthropic | claude-3-5-sonnet-20241022 | ANTHROPIC_API_KEY | api.anthropic.com | Yes | Cloud
OpenAI | gpt-4o-mini | OPENAI_API_KEY | api.openai.com | Yes | Cloud
Gemini | gemini-1.5-flash | GEMINI_API_KEY | generativelanguage.googleapis.com | Yes | Cloud
Mistral | mistral-large-latest | MISTRAL_API_KEY | api.mistral.ai | Yes | Cloud
Ollama | ollama:llama3.1 | none | localhost:11434 | Yes | Local
LM Studio | lmstudio:any | none | localhost:1234/v1 | Yes | Local
AIClient interface + dispatch
src/ai/AIClient.ts
// Budget, sha256, cacheGet, cachePut and drivers are defined elsewhere (not shown here).
export interface AIRequest {
  model: string; // e.g. 'deepseek-chat' | 'claude-3-5-sonnet-20241022'
  prompt: string;
  system?: string;
  maxTokens?: number;
}

export interface AIResult {
  text: string;
  tokens: { in: number; out: number };
  cached: boolean;
}

export const AIClient = {
  async complete(req: AIRequest): Promise<AIResult> {
    Budget.assertWithin(req.maxTokens ?? 400);
    const key = await sha256(req.model + (req.system ?? '') + req.prompt);
    const cached = await cacheGet(key);
    if (cached) return { ...cached, cached: true };
    const driver = dispatch(req.model); // see below
    const out = await driver.complete(req);
    Budget.record(out.tokens);
    await cachePut(key, out);
    return { ...out, cached: false };
  },
  // Convenience wrappers (completeJSON, locators, ...) call complete() under the hood.
};

// Route by model id prefix; local providers use an explicit 'name:' prefix.
function dispatch(model: string) {
  if (model.startsWith('ollama:')) return drivers.ollama;
  if (model.startsWith('lmstudio:')) return drivers.lmstudio;
  if (model.startsWith('claude-')) return drivers.anthropic;
  if (model.startsWith('gpt-')) return drivers.openai;
  if (model.startsWith('gemini-')) return drivers.gemini;
  if (model.startsWith('mistral-')) return drivers.mistral;
  if (model.startsWith('deepseek-')) return drivers.deepseek;
  throw new Error('Unknown TTA_AI_MODEL: ' + model);
}
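The Budget object used by complete() is not shown on this page. One way it might work is sketched below as a small class. Only the assertWithin and record method names come from the listing above; the class name TokenBudget, the remaining getter, and the exact overflow rule are assumptions.

```typescript
// Hypothetical per-run token budget with a hard stop.
export class TokenBudget {
  private used = 0;
  private readonly limit: number;

  constructor(limit = 50_000) {
    this.limit = limit;
  }

  // Throws before a call whose output cap could push the run past its limit.
  assertWithin(requested: number): void {
    if (this.used + requested > this.limit) {
      throw new Error(
        `AI token budget exceeded: ${this.used} used + ${requested} requested > ${this.limit}`,
      );
    }
  }

  // Adds the actual input and output tokens reported by the provider.
  record(tokens: { in: number; out: number }): void {
    this.used += tokens.in + tokens.out;
  }

  get remaining(): number {
    return Math.max(0, this.limit - this.used);
  }
}
```

Checking against the requested maxTokens before the call, then recording actual usage afterwards, keeps the guard conservative: a call is refused if its worst case would overshoot.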
Privacy + cost guards
Five guards that stop V2 from leaking data or running up the LLM API bill. Every one of them is enforced by AIClient itself, not the upstream provider.
1. Serverless proxy for cloud calls. All cloud requests go through a Cloudflare Pages function (functions/api/ai.js), mirroring the existing /api/chat proxy. Cloud keys live in edge env vars and never reach the client bundle.
2. PII redactor before any cloud call. DOM snapshots, trace summaries and console logs go through a regex pass that strips emails, phone numbers, and postal codes before the request leaves the machine. Local providers (Ollama, LM Studio) skip the redactor - the data never leaves the machine in the first place.
3. Per-CI-run token budget cap. AIClient tracks tokens-in + tokens-out and hard-stops at 50_000 by default. Override via TTA_AI_BUDGET. CI workflow budget job fails the build if the run exceeded its cap.
4. sha256 response cache. Cache key: sha256(model + system + prompt). Re-running the same spec doesn't double-charge. Cache lives in .ai-cache/ for runtime + src/testdata/ai-cache/ for committed deterministic test data.
5. BYO key drawer in the TTACart-AI UI. Mirrors the pattern from /playwright/ai-chat/. Students paste their own key into a drawer; it lives in localStorage; the proxy reads it from a header for that session only.
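To make guard 2 concrete, here is a minimal regex-pass sketch. The patterns are deliberately simple assumptions (the postal pattern only covers US ZIP codes) and the name redactPII is hypothetical; the framework's actual redactor is not shown on this page and may differ.

```typescript
// Illustrative patterns only; real-world PII matching needs broader coverage.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;
const US_ZIP = /\b\d{5}(?:-\d{4})?\b/g;

// Replaces emails, phone numbers and US ZIP codes with fixed placeholders.
// Order matters: phones run before ZIPs so digit runs inside a phone number
// are not partially replaced first.
export function redactPII(text: string): string {
  return text
    .replace(EMAIL, '[email]')
    .replace(PHONE, '[phone]')
    .replace(US_ZIP, '[postal]');
}
```

For example, `redactPII('Contact jane.doe@example.com or +1 (415) 555-0100, ZIP 94107')` yields `'Contact [email] or [phone], ZIP [postal]'`.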
Diagrams (rendered client-side)
Three mermaid diagrams - the 11-box architecture, the failure RCA sequence, and the model routing switch.
1. V2 architecture - 10 boxes + AI
V1 boxes 1-10 stay verbatim. Box 11 (AI) is the new node. Dotted edges show the AI hooks into Observability, Specs and Reports.
graph LR
classDef code fill:#fde7f3,stroke:#db2777,color:#111
classDef data fill:#d1fae5,stroke:#16a34a,color:#111
classDef obs fill:#cffafe,stroke:#0891b2,color:#111
classDef cfg fill:#ede9fe,stroke:#8b5cf6,color:#111
classDef ci fill:#ffedd5,stroke:#f97316,color:#111
classDef qty fill:#fef9c3,stroke:#d97706,color:#111
classDef ai fill:#dcfce7,stroke:#22c55e,color:#111,stroke-width:3px
P[1. Page Objects]:::code --> F[2. Fixtures + Specs]:::code
P --> U[3. Utils]:::code
F --> D[4. Test data]:::data
U --> O[5. Observability]:::obs
D --> O
C[6. playwright.config.ts]:::cfg --> E[7. Test execution + tags]:::cfg
C --> R[8. Reports + artifacts]:::cfg
E --> CI[9. CI/CD + Docker + cloud]:::ci
R --> CI
CI --> Q[10. Quality + package + VCS]:::qty
AI[11. AI layer AIClient + 7 providers]:::ai
O -. AI hooks .-> AI
AI -. RCA summary .-> R
AI -. prompt to spec .-> F
2. Failure RCA flow
A failed test triggers the reporter hook, which calls failure-rca.ts, which calls AIClient, which dispatches to the configured provider, which returns a 2-sentence root cause that gets attached to the report and (in CI) posted as a PR comment.
sequenceDiagram
autonumber
participant T as Test
participant R as Reporter (onTestEnd)
participant FR as failure-rca.ts
participant AC as AIClient
participant DS as DeepSeek (default)
participant H as HTML report
participant PR as gh pr review
T->>R: status=failed + trace.zip
R->>FR: explainFailure(trace.zip)
FR->>FR: summariseTrace + PII redact
FR->>AC: complete({ model, prompt, maxTokens: 300 })
AC->>AC: budget check + cache lookup
AC->>DS: POST /chat/completions
DS-->>AC: 2-sentence root cause
AC-->>FR: AIResult { text, tokens }
FR-->>R: ai-rca.md
R-->>H: attach ai-rca.md
R-->>PR: post inline comment (CI only)
3. Model routing
AIClient.complete(req) looks at the req.model prefix and dispatches to one of seven drivers. Cloud calls go through the serverless proxy; local providers (Ollama, LM Studio) hit localhost directly.
graph LR
REQ[AIClient.complete req.model = ?] --> SW{prefix?}
SW -- deepseek- --> D1[DeepSeek api.deepseek.com]
SW -- claude- --> D2[Anthropic api.anthropic.com]
SW -- gpt- --> D3[OpenAI api.openai.com]
SW -- gemini- --> D4[Gemini generativelanguage.googleapis.com]
SW -- mistral- --> D5[Mistral api.mistral.ai]
SW -- ollama: --> D6[Ollama LOCAL localhost:11434]
SW -- lmstudio: --> D7[LM Studio LOCAL localhost:1234/v1]
D1 --> PX[Serverless proxy /api/ai]
D2 --> PX
D3 --> PX
D4 --> PX
D5 --> PX
classDef cloud fill:#dbeafe,stroke:#2563eb,color:#111
classDef local fill:#dcfce7,stroke:#22c55e,color:#111,stroke-width:2px
classDef proxy fill:#fef9c3,stroke:#d97706,color:#111
class D1,D2,D3,D4,D5 cloud
class D6,D7 local
class PX proxy
V1 vs V2 - side-by-side
Same framework spine. V2 adds an AI layer, a multi-model adapter, and the guards that keep it cheap and safe.
Concern | V1 (current) | V2 + AI (this page)
Locators | Hand-written CSS / role-based locators in POMs. Broken locator = test fails. | Same hand-written locators. Self-heal tries 3 AI fallbacks before failing; chosen heal is logged for human review.
Failure analysis | Open trace.zip in Trace Viewer. Read 5 MB of events. | 2-sentence AI root cause attached to the HTML report + posted as PR comment in CI.
Spec authoring | Write by hand. Pair-program with a senior. | npm run ai:gen -- "prompt" writes a runnable spec; tagged @ai-gen for human approval before merge.
Reporting | HTML + Allure + custom tta-report/. | Same + AI RCA per failed test, heals.jsonl summary, run-level token usage strip.
CI | Sharded 4-way Playwright workflow. | Same sharding + review job (PR reviewer) + budget job (fails on token overspend).
Cost | Zero LLM cost. | Hard budget cap (TTA_AI_BUDGET=50000 default) + sha256 cache + small per-call maxTokens. Local providers cost zero.
Provider lock-in | None. | None. TTA_AI_MODEL swaps between 7 providers. Local (Ollama, LM Studio) means data never leaves the machine.
Roadmap - V2.x and beyond
Six features queued behind the V2 launch. Each one extends the same AI layer; no new boxes.
Feature | What it does | Status
AI test impact analysis | Read the git diff, predict the impacted specs, and run only those on a PR. Full suite still runs on main. | Planned
AI flaky-test classifier | Cluster recent failures by error signature; tag obvious flakes as @flaky and quarantine them automatically. | Planned
Visual diff triage | When toHaveScreenshot fails, ask AI whether the diff is real or just antialiasing / font-fallback noise. | Research
Dead-test detector | Cross-reference test names against changelogs to find specs that haven't asserted on real behaviour in 6 months. | Research
AI a11y reviewer | Run axe-core in every spec; pipe violations into AI for a plain-English remediation summary. | Pilot
AI run summary | End-of-run hook posts a single paragraph "what happened" to Slack + email - failures, flakes, slowest specs, top heals. | Pilot
V2 preview. Not in the sidebar yet. Open the V1 page for the published version, or jump back to the Practice overview.