V2 . Playwright + AI

Advance Playwright framework + AI

V2 is the AI-extended sibling of V1 - the same 10-box TTACart architecture, plus a new AI layer that handles failure root-cause analysis, self-healing locators, prompt-to-spec generation, smart test data, and PR review. One multi-model adapter sits in front of seven providers (DeepSeek default + Anthropic, OpenAI, Gemini, Mistral, Ollama local, LM Studio local) and runs every cloud call through a serverless proxy with PII redaction and a hard token budget.

Private preview: V2 is not yet wired into the practice sidebar. Pramod will give the go-ahead before this lands publicly. Treat this page as a working draft, not a launch.

What's new in V2

Four short upgrades that turn the V1 framework into a learning + production tool that thinks for itself.

Box 11

AI layer

A dedicated src/ai/ module that hooks into Observability, Specs and Reports without touching the existing 10 boxes.

Adapter

Multi-model adapter

One AIClient fronts DeepSeek (default) + Anthropic, OpenAI, Gemini, Mistral, Ollama, LM Studio. Swap via env var.

Guards

Privacy + cost

Serverless proxy for cloud keys, PII redactor for DOM snapshots, per-run token budget, and a sha256-keyed response cache.

Branch

ttacart-ai

V2 of the TTACart suite ships on a dedicated ttacart-ai branch of the upstream Advance-Playwright-Framework repo.

Architecture - 11 boxes (V2: V1 + AI)

Same Page Object Model + fixture-based architecture as V1. The new 11th box is an AI layer that reads from box 5 Observability, writes into box 8 Reports, and feeds box 2 Specs. Cloud providers go through a serverless proxy; local providers (Ollama, LM Studio) skip the network entirely.

Framework code (TTACart-AI)

1. Page Objects - src/pages/

Unchanged from V1
LoginPage / InventoryPage / ItemDetailPage / CartPage
CheckoutStepOnePage / CheckoutStepTwoPage / CheckoutCompletePage
- Same locators, same methods
- Wrapped by UtilElementLocator which now has AI self-heal

2. Fixtures + Specs - src/fixtures/ + src/tests/

fixtures/test-base.ts
type Fix = {
  loginPage: LoginPage;
  inventoryPage: InventoryPage;
  cartPage: CartPage;
  checkoutOne: CheckoutStepOnePage;
  checkoutTwo: CheckoutStepTwoPage;
  // V2: an AI client fixture so any spec can call AIClient
  ai: AIClient;
};
tests/checkout.ai.spec.ts
test('@e2e @ai-rca checkout with auto root-cause on fail', async ({
  loginPage, inventoryPage, cartPage, checkoutOne, checkoutTwo, ai,
}) => {
  // ...same flow as V1...
  // AI hook auto-attaches a root-cause summary to the report on failure.
});

3. Utils / Helpers - src/utils/

UtilElementLocator.ts (V2)
// V1 methods kept verbatim.
// V2 adds:
- async click(locator, { healOnFail = true } = {})
- async fill(locator, value, { healOnFail = true } = {})
- on not-found / not-visible -> calls heal()
- heal() asks AIClient for 3 fallback locators
- writes the chosen heal to logs/heals.jsonl
DataFactory.ts (V2)
generateUser()              // V1
generateCard()              // V1
DataFactory.smart(prompt)   // V2 - AI edge-case generator
  e.g. smart('5 emails that fail RFC 5321')
  cached by sha256(prompt + model)

4. Test data - src/testdata/

Unchanged + ai-cache/
users.json / products.csv / register.xlsx  // V1
src/testdata/ai-cache/{sha256}.json        // V2
  - AI-generated edge cases are cached here
  - committed so CI re-runs are deterministic

5. Observability - logs/ + tta-report/

logs/ (V2 adds)
logs/run-{ISO}.log         - V1
logs/network/{spec}.har    - V1
logs/console/{spec}.txt    - V1
logs/heals.jsonl           - V2 (every self-heal)
logs/ai-calls.jsonl        - V2 (every AIClient call + tokens)
CustomTTAReporter.ts (V2)
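onTestEnd(test, result):
  if (result.status === 'failed') {
    // attachments is an array; Playwright names the trace attachment 'trace'
    const trace = result.attachments.find((a) => a.name === 'trace');
    const rca = await explainFailure(trace.path);   // src/ai/failure-rca.ts
    result.attachments.push({
      name: 'ai-rca.md', contentType: 'text/markdown', body: Buffer.from(rca.text),
    });
  }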

11. AI layer - src/ai/ (NEW in V2)

AIClient adapter
One module, 7 provider drivers behind a common interface.
- DeepSeek (default), Anthropic, OpenAI, Gemini, Mistral
- Ollama local, LM Studio local (NO key, NO network)
- Dispatch by model id prefix
- Hard budget cap + sha256 cache + PII redaction
Edges into the framework
box 5 Observability -> AI (reads trace + screenshots)
AI -> box 8 Reports (writes RCA summary, posts PR comment)
AI -> box 2 Specs   (prompt-to-spec generator writes new .spec.ts)
Sub-features
1. Failure RCA
2. Self-healing locators
3. Prompt to spec
4. Smart test data
5. PR reviewer bot
Configuration & infrastructure

6. playwright.config.ts + envs

V2 adds env vars
TTA_AI_MODEL    = deepseek-chat        // default
TTA_AI_BUDGET   = 50000                // tokens / run
TTA_AI_PROXY    = /api/ai              // serverless proxy
TTA_AI_REDACT   = on                   // PII redactor on/off
DEEPSEEK_API_KEY / ANTHROPIC_API_KEY / OPENAI_API_KEY
GEMINI_API_KEY  / MISTRAL_API_KEY      // cloud
// Ollama + LM Studio need NO key
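The variables above could be read once at startup and validated before any AI call is made. The following is a minimal sketch, not the framework's actual loader: the `AiConfig` shape and the `loadAiConfig` name are hypothetical, and treating every value except an explicit `off` as "redactor on" is an assumed fail-safe. Only the variable names and defaults come from the list above.

```typescript
// Hypothetical config shape; only the env var names and defaults come from the list above.
interface AiConfig {
  model: string;
  budget: number;
  proxy: string;
  redact: boolean;
}

function loadAiConfig(env: Record<string, string | undefined>): AiConfig {
  const budget = Number(env.TTA_AI_BUDGET ?? "50000");
  if (!Number.isInteger(budget) || budget <= 0) {
    throw new Error(`TTA_AI_BUDGET must be a positive integer, got "${env.TTA_AI_BUDGET}"`);
  }
  return {
    model: env.TTA_AI_MODEL ?? "deepseek-chat",
    budget,
    proxy: env.TTA_AI_PROXY ?? "/api/ai",
    // Assumption: anything other than an explicit "off" keeps the redactor on.
    redact: (env.TTA_AI_REDACT ?? "on").toLowerCase() !== "off",
  };
}
```

In a real run this would be called as `loadAiConfig(process.env)`; failing fast on a malformed budget avoids discovering the problem halfway through a sharded suite.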

7. Test execution + tags (V2)

New tags
@ai-rca     - failure RCA hook active for this spec
@ai-heal    - self-heal allowed for this spec
@ai-data    - DataFactory.smart() used
@ai-gen     - generated by prompt-to-spec (review before merge)

8. Reports + artifacts (V2)

HTML + AI insights
playwright-report/index.html
  - All V1 sections
  - V2: ai-rca.md attached to every failed test
  - V2: heals.jsonl summary block at top
  - V2: token usage strip (run total + per-spec)

9. CI/CD + Docker + cloud grid (V2)

.github/workflows/ttacart-ai.yml
jobs:
  test:    runs sharded suite with TTA_AI_MODEL=deepseek-chat
  review:  runs scripts/ai-pr-review.ts on every PR
           posts inline comments via gh pr review
  budget:  fails the build if total tokens > TTA_AI_BUDGET
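The budget job boils down to summing the token counts recorded in logs/ai-calls.jsonl and comparing the total against TTA_AI_BUDGET. The sketch below shows that arithmetic under stated assumptions: the record fields mirror `AIResult.tokens`, cached calls are assumed to cost nothing, and the `totalTokens` / `checkBudget` names are hypothetical rather than taken from scripts/ai-budget-check.ts.

```typescript
// One line of logs/ai-calls.jsonl. Field names are an assumption mirroring AIResult.tokens.
interface AiCallRecord {
  tokens: { in: number; out: number };
  cached?: boolean;
}

function totalTokens(jsonl: string): number {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as AiCallRecord)
    // Assumption: cached responses consumed no provider tokens, so they are skipped.
    .filter((rec) => !rec.cached)
    .reduce((sum, rec) => sum + rec.tokens.in + rec.tokens.out, 0);
}

function checkBudget(jsonl: string, budget: number): { ok: boolean; used: number } {
  const used = totalTokens(jsonl);
  // The build fails only when usage is strictly greater than the budget.
  return { ok: used <= budget, used };
}
```

A CI script would read the log file, call `checkBudget`, and exit with a non-zero status when `ok` is false.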

10. Quality + package + VCS (V2)

V2 adds
"ai:gen"         : "tsx scripts/ai-gen-spec.ts",
"ai:review"      : "tsx scripts/ai-pr-review.ts",
"ai:budget"      : "tsx scripts/ai-budget-check.ts",
deps: undici (proxy fetch), zod (LLM JSON schema validation)
.gitignore: logs/ai-calls.jsonl, .ai-cache/runtime/

Serverless proxy - functions/api/ai.js

Mirrors /api/chat pattern
Cloudflare Pages function.
- Cloud API keys live in env vars on the edge.
- Client code only ever sees /api/ai.
- Local providers bypass the proxy (direct localhost call).
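The proxy-versus-localhost rule above can be expressed as a small lookup. This is a sketch only: `resolveEndpoint` and `LOCAL_ENDPOINTS` are hypothetical names, and the `http://` scheme is an assumption. The prefixes, ports, and the /api/ai path are the ones listed on this page.

```typescript
// Local providers and their default addresses, as listed in the provider table.
const LOCAL_ENDPOINTS: Record<string, string> = {
  "ollama:": "http://localhost:11434",
  "lmstudio:": "http://localhost:1234/v1",
};

function resolveEndpoint(
  model: string,
  proxyPath = "/api/ai",
): { url: string; viaProxy: boolean } {
  for (const [prefix, url] of Object.entries(LOCAL_ENDPOINTS)) {
    // Local models never touch the proxy, so no cloud key is involved.
    if (model.startsWith(prefix)) return { url, viaProxy: false };
  }
  // Every other model id is treated as a cloud model and routed via the proxy.
  return { url: proxyPath, viaProxy: true };
}
```

Keeping this decision in one function means the client bundle only ever contains the proxy path and the two localhost addresses, never a provider hostname or key.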

10 boxes from V1 stay verbatim. The new 11th box (green) is the AI layer + its serverless proxy. Each numbered card below the diagram zooms into one of the top 5 AI features.

PR reviewer flow
sequenceDiagram
  autonumber
  participant Dev
  participant GH as GitHub
  participant GHA as ttacart-ai.yml (job: review)
  participant AI as AIClient (DeepSeek)
  participant Bot as gh pr review

  Dev->>GH: open / update PR with *.spec.ts changes
  GH->>GHA: pull_request event triggers the workflow
  GHA->>GHA: gh pr diff -> capture spec hunks only
  GHA->>AI: complete({ diff, ruleset, maxTokens: 1500 })
  AI-->>GHA: findings JSON (rule, file, line, msg)
  GHA->>Bot: gh pr review --comment with inline notes
  Bot-->>Dev: review appears + checks status updates
              
Prompt to spec flow
graph LR
  P["CLI: tta-ai gen 'log in standard_user, checkout 1 backpack'"] --> SYS[system prompt + POM index]
  SYS --> AI[AIClient.complete - 1200 tokens]
  AI --> RAW[draft TypeScript]
  RAW --> LINT[eslint + tsc --noEmit]
  LINT -- pass --> WRITE[write src/tests/generated/*.spec.ts]
  LINT -- fail --> AI
  WRITE --> RUN[npx playwright test --list]

  classDef io fill:#a5d8ff,stroke:#2563eb,color:#111
  classDef ai fill:#ede9fe,stroke:#8b5cf6,color:#111
  classDef qa fill:#fef3c7,stroke:#f59e0b,color:#111
  classDef ok fill:#d1fae5,stroke:#16a34a,color:#111
  class P,RUN io
  class SYS,AI,RAW ai
  class LINT qa
  class WRITE ok
              
Smart data flow with cache
graph LR
  P[DataFactory.smart prompt] --> H[sha256 model + system + prompt]
  H --> C{".cache/ai hit?"}
  C -- yes --> ROWS[return fixture rows]
  C -- no --> AI[AIClient.complete - 800 tokens]
  AI --> J[JSON.parse + zod validate]
  J -- pass --> WR[write .cache/ai + return]
  J -- fail --> AI
  WR --> ROWS

  classDef io fill:#a5d8ff,stroke:#2563eb,color:#111
  classDef cache fill:#fef9c3,stroke:#d97706,color:#111
  classDef ai fill:#ede9fe,stroke:#8b5cf6,color:#111
  classDef ok fill:#d1fae5,stroke:#16a34a,color:#111
  class P,ROWS io
  class C,WR cache
  class AI,J ai
              

Top 5 AI features

One card per feature. Each card lists the value prop, which box it plugs into, an integration code sample, the default model + override, and the cost guard that stops runaway spend.

1

Failure RCA - auto root-cause on every failed test

Plugs into Box 5 + 8

Turn a 5MB trace.zip into a two-sentence root cause attached to the HTML report. CI failures get the cause posted as a PR comment in under 8 seconds.

Where: box 5 Observability -> AI -> box 8 Reports. Hook lives in CustomTTAReporter.onTestEnd.

Failure RCA flow
sequenceDiagram
  autonumber
  participant T as Failing test
  participant R as CustomTTAReporter
  participant TS as summariseTrace
  participant AI as AIClient (DeepSeek default)
  participant H as tta-report/index.html
  participant PR as GitHub PR

  T--xR: status = 'failed' + trace.zip path
  R->>TS: read last 50 events + DOM at failure + network errors
  TS-->>R: redacted 50-line summary
  R->>AI: complete({ prompt, maxTokens: 300 })
  AI-->>R: 2-sentence cause + suggested fix
  R->>H: attach RCA panel to failing test row
  alt running in CI
    R->>PR: gh pr comment with cause + fix
  end
              
src/ai/failure-rca.ts
import { AIClient } from './AIClient';
import { summariseTrace } from './trace-summary';

export async function explainFailure(
  traceZipPath: string,
  opts: { model?: string; maxTokens?: number } = {},
) {
  // last 50 events + DOM at failure + last network errors
  const trace = await summariseTrace(traceZipPath);
  const prompt =
    'A Playwright test failed. Trace summary follows.\n\n' +
    trace +
    '\n\nReturn: 1) a 2-sentence root cause, 2) the single most likely fix.';

  return AIClient.complete({
    model: opts.model ?? process.env.TTA_AI_MODEL ?? 'deepseek-chat',
    prompt,
    maxTokens: opts.maxTokens ?? 300,
  });
}
Default: deepseek-chat. Override: claude-3-5-sonnet, gpt-4o-mini, ollama:llama3.1.
Cost guard: capped at maxTokens=300 per failure + only fires on status==='failed'. Worst case (every test fails) the budget hard-stop in AIClient still wins.
2

Self-healing locators - try the AI fallback before failing

Plugs into Box 3

When click(locator) hits not visible or not found, ask AI for 3 fallback locators using the current DOM (PII-redacted), retry with the highest-confidence one, and log the heal.

Where: box 3 Utils - UtilElementLocator wraps Locator.click / fill / type. Heals are written to logs/heals.jsonl so the team can review them and commit the permanent locator fix in a follow-up PR.
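One possible shape for a logs/heals.jsonl entry is sketched below, together with a helper that counts heals per original selector so the most frequently broken locators surface first. The `HealRecord` fields and all function names here are assumptions for illustration; the page only specifies that each heal is appended to the file.

```typescript
// Hypothetical record shape for one line of logs/heals.jsonl.
interface HealRecord {
  at: string;         // ISO timestamp
  spec: string;       // spec file that triggered the heal
  from: string;       // original selector that failed
  to: string;         // fallback selector that worked
  confidence: number; // model-reported score (scale is an assumption)
}

function toHealLine(rec: HealRecord): string {
  // JSON.stringify escapes embedded newlines, so each record stays on one line.
  return JSON.stringify(rec) + "\n";
}

function parseHeals(jsonl: string): HealRecord[] {
  return jsonl
    .split("\n")
    .filter((line) => line !== "")
    .map((line) => JSON.parse(line) as HealRecord);
}

// Counts heals per original selector: the candidates for a permanent fix.
function healCounts(records: HealRecord[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of records) counts.set(r.from, (counts.get(r.from) ?? 0) + 1);
  return counts;
}
```

An append-only JSONL file keeps concurrent workers simple: each heal is one write, and a later report step can parse the file line by line.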

Self-healing locator flow
flowchart LR
  A[primary locator] --> B{matches in DOM?}
  B -- yes --> Z[click / fill / type]
  B -- no, after 5s --> RD[PII redactor strips email / phone / postal]
  RD --> AI[AIClient.complete - 200 tokens]
  AI --> S[3 candidate locators, ranked by stability]
  S --> R[retry with #1]
  R -- found --> Z
  R -- still not found --> LOG[append to logs/heals.jsonl + throw]

  classDef good fill:#d1fae5,stroke:#16a34a,color:#111
  classDef warn fill:#fef3c7,stroke:#f59e0b,color:#111
  classDef ai   fill:#ede9fe,stroke:#8b5cf6,color:#111
  classDef bad  fill:#ffc9c9,stroke:#ef4444,color:#111
  class Z good
  class RD warn
  class AI,S ai
  class LOG bad
              
src/utils/UtilElementLocator.ts (excerpt)
async click(target: Flex, opts: { healOnFail?: boolean } = {}) {
  const heal = opts.healOnFail ?? true;
  const loc = typeof target === 'string' ? this.page.locator(target) : target;

  try {
    await loc.click({ timeout: 5_000 });
  } catch (err) {
    if (!heal) throw err;
    const dom = await this.snapshotDom();              // redacted
    const guesses = await AIClient.locators({
      intent: 'click',
      original: target.toString(),
      dom,
      n: 3,
    });
    for (const g of guesses) {
      try {
        await this.page.locator(g.selector).click({ timeout: 3_000 });
        await this.recordHeal({ from: target, to: g, confidence: g.score });
        return;
      } catch { /* try the next fallback */ }
    }
    throw err;
  }
}
Default: deepseek-chat. Override: claude-3-5-sonnet, gemini-1.5-flash, lmstudio:any.
Cost guard: DOM snapshot capped at 32KB after the PII redactor strips emails / phones / postal codes. Heals are sha256-cached so the second occurrence of the same broken selector reuses the answer for free.
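The 32KB cap can be applied as a byte-level truncation once redaction has run. The sketch below is one way to do it, assuming UTF-8 byte counting; `capSnapshot` is a hypothetical name, and a real implementation might prefer to trim whole DOM nodes rather than cut mid-markup.

```typescript
const MAX_SNAPSHOT_BYTES = 32 * 1024;

function capSnapshot(html: string, maxBytes = MAX_SNAPSHOT_BYTES): string {
  const bytes = new TextEncoder().encode(html);
  if (bytes.length <= maxBytes) return html;
  // Cutting on a byte boundary can split a multi-byte character; the decoder
  // renders the broken tail as U+FFFD, which is stripped here.
  const cut = new TextDecoder("utf-8").decode(bytes.slice(0, maxBytes));
  return cut.replace(/\uFFFD+$/, "");
}
```

Counting bytes rather than characters keeps the cap predictable for pages with a lot of non-ASCII content, where character counts understate the payload size.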
3

Prompt to spec - generate runnable .spec.ts from English

Plugs into Box 2

A junior tester types a sentence. Out comes a Playwright spec that imports the right page object, uses the right fixtures, and runs first try.

Where: box 2 Specs. CLI script scripts/ai-gen-spec.ts writes to src/tests/generated/ and tags with @ai-gen.

src/ai/spec-gen.ts
import { AIClient } from './AIClient';

const SYSTEM = `You write Playwright TypeScript specs.
Target pages live at /playwright/ttacart/*.html.
Use the existing Page Objects in src/pages/ttacart/:
  LoginPage, InventoryPage, ItemDetailPage, CartPage,
  CheckoutStepOnePage, CheckoutStepTwoPage, CheckoutCompletePage.
Use the test-base fixture (import { test, expect } from '../fixtures/test-base').
No raw page.click outside POMs. No waitForTimeout. Use expect().toHaveX.`;

export async function generateSpec(prompt: string) {
  const out = await AIClient.complete({
    model: process.env.TTA_AI_MODEL ?? 'deepseek-chat',
    system: SYSTEM,
    prompt,
    maxTokens: 1200,
  });
  return out.text.trim();   // complete() returns an AIResult; the spec source is in .text
}

// CLI:
//   npx tsx scripts/ai-gen-spec.ts \
//     "log in as standard_user and complete checkout with 1 backpack"
Default: deepseek-chat. Override: claude-3-5-sonnet, gpt-4o-mini, mistral-large-latest.
Cost guard: maxTokens=1200 + every generated spec is force-tagged @ai-gen so PR review can grep them and require human approval before merge.
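Force-tagging could be done as a post-processing pass over the generated source, before the file is written. The following sketch assumes tags live in the test title (as they do elsewhere on this page) and uses a hypothetical `forceTag` helper. It handles plain `test('...')` calls only and does not cope with escaped quotes inside titles.

```typescript
function forceTag(source: string, tag = "@ai-gen"): string {
  // Matches test('title'), test("title") and test(`title`), capturing the
  // opening call, the quote character, and the title text.
  const pattern = /\b(test\(\s*)(['"`])((?:(?!\2).)*)\2/g;
  return source.replace(
    pattern,
    (match: string, open: string, quote: string, title: string) =>
      // Leave titles alone if they already carry the tag.
      title.includes(tag) ? match : `${open}${quote}${tag} ${title}${quote}`,
  );
}
```

Because the tag ends up in the title, generated specs can then be selected or excluded with Playwright's `--grep` / `--grep-invert` options.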
4

Smart test data - the edge cases Faker won't give you

Plugs into Box 3 + 4

Faker gives you valid names and emails. DataFactory.smart(prompt) gives you 5 emails that should fail RFC 5321, 10 unicode postal codes that break payment forms, or 3 SQL-shaped strings that should be rejected.

Where: box 3 Utils (DataFactory) + box 4 Test data (ai-cache/). Cache key: sha256(prompt + model + schema).

src/utils/DataFactory.ts (excerpt)
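import { createHash } from 'node:crypto';
import { z } from 'zod';
import { AIClient } from '../ai/AIClient';
import { cacheGet, cachePut } from '../ai/cache';

// Hex-encoded sha256 of a string, used as the cache key below.
const sha256 = (s: string) => createHash('sha256').update(s).digest('hex');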

const EdgeCaseList = z.array(z.string()).max(20);

export const DataFactory = {
  // V1 helpers kept verbatim ...
  generateUser, generateCard, randomZip, pickProduct,

  // V2:
  async smart(prompt: string, schema = EdgeCaseList) {
    const key = await sha256(prompt + (process.env.TTA_AI_MODEL ?? 'deepseek-chat'));
    const hit = await cacheGet(key);
    if (hit) return schema.parse(hit);

    const raw = await AIClient.completeJSON({
      model: process.env.TTA_AI_MODEL ?? 'deepseek-chat',
      prompt: 'Return JSON array of strings. ' + prompt,
      maxTokens: 600,
    });
    const out = schema.parse(raw);
    await cachePut(key, out);
    return out;
  },
};
Default: deepseek-chat. Override: gpt-4o-mini, gemini-1.5-flash, ollama:llama3.1.
Cost guard: the sha256 cache means a re-run of the same spec is zero tokens. zod schema validation rejects malformed answers so a bad model never burns budget on retries.
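A cache key that covers prompt, model and schema, as described for this feature, might look like the sketch below. `smartCacheKey` and the `schemaId` parameter are hypothetical: a zod schema is not directly hashable, so the sketch assumes each schema is given a stable string name. The NUL separator is an added precaution against ambiguous concatenation.

```typescript
import { createHash } from "node:crypto";

function smartCacheKey(prompt: string, model: string, schemaId: string): string {
  // Joining with a separator avoids collisions such as ("ab", "c") vs ("a", "bc").
  const material = [model, schemaId, prompt].join("\u0000");
  return createHash("sha256").update(material).digest("hex");
}
```

Including the schema identifier means that asking the same prompt for a differently shaped result misses the cache instead of returning rows that fail validation.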
5

PR reviewer bot - automated style + best-practice review

Plugs into Box 9

A GitHub Action reads the diff of every **/*.spec.ts, asks AI to flag waitForTimeout, absolute XPath, missing expect, raw page.click outside POMs, and posts inline review comments via gh pr review.

Where: box 9 CI/CD. Workflow .github/workflows/ttacart-ai.yml -> job review.

scripts/ai-pr-review.ts
import { execFileSync, execSync } from 'child_process';
import { AIClient } from '../src/ai/AIClient';

const diff = execSync('git diff --unified=3 origin/main -- "**/*.spec.ts"').toString();
if (!diff) process.exit(0);

const SYSTEM = `You review Playwright TypeScript spec diffs.
Flag ONLY:
  - waitForTimeout
  - absolute XPath
  - missing expect()
  - raw page.click outside a Page Object
Return JSON: { comments: Array<{ file, line, body }> }`;

const review = await AIClient.completeJSON({
  model: process.env.TTA_AI_MODEL ?? 'deepseek-chat',
  system: SYSTEM,
  prompt: diff,
  maxTokens: 800,
});

// `gh pr review` has no per-line flag (-F means --body-file), so the
// file:line reference is carried in the comment body. Passing an argument
// array via execFileSync avoids shell interpolation of model output.
for (const c of review.comments) {
  execFileSync('gh', [
    'pr', 'review', '--comment',
    '--body', `${c.file}:${c.line} ${c.body}`,
  ]);
}
Default: deepseek-chat. Override: claude-3-5-sonnet, gpt-4o-mini.
Cost guard: the diff is capped at 8KB (truncated head + tail) so a huge PR can't blow the budget. The reviewer only runs on changed .spec.ts files, never on the whole repo.
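The head + tail truncation can be sketched as follows. `capDiff` and the marker text are hypothetical, and for simplicity the cap is measured in string length (UTF-16 code units) rather than bytes, which is only an approximation of 8KB for non-ASCII diffs.

```typescript
const MAX_DIFF_CHARS = 8 * 1024;

function capDiff(diff: string, max = MAX_DIFF_CHARS): string {
  if (diff.length <= max) return diff;
  const marker = "\n... [diff truncated] ...\n";
  const keep = max - marker.length;
  // Degenerate case: the cap is too small to fit the marker at all.
  if (keep <= 0) return diff.slice(0, max);
  const head = Math.ceil(keep / 2);
  const tail = keep - head;
  // Keep the start and the end of the diff; drop the middle.
  return diff.slice(0, head) + marker + diff.slice(diff.length - tail);
}
```

Keeping both ends preserves the first file header and the most recent hunks, which tends to give the model enough context to anchor its comments.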

Multi-model adapter - one interface, seven providers

One TypeScript module, seven provider drivers behind a common interface. Switch model by setting process.env.TTA_AI_MODEL. Default: deepseek-chat. Local providers (Ollama, LM Studio) need no API key - the data never leaves the machine.

Provider | Model id | Auth | Endpoint | Stream? | Where
DeepSeek | deepseek-chat | DEEPSEEK_API_KEY | api.deepseek.com | Yes | Default
Anthropic | claude-3-5-sonnet-20241022 | ANTHROPIC_API_KEY | api.anthropic.com | Yes | Cloud
OpenAI | gpt-4o-mini | OPENAI_API_KEY | api.openai.com | Yes | Cloud
Gemini | gemini-1.5-flash | GEMINI_API_KEY | generativelanguage.googleapis.com | Yes | Cloud
Mistral | mistral-large-latest | MISTRAL_API_KEY | api.mistral.ai | Yes | Cloud
Ollama | ollama:llama3.1 | none | localhost:11434 | Yes | Local
LM Studio | lmstudio:any | none | localhost:1234/v1 | Yes | Local

AIClient interface + dispatch

src/ai/AIClient.ts
export interface AIRequest {
  model: string;            // e.g. 'deepseek-chat' | 'claude-3-5-sonnet-20241022'
  prompt: string;
  system?: string;
  maxTokens?: number;
}
export interface AIResult {
  text: string;
  tokens: { in: number; out: number };
  cached: boolean;
}

export const AIClient = {
  async complete(req: AIRequest): Promise<AIResult> {
    Budget.assertWithin(req.maxTokens ?? 400);
    const key = await sha256(req.model + (req.system ?? '') + req.prompt);
    const cached = await cacheGet(key);
    if (cached) return { ...cached, cached: true };

    const driver = dispatch(req.model);   // see below
    const out = await driver.complete(req);
    Budget.record(out.tokens);
    await cachePut(key, out);
    return { ...out, cached: false };
  },
  // Convenience wrappers (completeJSON, locators, ...) call complete() under the hood.
};

function dispatch(model: string) {
  if (model.startsWith('ollama:'))      return drivers.ollama;
  if (model.startsWith('lmstudio:'))    return drivers.lmstudio;
  if (model.startsWith('claude-'))      return drivers.anthropic;
  if (model.startsWith('gpt-'))         return drivers.openai;
  if (model.startsWith('gemini-'))      return drivers.gemini;
  if (model.startsWith('mistral-'))     return drivers.mistral;
  if (model.startsWith('deepseek-'))    return drivers.deepseek;
  throw new Error('Unknown TTA_AI_MODEL: ' + model);
}
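AIClient.complete relies on a `Budget` object that is not shown on this page. A minimal sketch of what it might look like follows; the class name `TokenBudget` and the `remaining` getter are assumptions, and only the `assertWithin` / `record` method names come from the code above.

```typescript
class TokenBudget {
  private used = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Throws before a call whose worst-case output could push usage past the cap.
  assertWithin(maxTokens: number): void {
    if (this.used + maxTokens > this.limit) {
      throw new Error(
        `AI token budget exceeded: ${this.used} used + ${maxTokens} requested > ${this.limit}`,
      );
    }
  }

  // Adds the actual tokens-in + tokens-out of a completed call.
  record(tokens: { in: number; out: number }): void {
    this.used += tokens.in + tokens.out;
  }

  get remaining(): number {
    return Math.max(0, this.limit - this.used);
  }
}
```

Checking against `maxTokens` before the call is deliberately conservative: a request is refused if its worst case would overshoot, even when the real response would have been shorter.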

Privacy + cost guards

Five guards that stop V2 from leaking data or running up the LLM bill. All of them are enforced on the TTACart-AI side (AIClient, the proxy, and the UI), not by the upstream provider.

1. Serverless proxy for cloud calls. All cloud requests go through a Cloudflare Pages function (functions/api/ai.js), mirroring the existing /api/chat proxy. Cloud keys live in edge env vars and never reach the client bundle.
2. PII redactor before any cloud call. DOM snapshots, trace summaries and console logs go through a regex pass that strips emails, phone numbers, and postal codes before the request leaves the machine. Local providers (Ollama, LM Studio) skip the redactor - the data never leaves the machine in the first place.
3. Per-CI-run token budget cap. AIClient tracks tokens-in + tokens-out and hard-stops at 50_000 by default. Override via TTA_AI_BUDGET. CI workflow budget job fails the build if the run exceeded its cap.
4. sha256 response cache. Cache key: sha256(model + system + prompt). Re-running the same spec doesn't double-charge. Cache lives in .ai-cache/ for runtime + src/testdata/ai-cache/ for committed deterministic test data.
5. BYO key drawer in the TTACart-AI UI. Mirrors the pattern from /playwright/ai-chat/. Students paste their own key into a drawer; it lives in localStorage; the proxy reads it from a header for that session only.
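The regex pass in guard 2 could be sketched as an ordered list of pattern/label pairs. The patterns below are illustrative assumptions, not the framework's actual rules: the phone pattern is deliberately loose and will over-match other long digit runs, and the postal patterns cover only US ZIP and UK-style postcodes.

```typescript
// Ordered [pattern, replacement] pairs; emails first so their digits are not
// mistaken for phone numbers.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[EMAIL]"],
  // Loose phone shape: optional +, then at least 9 digits/separators in total.
  [/\+?\d[\d\s().-]{7,}\d/g, "[PHONE]"],
  // US ZIP (with optional +4) and UK-style postcodes only.
  [/\b\d{5}(?:-\d{4})?\b/g, "[POSTAL]"],
  [/\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b/g, "[POSTAL]"],
];

function redactPII(text: string): string {
  return PII_PATTERNS.reduce((out, [re, label]) => out.replace(re, label), text);
}
```

Regex redaction is a best-effort filter rather than a guarantee, which is one reason the local providers remain the safer choice for sensitive pages.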

Diagrams (rendered client-side)

Three mermaid diagrams - the 11-box architecture, the failure RCA sequence, and the model routing switch.

1. V2 architecture - 10 boxes + AI

V1 boxes 1-10 stay verbatim. Box 11 (AI) is the new node. Dotted edges show the AI hooks into Observability, Specs and Reports.

graph LR
  classDef code fill:#fde7f3,stroke:#db2777,color:#111
  classDef data fill:#d1fae5,stroke:#16a34a,color:#111
  classDef obs  fill:#cffafe,stroke:#0891b2,color:#111
  classDef cfg  fill:#ede9fe,stroke:#8b5cf6,color:#111
  classDef ci   fill:#ffedd5,stroke:#f97316,color:#111
  classDef qty  fill:#fef9c3,stroke:#d97706,color:#111
  classDef ai   fill:#dcfce7,stroke:#22c55e,color:#111,stroke-width:3px

  P[1. Page Objects]:::code --> F[2. Fixtures + Specs]:::code
  P --> U[3. Utils]:::code
  F --> D[4. Test data]:::data
  U --> O[5. Observability]:::obs
  D --> O

  C[6. playwright.config.ts]:::cfg --> E[7. Test execution + tags]:::cfg
  C --> R[8. Reports + artifacts]:::cfg
  E --> CI[9. CI/CD + Docker + cloud]:::ci
  R --> CI
  CI --> Q[10. Quality + package + VCS]:::qty

  AI["11. AI layer<br/>AIClient + 7 providers"]:::ai
  O -. AI hooks .-> AI
  AI -. RCA summary .-> R
  AI -. prompt to spec .-> F

2. Failure RCA flow

A failed test triggers the reporter hook, which calls explainFailure in failure-rca.ts. That function summarises and redacts the trace, then calls AIClient, which dispatches to the configured provider. The 2-sentence root cause that comes back is attached to the report and, in CI, posted as a PR comment.

sequenceDiagram
  autonumber
  participant T as Test
  participant R as Reporter (onTestEnd)
  participant FR as failure-rca.ts
  participant AC as AIClient
  participant DS as DeepSeek (default)
  participant H as HTML report
  participant PR as gh pr review

  T->>R: status=failed + trace.zip
  R->>FR: explainFailure(trace.zip)
  FR->>FR: summariseTrace + PII redact
  FR->>AC: complete({ model, prompt, maxTokens: 300 })
  AC->>AC: budget check + cache lookup
  AC->>DS: POST /chat/completions
  DS-->>AC: 2-sentence root cause
  AC-->>FR: AIResult { text, tokens }
  FR-->>R: ai-rca.md
  R-->>H: attach ai-rca.md
  R-->>PR: post inline comment (CI only)
              

3. Model routing

AIClient.complete(req) looks at the req.model prefix and dispatches to one of seven drivers. Cloud calls go through the serverless proxy; local providers (Ollama, LM Studio) hit localhost directly.

graph LR
  REQ["AIClient.complete<br/>req.model = ?"] --> SW{prefix?}
  SW -- deepseek- --> D1["DeepSeek<br/>api.deepseek.com"]
  SW -- claude- --> D2["Anthropic<br/>api.anthropic.com"]
  SW -- gpt- --> D3["OpenAI<br/>api.openai.com"]
  SW -- gemini- --> D4["Gemini<br/>generativelanguage.googleapis.com"]
  SW -- mistral- --> D5["Mistral<br/>api.mistral.ai"]
  SW -- ollama: --> D6["Ollama LOCAL<br/>localhost:11434"]
  SW -- lmstudio: --> D7["LM Studio LOCAL<br/>localhost:1234/v1"]
  D1 --> PX["Serverless proxy<br/>/api/ai"]
  D2 --> PX
  D3 --> PX
  D4 --> PX
  D5 --> PX

  classDef cloud fill:#dbeafe,stroke:#2563eb,color:#111
  classDef local fill:#dcfce7,stroke:#22c55e,color:#111,stroke-width:2px
  classDef proxy fill:#fef9c3,stroke:#d97706,color:#111
  class D1,D2,D3,D4,D5 cloud
  class D6,D7 local
  class PX proxy

V1 vs V2 - side-by-side

Same framework spine. V2 adds an AI layer, a multi-model adapter, and the guards that keep it cheap and safe.

Concern | V1 (current) | V2 + AI (this page)
Locators | Hand-written CSS / role-based locators in POMs. Broken locator = test fails. | Same hand-written locators. Self-heal tries 3 AI fallbacks before failing; chosen heal is logged for human review.
Failure analysis | Open trace.zip in Trace Viewer. Read 5 MB of events. | 2-sentence AI root cause attached to the HTML report + posted as PR comment in CI.
Spec authoring | Write by hand. Pair-program with a senior. | npm run ai:gen -- "prompt" writes a runnable spec; tagged @ai-gen for human approval before merge.
Reporting | HTML + Allure + custom tta-report/. | Same + AI RCA per failed test, heals.jsonl summary, run-level token usage strip.
CI | Sharded 4-way Playwright workflow. | Same sharding + review job (PR reviewer) + budget job (fails on token overspend).
Cost | Zero LLM cost. | Hard budget cap (TTA_AI_BUDGET=50000 default) + sha256 cache + small per-call maxTokens. Local providers cost zero.
Provider lock-in | None. | None. TTA_AI_MODEL swaps between 7 providers. Local (Ollama, LM Studio) means data never leaves the machine.

Roadmap - V2.x and beyond

Six features queued behind the V2 launch. Each one extends the same AI layer; no new boxes.

Feature | What it does | Status
AI test impact analysis | Read the git diff, predict the impacted specs, and run only those on a PR. Full suite still runs on main. | Planned
AI flaky-test classifier | Cluster recent failures by error signature; tag obvious flakes as @flaky and quarantine them automatically. | Planned
Visual diff triage | When toHaveScreenshot fails, ask AI whether the diff is real or just antialiasing / font-fallback noise. | Research
Dead-test detector | Cross-reference test names against changelogs to find specs that haven't asserted on real behaviour in 6 months. | Research
AI a11y reviewer | Run axe-core in every spec; pipe violations into AI for a plain-English remediation summary. | Pilot
AI run summary | End-of-run hook posts a single paragraph "what happened" to Slack + email - failures, flakes, slowest specs, top heals. | Pilot
V2 preview. Not in the sidebar yet. Open the V1 page for the published version, or jump back to the Practice overview.