Playwright Test Agents: Planner, Generator and Healer Guide

Playwright test agents are AI helpers that plan, generate, and repair tests automatically, reducing manual test creation and maintenance.

If you have worked with end-to-end tests long enough, you know the real cost is not writing the first test. It is maintaining the next hundred. A small UI change breaks selectors. CI turns red. Instead of shipping features, you fix tests that were passing yesterday.

That is why Playwright v1.56 introduced Playwright test agents in October 2025. The goal is simple - reduce the manual work involved in planning, writing, and maintaining Playwright tests by letting AI handle the repetitive parts while you stay in control.

In this guide, you will learn:

  • What Playwright test agents actually are

  • How they work under the hood

  • How to set them up in your project

  • How to use them safely in CI

  • Where they help, and where human review still matters

If maintaining your test suite takes more time than building it, this is worth your attention.

What are Playwright test agents?

Playwright test agents are AI-driven helpers built into Playwright starting from v1.56. They assist with planning test scenarios, generating Playwright test code, and repairing broken tests by interacting with a real browser session.

Instead of relying only on manual scripting and maintenance, teams can use these agents to handle structured exploration, code creation, and test repair based on live application behavior.

There are three agents, each responsible for a different stage of the testing lifecycle:

  • Planner - explores the application and creates structured test plans

  • Generator - converts test plans into executable Playwright test files

  • Healer - detects and fixes failing tests caused by UI or locator changes

Planner vs Generator vs Healer

| Agent | Primary Role | Input | Output | Best Used For |
|---|---|---|---|---|
| Planner | Scenario discovery and planning | Seed test + running app | Markdown test plan | New features, coverage mapping |
| Generator | Test code creation | Markdown test plan | Playwright .spec.ts files | Building automation quickly |
| Healer | Test maintenance and repair | Existing failing test suite | Updated and stabilized test files | UI changes, locator drift |

Together, these agents introduce structured automation across planning, authoring, and maintenance, while keeping your standard Playwright setup unchanged.

Why Playwright introduced Planner, Generator, and Healer

Modern web applications change constantly. UI components get refactored. Class names change. Layouts shift. Most of the time, the product still works, but the tests do not. As a result, teams often spend more time fixing broken test automation than validating new functionality.

This creates three consistent pain points in test automation:

  • Test planning is manual and time-consuming

  • Building large, reliable test suites takes significant effort

  • Test maintenance becomes a continuous burden after every release

Over time, the testing workflow turns into a repetitive cycle:

Plan → Write → Fix

Playwright introduced Planner, Generator, and Healer to reduce that repetition. Instead of engineers handling every stage manually, parts of that loop can now be assisted or automated:

Agent plans → Agent writes → Agent fixes

The goal is not to remove human oversight. It is to reduce the time spent on repetitive test work so teams can focus on real defects, coverage gaps, and product quality.

How do Playwright test agents work?

Playwright test agents use the Model Context Protocol (MCP) to connect a large language model with a real browser. The AI does not guess what the page looks like. It interacts with the actual application, observes live DOM state, and makes decisions based on real behavior.

Here is the high-level flow:

  1. Planner explores the app using a real browser session

  2. Planner writes a markdown test plan with scenarios, steps, and assertions

  3. Generator reads the plan and produces Playwright test files

  4. Tests run in CI like any standard Playwright suite

  5. Healer detects and fixes broken tests automatically

Three layers work together to make this happen.

Playwright Engine handles the browser automation through the Chrome DevTools Protocol. This is the same foundation that powers every standard Playwright test.

LLM Layer uses a large language model (GPT, Claude, or similar) to interpret DOM structure, page routes, and application behavior. The model receives structured snapshots rather than raw screenshots, which keeps it accurate and token-efficient.

Orchestration Loop coordinates the exchange between the engine and the LLM. It sends page context to the model, receives instructions back, executes browser actions, and repeats until the task is complete.

This is what separates Playwright test agents from generic AI code generators. A code generation tool predicts what your page might look like. Playwright test agents interact with what your page actually does.
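The orchestration loop can be sketched in a few lines. In this sketch the model and the browser are both stubbed out; the function names and the action shape are illustrative, not Playwright's actual MCP API.

```typescript
// Minimal sketch of the orchestration loop. The "model" and "browser"
// are stubs; real agents exchange structured page snapshots over MCP.

type Action = { kind: 'click' | 'fill' | 'done'; target?: string; value?: string };

// Stub model: decides the next action from the current page snapshot.
function nextAction(snapshot: string, step: number): Action {
  if (step === 0) return { kind: 'fill', target: '#email', value: 'user@example.com' };
  if (step === 1) return { kind: 'click', target: 'button[type=submit]' };
  return { kind: 'done' };
}

// Stub browser: applies an action and returns the resulting snapshot.
function execute(snapshot: string, action: Action): string {
  return `${snapshot} -> ${action.kind}(${action.target ?? ''})`;
}

// The loop: send context to the model, run its instruction, repeat
// until the model reports the task is complete.
function orchestrate(initialSnapshot: string): string[] {
  const log: string[] = [];
  let snapshot = initialSnapshot;
  for (let step = 0; ; step++) {
    const action = nextAction(snapshot, step);
    if (action.kind === 'done') break;
    snapshot = execute(snapshot, action);
    log.push(action.kind);
  }
  return log;
}

console.log(orchestrate('login-page')); // logs the sequence of actions taken
```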

How can AI plan, generate, and heal Playwright tests automatically?

The Planner explores your live application through a real browser, discovers user flows and edge cases, and produces structured markdown test plans. The Generator reads those plans, opens the application, verifies selectors against the real DOM, and writes test files with stable locators and assertions. The Healer fixes broken tests by analyzing failure traces, identifying root causes, and applying targeted code changes at runtime.

Let's look at each in detail.

How the Planner agent discovers test scenarios

The Planner does not ask you to list every test case upfront. It explores your application the way a QA engineer would during an exploratory session, except it does it systematically and documents everything as it goes.

The process works like this:

  1. Planner runs your seed test (tests/seed.spec.ts) to set up the base environment - authentication, initial navigation, and test data

  2. It opens the application in a real browser and begins navigating through pages and user flows

  3. At each step, it inspects the DOM to identify interactive elements, forms, navigation links, and key UI components

  4. It maps out user journeys - happy paths, error states, boundary conditions, and edge cases

  5. It writes a structured markdown test plan in the specs/ folder, with scenarios, steps, expected results, and assertions

  6. Each scenario in the plan is detailed enough for the Generator to convert directly into executable test code

The output is not a vague list of ideas. It is a precise, step-by-step specification that covers what to test, how to test it, and what the expected outcome should be. Teams looking for a quick reference on Playwright syntax can also pair this with the Playwright cheatsheet to review locator patterns and assertion strategies.

For example, if the Planner explores an e-commerce checkout flow, it does not just write "test checkout." It produces scenarios like "guest user adds item to cart, proceeds to checkout, enters shipping details, and sees order confirmation," along with edge cases like "user submits checkout with an expired credit card and sees a validation error."

The key advantage here is coverage. A human tester might focus on the obvious paths and miss less common flows. The Planner systematically works through the application's UI, identifying scenarios that a manual approach might overlook. It also structures the plan in a consistent format, which means the Generator can process it without ambiguity.
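A plan produced this way might look like the following sketch. The file name, scenario wording, and product details are illustrative, not verbatim Planner output:

```markdown
# Checkout flow

## 1. Guest checkout happy path
Steps:
1. Add "Wireless Mouse" to the cart from the product listing page
2. Open the cart and click "Checkout"
3. Fill in shipping details and submit the order

Expected: the order confirmation page shows an order number and the purchased item.

## 2. Checkout with an expired card
Steps:
1. Add any item to the cart and proceed to checkout
2. Enter a card with an expiry date in the past
3. Submit the payment form

Expected: a validation error is shown and no order is created.
```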

How the Generator agent creates tests

When the Generator receives a spec file, it does not produce code from a template. It opens your application in a real browser and validates every step.

The process works like this:

  • Generator reads a spec file (for example, specs/checkout-flow.md)

  • It launches the app using your seed test as the base

  • For each scenario, it navigates to the correct page and inspects the DOM

  • It selects locators using Playwright's preferred strategies - role-based, text-based, and test-id selectors

  • It writes test code with proper assertions, waits, and error handling

  • Each output file maps one-to-one with a scenario in the spec

The result is code that reads like it was written by a senior SDET. Not brittle CSS selectors. Not XPath chains that break when someone moves a div. Actual production-grade locators. For teams exploring other AI test generation tools, this approach stands out because the output is validated against a live application, not predicted from static code.

One documented case showed a team generating 82 end-to-end tests for an e-commerce application using the Playwright Skill with Claude Code. Product browsing, cart operations, checkout flows. All from a structured plan, all validated against the live app.
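A generated file for a scenario like guest checkout might look roughly like this. This is a hand-written sketch of the style of output against a hypothetical storefront; the URLs, names, and test IDs are assumptions, not real Generator output:

```typescript
import { test, expect } from '@playwright/test';

// Illustrative sketch of a Generator-style test.
// All locators and URLs below are hypothetical.
test('guest user completes checkout', async ({ page }) => {
  await page.goto('https://your-app.com/products');

  // Role-, text-, and test-id-based locators instead of brittle CSS chains
  await page.getByRole('link', { name: 'Wireless Mouse' }).click();
  await page.getByRole('button', { name: 'Add to cart' }).click();
  await page.getByTestId('cart-link').click();
  await page.getByRole('button', { name: 'Checkout' }).click();

  await page.getByLabel('Shipping address').fill('221B Baker Street');
  await page.getByRole('button', { name: 'Place order' }).click();

  // Assert on the user-visible outcome, not implementation details
  await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();
});
```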

How the Healer agent fixes tests

The Healer is where teams with large existing suites get the most value. Here is what happens when a test fails:

  1. The Healer runs the failing test in debug mode

  2. It checks console logs, network requests, and page snapshots at the failure point

  3. It performs root cause analysis: is this a selector issue, a timing problem, or an actual application bug?

  4. If the test is the problem, the Healer updates the code. It picks better selectors, adjusts waits, or modifies assertions

  5. It re-runs the test to confirm the fix works

  6. If the application itself is broken (not the test), it marks the test as skipped

That last point is important. The Healer does not patch around real bugs. If a checkout button genuinely stopped working, the Healer flags it instead of rewriting the test to ignore the failure. You still know something is wrong. You just don't waste time assuming it's a test problem.

A quick example

Say your e-commerce site's checkout button CSS class changes from .btn-checkout to .btn-primary-checkout after a frontend refactor. In a traditional setup, every test clicking that button breaks. Someone has to find the affected tests, update selectors, and re-run the suite.

With the Healer, the process looks different. It detects the failure, inspects the page, sees the button text and ARIA role haven't changed, switches to page.getByRole('button', { name: 'Checkout' }), updates the test file, and confirms the test passes. No developer time spent. No Jira ticket. No "can someone look at this flaky test in standup." It just gets fixed.
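In code, a heal like this is a one-line locator swap. The snippet below is an illustrative before/after, not actual Healer output:

```typescript
import { test } from '@playwright/test';

test('checkout button still works after refactor', async ({ page }) => {
  // Before healing: tied to a CSS class the refactor renamed
  // await page.locator('.btn-checkout').click();

  // After healing: tied to the accessible role and name, which survived
  await page.getByRole('button', { name: 'Checkout' }).click();
});
```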

Setting up Playwright test agents

Getting started requires Playwright v1.56 or later and a compatible AI tool. The setup takes about five minutes.

Step 1 - Install the latest Playwright

terminal
npm install -D @playwright/test@latest
npx playwright install chromium

Step 2 - Initialize the agents

Run the init command with your preferred AI loop. Playwright supports VS Code (with Copilot), Claude, and OpenCode:

terminal
# For VS Code with Copilot
npx playwright init-agents --loop=vscode

# For Claude Code
npx playwright init-agents --loop=claude

# For OpenCode
npx playwright init-agents --loop=opencode

This generates agent definition files and a seed test. The definitions are markdown-based configuration files stored in your .github/ folder. They describe each agent's behavior, instructions, and available tools.

Note: VS Code v1.105 or later is required for the agentic experience to work in VS Code.

Step 3 - Configure your seed test

The seed test (tests/seed.spec.ts) is the starting point for all agent activity. It sets up the base environment - authentication, test data, navigation.

tests/seed.spec.ts
import { test } from '@playwright/test';

test('seed', async ({ page }) => {
  await page.goto('https://your-app.com');
  // Add login or setup logic here
});

The Planner runs this seed test before it starts exploring. If your app needs authentication, add the login flow here. Everything the agents do builds on this starting point.
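If your app sits behind a login, the seed test grows to something like the sketch below. The locators, URL paths, and environment variable names are assumptions for illustration; adapt them to your application.

```typescript
import { test } from '@playwright/test';

test('seed', async ({ page }) => {
  await page.goto('https://your-app.com/login');

  // Hypothetical login form; adjust locators and credentials to your app
  await page.getByLabel('Email').fill(process.env.TEST_USER_EMAIL ?? '');
  await page.getByLabel('Password').fill(process.env.TEST_USER_PASSWORD ?? '');
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Land on the page the agents should start exploring from
  await page.waitForURL('**/dashboard');
});
```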

Step 4 - Run the Planner

Open your AI tool's chat, select planner mode, and prompt it:

terminal
Explore the app and generate a test plan for user
registration and checkout flows. Use seed.spec.ts as base.

The Planner navigates your app, discovers UI elements and user flows, and produces a markdown file in the specs/ folder with scenarios, steps, expected results, and edge cases.

Step 5 - Generate tests

Switch to generator mode and point it to the plan:

terminal
Use the test plan in specs/checkout-flow.md to generate
Playwright tests. Save them in tests/checkout/

The Generator reads the spec, opens the live app, verifies selectors, and writes Playwright test scripts mapped to each scenario.

Step 6 - Heal and validate

Run the Healer against your new or existing suite:

terminal
Run the playwright test healer on the test suite in /tests.
Fix any failing tests and verify your fixes.

The Healer executes tests, finds failures, applies fixes, and re-runs until everything passes or it flags genuinely broken functionality.

Project structure after setup

code
repo/
├── .github/              # Agent definitions (planner, generator, healer)
├── specs/                # Markdown test plans
│   └── checkout-flow.md
├── tests/
│   ├── seed.spec.ts      # Base environment setup
│   └── checkout/         # Generated test files
│       ├── guest-checkout.spec.ts
│       └── registered-checkout.spec.ts
└── playwright.config.ts

Regenerate agent definitions whenever you update Playwright. Run npx playwright init-agents again to pick up new tools and instructions.

Using Playwright test agents in CI/CD

The agents themselves are interactive tools, designed for use through VS Code Copilot, Claude Code, or OpenCode. But the tests they produce are standard Playwright tests. Your CI pipeline runs them the same way it runs any other Playwright suite.

.github/workflows/playwright.yml
# .github/workflows/playwright.yml
name: Playwright Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-report
          path: playwright-report/

The if: always() on artifact upload is critical. Without it, failed test reports do not get saved, and those are exactly the reports you need.

Tracking stability of agent-generated tests

Here is the thing most teams overlook. Running agent-generated tests in CI is easy. Knowing whether those tests are actually stable across builds is a different problem.

When you're producing tests with the Generator and repairing them with the Healer, you need answers to questions that raw CI logs cannot provide:

  • Which tests were healed and how often do they break again?

  • Are healing events increasing or decreasing over time?

  • Is a failure a new regression or the same flaky test from last week?

This is where a reporting layer becomes necessary. TestDino tracks test stability patterns across CI runs. It classifies failures into categories - actual bug, flaky test, or UI change - and gives you historical context for every failure. If the Healer fixed a test last Tuesday and it broke again on Thursday, you can see that pattern immediately instead of re-investigating from scratch.

For teams using Playwright test agents at scale, that kind of visibility is the difference between trusting your suite and guessing. Test analytics help you understand trends over time; without them, you are generating tests faster than you can verify whether they are actually reliable.

The workflow that works

The strongest setup looks like this:

  • Planner discovers scenarios and writes specs

  • Generator creates test files from specs

  • Tests run in CI on every push

  • Failures get classified and tracked in a reporting tool

  • Healer runs periodically to fix locator drift and unstable tests

  • Reporting confirms whether healed tests stay stable or keep breaking

That feedback loop is what turns Playwright test agents from a cool experiment into a reliable part of your pipeline.

Limitations you should know

Playwright test agents are useful. They are not perfect. Being honest about the limits helps you use them well.

Selectors are not always right. The AI picks good locators most of the time, but it can still choose unstable ones. A text locator works great until someone changes button copy. A role-based locator breaks if ARIA roles are missing. Always review generated code before merging. The Playwright Trace Viewer can help you inspect exactly what the agent saw during test execution.

Complex UI changes need a human. If a redesign changes the entire user flow, not just a selector, the Healer cannot redesign the test. It fixes locators. It does not rewrite test logic. That is still your job.

TypeScript and JavaScript only. Playwright test agents currently support the JS/TS test runner. Teams using Playwright for Python do not have official agent support yet, though it is a requested feature on GitHub.

Over-reliance is a real risk. When tests write and fix themselves, there is a temptation to stop reviewing them. Do not do that. AI-generated tests should go through the same code review process as anything else in your codebase.

Agents work best on stable base environments. If your seed test is flaky, if auth breaks intermittently, or if the test environment is unreliable, the Planner produces weak plans and the Generator writes fragile tests. Garbage in, garbage out. If your suite is running slow, that compounds the problem. This is the most common reason teams have a bad first experience with Playwright test agents. Fix the foundation first. Then let the agents do their work.

Best practices for production teams

These are practical patterns that help teams get consistent results from Playwright test agents.

Start with a solid seed test. Your seed test is the foundation. If it does not reliably set up the right environment, nothing the agents produce will be reliable either. Spend time getting authentication, test data, and navigation right before asking agents to explore.

Keep specs in version control. Treat markdown test plans like documentation. Review them in pull requests. Good specs produce good tests. Bad specs produce tests you will rewrite manually anyway.

Add data-testid attributes to critical elements. The Generator prefers test IDs when they exist. Adding them to key buttons, forms, and navigation elements gives the agent better options and produces more stable tests.
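For example, a stable test ID on the checkout button gives the Generator a locator that survives both class renames and copy changes. This assumes Playwright's default `data-testid` attribute configuration:

```typescript
// Markup: <button data-testid="checkout-button">Checkout</button>
// Generated locator: unaffected by CSS refactors or button-text edits
await page.getByTestId('checkout-button').click();
```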

Run the Healer on a schedule. Do not wait for CI to break. Set up a weekly Healer run against your full suite. Catching small locator drift early is cheaper than debugging a wall of failures after a major release.
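Since the agents themselves are interactive, a common pattern is to schedule the test run itself and use its report to drive a Healer session. A cron-triggered workflow sketch (names and the schedule are illustrative):

```yaml
# Hypothetical weekly run that surfaces locator drift for a Healer session
name: Weekly full-suite run
on:
  schedule:
    - cron: '0 6 * * 1'   # every Monday at 06:00 UTC
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: weekly-report
          path: playwright-report/
```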

Pair agent output with a reporting tool. Agents create and fix tests, but they don't show you the big picture. A tool like TestDino gives you build-level visibility into test stability, flaky test trends, and failure classification. When the Healer fixes a test, your reporting tool confirms whether that fix actually held or just delayed the problem.

Regenerate agent definitions after every Playwright update. When you upgrade Playwright, re-run npx playwright init-agents to get updated tools and instructions. Stale definitions mean your agents miss improvements and may not work correctly with newer Playwright features.

Conclusion

Playwright test agents bring real automation to the parts of testing that have always been manual. The Planner discovers what to test, the Generator writes the code, and the Healer keeps it working. Together, they cut the time teams spend on test creation and maintenance without removing human oversight where it matters.

That said, generating and fixing tests is only half the problem. Knowing whether those tests stay stable across builds is what separates a reliable suite from one that quietly rots. TestDino fills that gap by tracking test stability, classifying failures, and giving you historical context on every flaky or healed test in your pipeline.

If your team spends more time fixing tests than writing features, Playwright test agents paired with a reporting layer like TestDino give you a path out. Start with a solid seed test, let the agents handle the repetitive work, and use the data to stay confident that your suite actually means something.

Agents Build Tests. We Track Their Stability.
See how AI-generated and healed tests behave across CI runs, detect unstable patterns, and separate real bugs from noise.

FAQs

What are Playwright test agents?
Playwright test agents are AI-driven components built into Playwright v1.56 and later. They consist of three agents: the Planner discovers test scenarios, the Generator creates executable Playwright code, and the Healer repairs broken tests automatically by interacting with a live browser session.
Can AI generate Playwright tests automatically?
Yes. The Generator agent reads markdown test plans, opens the live application, verifies selectors against the real DOM, and writes ready-to-run .spec.ts files with proper assertions and locator strategies. The output is production-grade code, not templates.
What is a Playwright test healer?
The Healer agent runs failing tests in debug mode, inspects page snapshots and console logs, identifies broken locators, updates the test code with stable alternatives, and re-runs to confirm. If the app itself is broken rather than the test, it skips the test instead of hiding the bug.
Do Playwright test agents work in CI/CD?
Yes. The tests generated by agents are standard Playwright tests that run in any CI system, including GitHub Actions, GitLab CI, and Jenkins. The agents themselves are interactive tools, but the tests they produce integrate into your pipeline like any other .spec.ts file.
How does TestDino improve Playwright test agent workflows?
TestDino tracks test stability, failure patterns, and flaky behavior across CI builds. It classifies failures into categories (actual bug, flaky, UI change) and provides historical context. For teams using Playwright test agents, this means you can see whether healed tests stay stable, which agent-generated tests fail most, and where to focus manual review.
Vishwas Tiwari

AI/ML Developer

Vishwas Tiwari is an AI/ML Developer at TestDino with 1+ years of experience in test automation analytics and machine learning. He specializes in Playwright test automation and building ML models for error categorization and failure pattern detection.

Vishwas built TestDino's MCP server for test automation workflows and developed ML models using Python, Pandas, NumPy, and Scikit-learn that automate test data analysis and flaky test detection. He has authored 6+ technical guides on Playwright CI/CD integration, test failure analysis, and automation SOPs.

He holds a Bachelor's Degree in Data Science from BMU University and contributes to open-source test automation tooling on GitHub.
