Playwright Test Agents: Planner, Generator and Healer Guide
Playwright test agents are AI helpers that plan, generate, and repair tests automatically, reducing manual test creation and maintenance.
If you have worked with end-to-end tests long enough, you know the real cost is not writing the first test. It is maintaining the next hundred. A small UI change breaks selectors. CI turns red. Instead of shipping features, you fix tests that were passing yesterday.
That is why Playwright v1.56 introduced Playwright test agents in October 2025. The goal is simple - reduce the manual work involved in planning, writing, and maintaining Playwright tests by letting AI handle the repetitive parts while you stay in control.
In this guide, you will learn:
- What Playwright test agents actually are
- How they work under the hood
- How to set them up in your project
- How to use them safely in CI
- Where they help, and where human review still matters
If maintaining your test suite takes more time than building it, this is worth your attention.
What are Playwright test agents
Playwright test agents are AI-driven helpers built into Playwright starting from v1.56. They assist with planning test scenarios, generating Playwright test code, and repairing broken tests by interacting with a real browser session.
Instead of relying only on manual scripting and maintenance, teams can use these agents to handle structured exploration, code creation, and test repair based on live application behavior.
There are three agents, each responsible for a different stage of the testing lifecycle:
- Planner - explores the application and creates structured test plans
- Generator - converts test plans into executable Playwright test files
- Healer - detects and fixes failing tests caused by UI or locator changes
Planner vs Generator vs Healer
| Agent | Primary Role | Input | Output | Best Used For |
|---|---|---|---|---|
| Planner | Scenario discovery and planning | Seed test + running app | Markdown test plan | New features, coverage mapping |
| Generator | Test code creation | Markdown test plan | Playwright .spec.ts files | Building automation quickly |
| Healer | Test maintenance and repair | Existing failing test suite | Updated and stabilized test files | UI changes, locator drift |
Together, these agents introduce structured automation across planning, authoring, and maintenance, while keeping your standard Playwright setup unchanged.
Why Playwright introduced Planner, Generator, and Healer
Modern web applications change constantly. UI components get refactored. Class names change. Layouts shift. Most of the time, the product still works, but the tests do not. As a result, teams often spend more time fixing broken test automation than validating new functionality.
This creates three consistent pain points in test automation:
- Test planning is manual and time-consuming
- Building large, reliable test suites takes significant effort
- Test maintenance becomes a continuous burden after every release
Over time, the testing workflow turns into a repetitive cycle:
Plan → Write → Fix
Playwright introduced Planner, Generator, and Healer to reduce that repetition. Instead of engineers handling every stage manually, parts of that loop can now be assisted or automated:
Agent plans → Agent writes → Agent fixes
The goal is not to remove human oversight. It is to reduce the time spent on repetitive test work so teams can focus on real defects, coverage gaps, and product quality.
How do Playwright test agents work?
Playwright test agents use the Model Context Protocol (MCP) to connect a large language model with a real browser. The AI does not guess what the page looks like. It interacts with the actual application, observes live DOM state, and makes decisions based on real behavior.
Here is the high-level flow:
- Planner explores the app using a real browser session
- Planner writes a markdown test plan with scenarios, steps, and assertions
- Generator reads the plan and produces Playwright test files
- Tests run in CI like any standard Playwright suite
- Healer detects and fixes broken tests automatically
Three layers work together to make this happen.
Playwright Engine handles the browser automation through the Chrome DevTools Protocol. This is the same foundation that powers every standard Playwright test.
LLM Layer uses a large language model (GPT, Claude, or similar) to interpret DOM structure, page routes, and application behavior. The model receives structured snapshots rather than raw screenshots, which keeps it accurate and token-efficient.
Orchestration Loop coordinates the exchange between the engine and the LLM. It sends page context to the model, receives instructions back, executes browser actions, and repeats until the task is complete.
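The loop above can be sketched in a few lines. This is a conceptual sketch, not Playwright's actual internals — the type names and interfaces here are invented for illustration:

```typescript
// Conceptual sketch of the orchestration loop: send structured page state
// to the model, execute the action the model returns, repeat until done.

export type Snapshot = { url: string; elements: string[] };

export type Action =
  | { kind: "click"; target: string }
  | { kind: "fill"; target: string; value: string }
  | { kind: "done"; summary: string };

export interface Model {
  decide(snapshot: Snapshot, goal: string): Action;
}

export interface Browser {
  snapshot(): Snapshot;
  execute(action: Action): void;
}

export function runAgentLoop(
  browser: Browser,
  model: Model,
  goal: string,
  maxSteps = 20,
): string {
  for (let step = 0; step < maxSteps; step++) {
    const snap = browser.snapshot();         // structured DOM state, not raw pixels
    const action = model.decide(snap, goal); // the LLM picks the next action
    if (action.kind === "done") return action.summary;
    browser.execute(action);                 // the engine performs the real browser action
  }
  throw new Error("Agent did not finish within the step budget");
}
```

The important design point is that the model never acts directly: it only proposes the next action, and the Playwright engine executes it against the real page, so every decision is grounded in live DOM state.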
This is what separates Playwright test agents from generic AI code generators. A code generation tool predicts what your page might look like. Playwright test agents interact with what your page actually does.
How can AI plan, generate, and heal Playwright tests automatically?
The Planner explores your live application through a real browser, discovers user flows and edge cases, and produces structured markdown test plans. The Generator reads those plans, opens the application, verifies selectors against the real DOM, and writes test files with stable locators and assertions. The Healer fixes broken tests by analyzing failure traces, identifying root causes, and applying targeted code changes at runtime.
Let's look at each in detail.
How the Planner agent discovers test scenarios
The Planner does not ask you to list every test case upfront. It explores your application the way a QA engineer would during an exploratory session, except that it works systematically and documents everything as it goes.
The process works like this:
- Planner runs your seed test (tests/seed.spec.ts) to set up the base environment - authentication, initial navigation, and test data
- It opens the application in a real browser and begins navigating through pages and user flows
- At each step, it inspects the DOM to identify interactive elements, forms, navigation links, and key UI components
- It maps out user journeys - happy paths, error states, boundary conditions, and edge cases
- It writes a structured markdown test plan in the specs/ folder, with scenarios, steps, expected results, and assertions
- Each scenario in the plan is detailed enough for the Generator to convert directly into executable test code
The output is not a vague list of ideas. It is a precise, step-by-step specification that covers what to test, how to test it, and what the expected outcome should be. Teams looking for a quick reference on Playwright syntax can also pair this with the Playwright cheatsheet to review locator patterns and assertion strategies.
For example, if the Planner explores an e-commerce checkout flow, it does not just write "test checkout." It produces scenarios like "guest user adds item to cart, proceeds to checkout, enters shipping details, and sees order confirmation," along with edge cases like "user submits checkout with an expired credit card and sees a validation error."
The key advantage here is coverage. A human tester might focus on the obvious paths and miss less common flows. The Planner systematically works through the application's UI, identifying scenarios that a manual approach might overlook. It also structures the plan in a consistent format, which means the Generator can process it without ambiguity.
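An illustrative fragment of such a plan might read like this (the exact layout of generated specs may differ; the scenario content is invented for the checkout example above):

```md
## Scenario: Guest checkout with valid details

Steps:
1. From a product page, click "Add to cart"
2. Open the cart and click "Checkout"
3. Enter shipping details as a guest
4. Submit the order

Expected results:
- The order confirmation page is displayed
- The confirmation shows an order number
```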
How the Generator agent creates tests
When the Generator receives a spec file, it does not produce code from a template. It opens your application in a real browser and validates every step.
The process works like this:
- Generator reads a spec file (for example, specs/checkout-flow.md)
- It launches the app using your seed test as the base
- For each scenario, it navigates to the correct page and inspects the DOM
- It selects locators using Playwright's preferred strategies - role-based, text-based, and test-id selectors
- It writes test code with proper assertions, waits, and error handling
- Each output file maps one-to-one with a scenario in the spec
The result is code that reads like it was written by a senior SDET. Not brittle CSS selectors. Not XPath chains that break when someone moves a div. Actual production-grade locators. For teams exploring other AI test generation tools, this approach stands out because the output is validated against a live application, not predicted from static code.
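A generated file for a guest-checkout scenario might look like the sketch below. The URL, labels, and button copy are placeholder assumptions for illustration, not output from a real run:

```ts
import { test, expect } from '@playwright/test';

test('guest user completes checkout', async ({ page }) => {
  await page.goto('https://your-app.com/products/blue-t-shirt'); // placeholder URL
  await page.getByRole('button', { name: 'Add to cart' }).click();
  await page.getByRole('link', { name: 'Cart' }).click();
  await page.getByRole('button', { name: 'Checkout' }).click();
  await page.getByLabel('Email').fill('guest@example.com');
  await page.getByRole('button', { name: 'Place order' }).click();
  await expect(page.getByText('Order confirmed')).toBeVisible();
});
```

Note that every locator is role-, label-, or text-based - the strategies the Generator prefers because they survive styling and markup refactors.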
One documented case showed a team generating 82 end-to-end tests for an e-commerce application using the Playwright Skill with Claude Code. Product browsing, cart operations, checkout flows. All from a structured plan, all validated against the live app.
How the Healer agent fixes tests
The Healer is where teams with large existing suites get the most value. Here is what happens when a test fails:
- The Healer runs the failing test in debug mode
- It checks console logs, network requests, and page snapshots at the failure point
- It performs root cause analysis: is this a selector issue, a timing problem, or an actual application bug?
- If the test is the problem, the Healer updates the code. It picks better selectors, adjusts waits, or modifies assertions
- It re-runs the test to confirm the fix works
- If the application itself is broken (not the test), it marks the test as skipped
That last point is important. The Healer does not patch around real bugs. If a checkout button genuinely stopped working, the Healer flags it instead of rewriting the test to ignore the failure. You still know something is wrong. You just don't waste time assuming it's a test problem.
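That triage step can be sketched conceptually. The following is assumed logic for illustration, not the Healer's actual implementation — the evidence fields and decision order are invented:

```typescript
// Conceptual sketch of the Healer's triage: given evidence collected at
// the failure point, decide whether to repair the test or flag a real bug.

export type FailureEvidence = {
  selectorMatched: boolean;        // did the original locator find anything?
  equivalentElementFound: boolean; // does a matching element exist under another locator?
  timedOut: boolean;
  consoleErrors: string[];
};

export type Verdict = "fix-selector" | "fix-timing" | "app-bug";

export function triage(evidence: FailureEvidence): Verdict {
  // Locator drift: the element is still there, just no longer matched.
  if (!evidence.selectorMatched && evidence.equivalentElementFound) {
    return "fix-selector";
  }
  // Timing flake: the wait expired, but nothing errored on the page.
  if (evidence.timedOut && evidence.consoleErrors.length === 0) {
    return "fix-timing";
  }
  // Anything else is treated as a real application problem: skip and flag.
  return "app-bug";
}
```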
A quick example
Say your e-commerce site's checkout button CSS class changes from .btn-checkout to .btn-primary-checkout after a frontend refactor. In a traditional setup, every test clicking that button breaks. Someone has to find the affected tests, update selectors, and re-run the suite.
With the Healer, the process looks different. It detects the failure, inspects the page, sees the button text and ARIA role haven't changed, switches to page.getByRole('button', { name: 'Checkout' }), updates the test file, and confirms the test passes. No developer time spent. No Jira ticket. No "can someone look at this flaky test in standup." It just gets fixed.
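In code, the change the Healer applies in this scenario would look roughly like this (illustrative):

```ts
// Before: tied to a CSS class that changed in the refactor
await page.locator('.btn-checkout').click();

// After: tied to the button's role and accessible name, which did not change
await page.getByRole('button', { name: 'Checkout' }).click();
```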
Setting up Playwright test agents
Getting started requires Playwright v1.56 or later and a compatible AI tool. The setup takes about five minutes.
Step 1 - Install the latest Playwright
```bash
npm install -D @playwright/test@latest
npx playwright install chromium
```
Step 2 - Initialize the agents
Run the init command with your preferred AI loop. Playwright supports VS Code (with Copilot), Claude, and OpenCode:
```bash
# For VS Code with Copilot
npx playwright init-agents --loop=vscode

# For Claude Code
npx playwright init-agents --loop=claude

# For OpenCode
npx playwright init-agents --loop=opencode
```
This generates agent definition files and a seed test. The definitions are markdown-based configuration files stored in your .github/ folder. They describe each agent's behavior, instructions, and available tools.
Note: VS Code v1.105 or later is required for the agentic experience to work in VS Code.
Step 3 - Configure your seed test
The seed test (tests/seed.spec.ts) is the starting point for all agent activity. It sets up the base environment - authentication, test data, navigation.
```ts
import { test } from '@playwright/test';

test('seed', async ({ page }) => {
  await page.goto('https://your-app.com');
  // Add login or setup logic here
});
```
The Planner runs this seed test before it starts exploring. If your app needs authentication, add the login flow here. Everything the agents do builds on this starting point.
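For an app behind a login, the seed test might grow into something like the sketch below. The field labels, environment variable names, and post-login route are assumptions for illustration - adapt them to your app:

```ts
import { test } from '@playwright/test';

test('seed', async ({ page }) => {
  await page.goto('https://your-app.com');
  // Hypothetical login flow - adjust labels and credentials to your app
  await page.getByLabel('Email').fill(process.env.TEST_USER_EMAIL ?? '');
  await page.getByLabel('Password').fill(process.env.TEST_USER_PASSWORD ?? '');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await page.waitForURL('**/dashboard'); // assumed post-login route
});
```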
Step 4 - Run the Planner
Open your AI tool's chat, select planner mode, and prompt it:
```
Explore the app and generate a test plan for user
registration and checkout flows. Use seed.spec.ts as base.
```
The Planner navigates your app, discovers UI elements and user flows, and produces a markdown file in the specs/ folder with scenarios, steps, expected results, and edge cases.
Step 5 - Generate tests
Switch to generator mode and point it to the plan:
```
Use the test plan in specs/checkout-flow.md to generate
Playwright tests. Save them in tests/checkout/
```
The Generator reads the spec, opens the live app, verifies selectors, and writes Playwright test scripts mapped to each scenario.
Step 6 - Heal and validate
Run the Healer against your new or existing suite:
```
Run the playwright test healer on the test suite in /tests.
Fix any failing tests and verify your fixes.
```
The Healer executes tests, finds failures, applies fixes, and re-runs until everything passes or it flags genuinely broken functionality.
Project structure after setup
```
repo/
├── .github/                # Agent definitions (planner, generator, healer)
├── specs/                  # Markdown test plans
│   └── checkout-flow.md
├── tests/
│   ├── seed.spec.ts        # Base environment setup
│   └── checkout/           # Generated test files
│       ├── guest-checkout.spec.ts
│       └── registered-checkout.spec.ts
└── playwright.config.ts
```
Regenerate agent definitions whenever you update Playwright. Run npx playwright init-agents again to pick up new tools and instructions.
Using Playwright test agents in CI/CD
The agents themselves are interactive tools, designed for use through VS Code Copilot, Claude Code, or OpenCode. But the tests they produce are standard Playwright tests. Your CI pipeline runs them the same way it runs any other Playwright suite.
```yaml
# .github/workflows/playwright.yml
name: Playwright Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-report
          path: playwright-report/
```
The if: always() on artifact upload is critical. Without it, failed test reports do not get saved, and those are exactly the reports you need.
Tracking stability of agent-generated tests
Here is the thing most teams overlook. Running agent-generated tests in CI is easy. Knowing whether those tests are actually stable across builds is a different problem.
When you're producing tests with the Generator and repairing them with the Healer, you need answers to questions that raw CI logs cannot provide:
- Which tests were healed and how often do they break again?
- Are healing events increasing or decreasing over time?
- Is a failure a new regression or the same flaky test from last week?
This is where a reporting layer becomes necessary. TestDino tracks test stability patterns across CI runs. It classifies failures into categories - actual bug, flaky test, or UI change - and gives you historical context for every failure. If the Healer fixed a test last Tuesday and it broke again on Thursday, you can see that pattern immediately instead of re-investigating from scratch.
For teams using Playwright test agents at scale, that kind of visibility is the difference between trusting your suite and guessing. Test analytics help you understand trends over time. Without that visibility, you are generating tests faster than you can verify whether they are actually reliable.
The workflow that works
The strongest setup looks like this:
- Planner discovers scenarios and writes specs
- Generator creates test files from specs
- Tests run in CI on every push
- Failures get classified and tracked in a reporting tool
- Healer runs periodically to fix locator drift and unstable tests
- Reporting confirms whether healed tests stay stable or keep breaking
That feedback loop is what turns Playwright test agents from a cool experiment into a reliable part of your pipeline.
Limitations you should know
Playwright test agents are useful. They are not perfect. Being honest about the limits helps you use them well.
Selectors are not always right. The AI picks good locators most of the time, but it can still choose unstable ones. A text locator works great until someone changes button copy. A role-based locator breaks if ARIA roles are missing. Always review generated code before merging. The Playwright Trace Viewer can help you inspect exactly what the agent saw during test execution.
Complex UI changes need a human. If a redesign changes the entire user flow, not just a selector, the Healer cannot redesign the test. It fixes locators. It does not rewrite test logic. That is still your job.
TypeScript and JavaScript only. Playwright test agents currently support the JS/TS test runner. Teams using Playwright for Python do not have official agent support yet, though it is a requested feature on GitHub.
Over-reliance is a real risk. When tests write and fix themselves, there is a temptation to stop reviewing them. Do not do that. AI-generated tests should go through the same code review process as anything else in your codebase.
Agents work best on stable base environments. If your seed test is flaky, if auth breaks intermittently, or if the test environment is unreliable, the Planner produces weak plans and the Generator writes fragile tests. Garbage in, garbage out. If your suite is running slow, that compounds the problem. This is the most common reason teams have a bad first experience with Playwright test agents. Fix the foundation first. Then let the agents do their work.
Best practices for production teams
These are practical patterns that help teams get consistent results from Playwright test agents.
Start with a solid seed test. Your seed test is the foundation. If it does not reliably set up the right environment, nothing the agents produce will be reliable either. Spend time getting authentication, test data, and navigation right before asking agents to explore.
Keep specs in version control. Treat markdown test plans like documentation. Review them in pull requests. Good specs produce good tests. Bad specs produce tests you will rewrite manually anyway.
Add data-testid attributes to critical elements. The Generator prefers test IDs when they exist. Adding them to key buttons, forms, and navigation elements gives the agent better options and produces more stable tests.
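For example, a test ID on a critical button gives the Generator an unambiguous, style-proof locator to prefer (the attribute value here is illustrative):

```ts
// Markup: <button data-testid="checkout-button">Checkout</button>
await page.getByTestId('checkout-button').click();
```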
Run the Healer on a schedule. Do not wait for CI to break. Set up a weekly Healer run against your full suite. Catching small locator drift early is cheaper than debugging a wall of failures after a major release.
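Since the Healer itself runs interactively through your AI tool, one practical pattern is a scheduled CI run of the plain test suite that surfaces locator drift early, so you know when a Healer session is worth kicking off. A minimal sketch (the workflow filename and cron schedule are arbitrary choices):

```yaml
# .github/workflows/weekly-drift-check.yml (hypothetical file)
name: Weekly drift check
on:
  schedule:
    - cron: '0 6 * * 1' # every Monday at 06:00 UTC
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test
```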
Pair agent output with a reporting tool. Agents create and fix tests, but they don't show you the big picture. A tool like TestDino gives you build-level visibility into test stability, flaky test trends, and failure classification. When the Healer fixes a test, your reporting tool confirms whether that fix actually held or just delayed the problem.
Regenerate agent definitions after every Playwright update. When you upgrade Playwright, re-run npx playwright init-agents to get updated tools and instructions. Stale definitions mean your agents miss improvements and may not work correctly with newer Playwright features.
Conclusion
Playwright test agents bring real automation to the parts of testing that have always been manual. The Planner discovers what to test, the Generator writes the code, and the Healer keeps it working. Together, they cut the time teams spend on test creation and maintenance without removing human oversight where it matters.
That said, generating and fixing tests is only half the problem. Knowing whether those tests stay stable across builds is what separates a reliable suite from one that quietly rots. TestDino fills that gap by tracking test stability, classifying failures, and giving you historical context on every flaky or healed test in your pipeline.
If your team spends more time fixing tests than writing features, Playwright test agents paired with a reporting layer like TestDino give you a path out. Start with a solid seed test, let the agents handle the repetitive work, and use the data to stay confident that your suite actually means something.