Playwright AI Codegen: Generate Tests Automatically

Playwright Codegen gets you runnable tests fast by recording real browser flows. Use AI to refactor the noisy output into stable, reviewable tests that actually survive CI.

Savan Vaghani

Feb 25, 2026

If you want Playwright coverage fast, Codegen is the quickest way to get a runnable draft.

You click through a real browser, and Playwright writes the code. The catch is that raw recordings are usually too noisy for a long-lived suite.

This guide shows a workflow that actually ships: record with Codegen, refactor with clear rules, then harden for CI.

What is Playwright AI Codegen?

Playwright Codegen is Playwright's recorder. You interact with the app in a real browser while the inspector generates test code and locators.

The "AI" part is not a Playwright feature. It is how you use an AI coding agent to turn the recording into clean tests that match your repo style and survive review.

Here is the shortest definition you can paste into a doc or answer box:

Playwright AI Codegen is a workflow where you use Playwright Codegen to record a user flow, then use AI to refactor the generated script into stable, maintainable tests.

What Codegen gives you in five minutes

Codegen is great at capturing the truth of the UI. You do not forget the weird modal, the extra redirect, or the hidden animation. It also gives you working starter locators, which is usually the slowest part when you start a new flow.

In practice, Codegen is best for:

bootstrapping a brand new flow when you are starting from zero
generating locators you can quickly standardize
getting a runnable spec you can iterate on instead of staring at a blank file

Tip: Record only one happy path per test. If you record exploration clicks, you will ship a script that is hard to review and harder to maintain.

Where recordings go wrong

Raw Codegen output is basically a click log. It tends to include extra navigation, brittle selectors, and missing assertions. If you merge recordings as-is, you get the classic suite smells: "it works on my machine" and "it flakes every third run".

The fix is simple: treat Codegen output as a draft, not as production test code.

Note: Most flakes from recorded tests come from two things: weak assertions and brittle locators. Fix those first before you blame timing.

When should you use Codegen vs hand-written tests?

Use Codegen when speed matters and you need a working baseline today. Write tests by hand when you need architecture, reuse, and long-term stability.

If your suite is a product, Codegen is your sketch tool. You still have to engineer the final version.

Here is a decision table that makes this choice obvious.

Situation	Best move	Why it works
New flow, you need one passing test today	Start with Codegen	You get a runnable draft fast
Login is painful to repeat	Codegen plus storage state	You log in once and reuse state for more recordings
Long-term regression suite	Write by hand or heavily refactor recordings	You need structure, not a linear script
UI changes daily	Record only as a starting point	Selectors rot quickly without policy
Many similar CRUD flows	One good recording, then AI templates the rest	You reuse patterns instead of rebuilding each time
Fails only in CI	Debug with traces, do not re-record blindly	Traces show what happened, step by step

Quick rule of thumb

Record to get momentum, then refactor until the test reads like intent, not like a replay.

How to generate tests with Codegen step by step

This is the fastest path to a clean draft. The goal is not to record everything. The goal is to capture one tight happy path that becomes your template.

Step 1 – Install and sanity check

In a Node project, install Playwright Test and download browsers.

terminal

npm i -D @playwright/test
npx playwright install

Run one test to confirm your setup works before you record anything.
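If the project has no config yet (npm init playwright@latest scaffolds one), a minimal file is enough for that first run. The sketch below assumes specs live in ./tests and points baseURL at the TodoMVC demo so specs can navigate with relative paths. Both values are placeholders for your own project.

```typescript
// playwright.config.ts: minimal sketch; testDir and baseURL are assumptions, not required values
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Folder Playwright scans for spec files
  testDir: './tests',
  use: {
    // Relative navigations such as page.goto('/todomvc/#/') resolve against this URL
    baseURL: 'https://demo.playwright.dev',
  },
});
```

With this in place, npx playwright test runs every spec under ./tests.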

Step 2 – Start Codegen from the inspector

You can record from the CLI or from the VS Code extension. If your team works in VS Code, the extension is smoother. If you want a quick spike, the CLI inspector is perfect.

terminal

npx playwright codegen https://demo.playwright.dev/todomvc/#/

Note: This demo app has no login. For the storage state step, you are using it to persist localStorage, so your todos appear without retyping. The same workflow applies to real apps where storage state also includes cookies.

You will see two windows. One is the browser you click in. The other is the inspector that shows the generated code and locators.

Step 3 – Record one happy path only

Keep it tight. One test should cover one behavior.

A clean recording usually looks like:

Navigate to the page
Do the key action
Assert the success state

Do not mix admin setup and the user flow. If setup is required, do it outside the recording.

Step 4 – Add at least one real assertion

Recording clicks is not testing. While recording, switch to assertion mode in the inspector and click the element that proves success.

Aim for one assertion that proves the outcome, plus one that proves the right data is shown if that matters. Avoid asserting every pixel.

Tip: Pick assertions that prove a state change, not layout. For TodoMVC, the "items left" counter and the presence of a todo in the list are better checks than CSS or exact positions.

Step 5 – Save auth state so you skip login

Codegen can save storage state to a file at the end of the session. That includes cookies and local storage, which is enough for most apps to start in an authenticated state next time.

terminal

npx playwright codegen --save-storage=auth.json https://demo.playwright.dev/todomvc/#/

Keep that file local and add it to .gitignore, because it can contain live session cookies and tokens.

Now record while already logged in:

terminal

npx playwright codegen --load-storage=auth.json https://demo.playwright.dev/todomvc/#/
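The same file can seed test runs as well as recordings. Here is a sketch that assumes auth.json was produced by the --save-storage command above and sits next to playwright.config.ts:

```typescript
// playwright.config.ts: sketch that starts every test from the saved state
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Cookies and localStorage captured by `npx playwright codegen --save-storage=auth.json`
    storageState: 'auth.json',
  },
});
```

To scope the state to a single spec file instead, call test.use({ storageState: 'auth.json' }) at the top of that file. If the saved session expires, regenerate the file rather than adding UI login steps back into the tests.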

Step 6 – Record in the same device profile you run in

If your app changes behavior on mobile, record with the same viewport or device profile you will run in CI.

terminal

npx playwright codegen --device="iPhone 13" https://demo.playwright.dev/todomvc/#/
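To make CI run in the same profile you recorded with, add the recorded device as a project in the config. This is a sketch: the project names are arbitrary, and the desktop project is optional.

```typescript
// playwright.config.ts: sketch that runs the same specs on desktop and on the recorded device
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    { name: 'desktop-chrome', use: { ...devices['Desktop Chrome'] } },
    // Same descriptor as `npx playwright codegen --device="iPhone 13"`
    { name: 'iphone-13', use: { ...devices['iPhone 13'] } },
  ],
});
```

npx playwright test --project=iphone-13 then runs only the mobile variant. The iPhone 13 descriptor runs in WebKit, so that browser must be installed.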

Step 7 – Paste into a spec and run once

Copy the generated code into a real spec file and run it once. Do not let raw recordings sit in the repo.

terminal

npx playwright test path/to/your.spec.ts

If it fails, fix the locator or add the missing assertion. Re-recording is a last resort.

How to refactor recordings for code review

This is where Codegen output becomes production-ready. If you skip this step, you will ship brittle scripts.

Lock a locator strategy and enforce it

A locator policy is the difference between a stable suite and a flaky mess. Pick a rule and stick to it.

A practical order that holds up well:

getByRole with an accessible name for interactive UI
getByLabel for form fields
getByTestId when the app exposes stable ids
CSS selectors as a last resort

Here is the kind of refactor that saves you from future pain:

code

// Before: brittle selector captured during recording
await page.locator('div:nth-child(3) > button').click();


// After: readable intent
await page.getByRole('button', { name: 'Checkout' }).click();

If you want a deeper locator playbook for AI agents, link your team to Playwright Skill. It is a set of practical guides maintained by TestDino that emphasizes readable selectors and stable patterns.

Tip: Locator policy is the easiest way to reduce flake fast. If your app can add stable test ids, use getByTestId for the few elements that do not have good roles or labels, then keep the rest role based.
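A policy only holds if something checks it. One lightweight option is a review script that scans spec files for positional or deep CSS locators before merge. The sketch below is hypothetical: checkLocatorPolicy and its rule list are illustrative, not a Playwright or TestDino API. Tune the patterns to your own policy, or use a lint rule if your team already has one.

```typescript
// Hypothetical review helper (not a Playwright API): flags locators that break the policy.
// The rule list is an assumption; adjust it to match your team's policy.
export type Finding = { line: number; rule: string; text: string };

const RULES: Array<{ rule: string; pattern: RegExp }> = [
  // Positional CSS such as div:nth-child(3)
  { rule: 'positional CSS (nth-child / nth-of-type)', pattern: /:nth-(child|of-type)\(/ },
  // Positional locator methods: .nth(2), .first(), .last()
  { rule: 'positional locator (.nth / .first / .last)', pattern: /\.(nth\(\d+\)|first\(\)|last\(\))/ },
  // CSS chains with two or more child combinators inside locator('...')
  { rule: 'deep CSS chain', pattern: /locator\(['"`][^'"`]*>[^'"`]*>[^'"`]*['"`]\)/ },
];

// Returns one finding per (line, rule) match; line numbers are 1-based.
export function checkLocatorPolicy(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split('\n').forEach((text, index) => {
    for (const { rule, pattern } of RULES) {
      if (pattern.test(text)) findings.push({ line: index + 1, rule, text: text.trim() });
    }
  });
  return findings;
}
```

Running it on the Before snippet above flags line 1 for nth-child, while the After version produces no findings.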

Extract setup into helpers or fixtures

Raw recordings repeat setup steps everywhere. That makes suites slow, noisy, and hard to debug.

Extract the boring parts:

authentication
data creation
navigation to the page under test

Example helper:

code

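import type { Page } from '@playwright/test';

// Relative paths like '/login' resolve against baseURL in playwright.config.ts
export async function login(page: Page) {
  const email = process.env.E2E_EMAIL;
  const password = process.env.E2E_PASSWORD;
  if (!email || !password) throw new Error('Set E2E_EMAIL and E2E_PASSWORD');
  await page.goto('/login');
  await page.getByLabel('Email').fill(email);
  await page.getByLabel('Password').fill(password);
  await page.getByRole('button', { name: 'Sign in' }).click();
}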

This is not about fancy architecture. It is about keeping the test body focused on the one behavior you care about.

Use web-first assertions, not manual checks

Playwright's async assertions wait and retry until the condition is met or the timeout expires. That is exactly what you want for UI that loads asynchronously.

Prefer:

code

await expect(page.getByText('Welcome')).toBeVisible();

Avoid using isVisible() checks for core assertions. isVisible() returns the current state immediately, without waiting or retrying. It is useful for branching logic, but it does not replace a real assertion.

How to scale with AI agents without junk tests

AI can help a ton, but only when you constrain it. If you let it freestyle, you will get tests that look correct and fail instantly.

Give AI a refactor job, not a blank page

The best use of AI here is transformation. You hand it Codegen output, and it rewrites it into your style.

This prompt works because it is specific and reviewable:

code

Refactor this Playwright Codegen output into a maintainable spec.
Constraints:
1) Prefer getByRole, getByLabel, getByTestId. Avoid nth, first, last.
2) Extract repeated steps into helpers or fixtures.
3) Use web-first assertions. No hard sleeps.
4) Keep one behavior per test.
Return an updated spec file only.

If you want to see this approach used on a real project, the walkthrough Playwright Skill Claude Code shows the full loop of generating tests, running them, and fixing failures.

Use Playwright MCP for agent-driven exploration

If you are experimenting with agent-driven coverage, the Playwright MCP server lets MCP clients drive browsers using structured snapshots of the page's accessibility tree.

Use this when you want an agent to explore the UI and propose missing scenarios. Do not use it when you need pixel perfect assertions.

Minimal config example:

mcp.json

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}

Treat agent output like any other code. Review it, run it, refactor it.

Stream results to TestDino for fast feedback

Once you start generating more tests, your bottleneck is no longer writing. It is debugging and visibility. You want to answer questions like: what failed, is it flaky, did it fail last week, and where is the trace.

That is where TestDino is useful. Its official docs show two common flows:

Stream results in real time using the TestDino CLI and the @testdino/playwright package
Upload reports and artifacts after a run using the getting started guide

A minimal local run looks like this:

terminal

npm install @testdino/playwright
export TESTDINO_TOKEN="your-api-token"
npx tdpw test

In CI, you usually run Playwright normally, then upload the report folder so your traces and screenshots are available to the team. The Trace Viewer guide explains what gets captured and how to upload it.

How to keep generated tests stable in CI

Auto-generating tests is easy. Keeping them stable is the actual job.

Turn on traces, then debug like a pro

If a test fails in CI but passes locally, you need evidence.

Configure Playwright to record traces. 'on-first-retry' limits overhead because it only records when a test is retried, so retries must be enabled:

playwright.config.ts

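// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // 'on-first-retry' only records a trace when a test is retried, so enable retries (here: CI only)
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: 'on-first-retry'
  }
});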

When you have a trace zip, you can open it locally:

terminal

npx playwright show-trace path/to/trace.zip

If you want traces in a shared place instead of downloading artifacts, the TestDino trace viewer doc shows how to upload traces so the team can open them in the browser.

Related deep dive: Playwright debugging guide covers the exact workflow for tracing CI failures.
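Retries also answer the "is it flaky" question. Playwright reports a test that fails and then passes on a retry as flaky. With trace: 'on-first-retry', that first retry is the attempt that carries the trace. The sketch below is a simplified model of that roll-up. classify is a hypothetical helper, not Playwright's implementation, and it ignores skipped tests and test.fail() annotations.

```typescript
// Simplified model of how attempts roll up into one outcome; not Playwright's source code.
export type AttemptStatus = 'passed' | 'failed' | 'timedOut';
export type Outcome = 'expected' | 'unexpected' | 'flaky';

// attempts[0] is the first run, attempts[1] the first retry, and so on.
export function classify(attempts: AttemptStatus[]): Outcome {
  if (attempts.length === 0) throw new Error('no attempts recorded');
  const passes = attempts.filter((status) => status === 'passed').length;
  // Passed every time it ran (normally one attempt, since a pass stops further retries)
  if (passes === attempts.length) return 'expected';
  // Never passed, even after all retries
  if (passes === 0) return 'unexpected';
  // Failed at least once, then passed on a retry
  return 'flaky';
}
```

For example, classify(['failed', 'passed']) returns 'flaky'. With 'on-first-retry', attempts[1] is the run whose trace you would open.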

Control network and data to cut flake

A big chunk of flake is just uncontrolled dependencies.

If your test depends on an external API, you are letting the network decide your build result. Either seed stable data or mock the unstable edges.

Playwright request routing gives you deterministic responses:

code

// Register the route before the navigation that triggers the request
await page.route('**/api/products', async route => {
  await route.fulfill({ json: { items: [] } });
});

For a practical walkthrough, Playwright network mocking shows how to intercept and mock API calls without creating false confidence.
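When you mock, keep the payload itself deterministic. Below is a minimal sketch of a fixture factory. The Product shape and the makeProducts name are hypothetical, so match them to your real API response.

```typescript
// Hypothetical payload shape for the mocked /api/products endpoint; adjust to your real API.
export interface Product {
  id: string;
  name: string;
  priceCents: number;
}

// Deterministic factory: the same count always yields the same products,
// so assertions never depend on live backend data.
export function makeProducts(count: number): Product[] {
  return Array.from({ length: count }, (_, i) => ({
    id: `prod-${i + 1}`,
    name: `Test product ${i + 1}`,
    priceCents: (i + 1) * 1000,
  }));
}
```

Inside the route handler, you would call route.fulfill({ json: { items: makeProducts(3) } }) and then assert that "Test product 1" appears in the UI. Because the data never changes between runs, a failure points at the app, not at the backend.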

Use a short CI review checklist

Generated tests should pass a small checklist before they get merged. This prevents silent flake from creeping in.

Risk	What it looks like	Fix
Fragile selector	nth, first, deep CSS path	Replace with getByRole or getByTestId
Hidden async work	Click happens, page is still loading	Add a web-first assertion for the next state
External dependency	Real API returns variable data	Mock, stub, or seed stable fixtures
Auth drift	Login UI changes and many tests die	Reuse storage state or log in via the API
Visual diffs are hard to review	Baselines exist but nobody checks diffs	Follow the TestDino visual testing guide

If your suite is getting slower as it grows, the benchmarks and patterns in Playwright slow tests are a solid next read.

Conclusion

Playwright Codegen gets you from zero to runnable test code fast. AI can turn that draft into production tests, but only if you enforce locator policy, readable structure, and real assertions.

If you adopt one habit from this post, make it this: treat Codegen output as a draft. Refactor it, then harden it with traces, deterministic data, and controlled dependencies. When your team needs shared visibility, stream or upload runs with the TestDino CLI so traces and artifacts are not trapped inside a single CI job.

FAQs

Does Playwright have built in AI Codegen?

Playwright has Codegen. The AI part is your workflow layer, usually an AI agent that refactors and expands the generated code under strict rules.

What is the fastest way to start with Codegen?

Record one happy path, copy into a spec file, then refactor locators and assertions before you commit.

Can Codegen generate tests for Python?

Yes. Playwright Codegen supports multiple languages, including Python. Recording works the same way. Only the output language changes, and you can switch it in the inspector.

How do you skip login every run?

Save storage state once, then load it for future recordings and tests. This avoids fragile UI login steps.

What is the best way to debug flaky tests in CI?

Capture traces on retry, then use Trace Viewer to replay the failure. For shared debugging, upload artifacts so the whole team can open the trace, like the flow described in the TestDino trace viewer guide.

Savan Vaghani

Product Developer

Savan Vaghani is a Product Developer at TestDino with 2+ years of experience in frontend engineering and developer-focused SaaS platforms. He specializes in React-based UI architecture, TypeScript-driven development, and building scalable dashboards for analytics-heavy applications.

Savan has led the frontend architecture of TestDino, designing multi-tenant, project-scoped interfaces that help QA and engineering teams understand Playwright test runs, flaky test patterns, and CI pipeline health at a glance. He currently focuses on improving product usability, onboarding flows, and GitHub integration interfaces, ensuring every feature feels intuitive from the first interaction.

Previously, Savan worked on frontend systems for internal analytics and reporting tools, where he streamlined complex data views into developer-friendly interfaces. He is experienced in modern frameworks such as React and Next.js, follows strict TypeScript practices, and contributes to product-level design decisions that bridge engineering and user experience.
