Playwright Automation Checklist to Reduce Flaky Tests

This checklist gives Playwright teams a clear path to reduce flaky tests fast, starting from code fixes to CI setup and team process. It focuses on removing noise, improving signal, and building stable, reliable test automation at every level.

Pratik Patel

Oct 28, 2025

Flaky tests can be a real headache. They’re not just a small annoyance; they can seriously slow down your development process and shake your confidence in deployments.

The key to a solid Playwright automation strategy is to cut through the noise and find the “signal.” You want clear, reliable feedback from your tests. This article gives you a straightforward, three-part checklist to get you there.

It’s a simple framework to build a strong Playwright automation system, starting with your code, moving to your pipeline, and finally, looking at your team’s processes.

What is Playwright?

The Playwright framework is a powerful tool for web automation and end-to-end testing, offering key features such as cross-browser support, robust automation capabilities, and flexible architecture.

Playwright supports multiple browsers, including Chromium, Firefox, and WebKit, enabling comprehensive testing across different environments.

Why This Checklist Cuts Playwright Flakiness Fast

For teams needing to make an immediate impact, this checklist provides a summary of the most critical changes required to reduce flakiness. It contrasts common anti-patterns that create noise with the best practices that generate a clear signal.

By following this checklist, teams can enhance their test automation efforts and streamline testing processes, leading to more efficient and reliable results.

| Category | Common Anti-pattern (Noise) | Best Practice (Signal) |
| --- | --- | --- |
| Waits | Arbitrary fixed delays: page.waitForTimeout(5000) | Actionability-based assertions: expect(locator).toBeVisible() |
| Selectors | Brittle, implementation-tied locators: div > span:nth-child(3) | User-facing, resilient locators: page.getByRole('button', { name: 'Submit' }) |
| Data | Shared state and hardcoded values across tests | Isolated, idempotent data seeded or mocked for each test |
| CI/CD | Ignoring failures, manual reruns, and minimal artifacts | Configured auto-retries, rich artifact collection, and test quarantining |
| Triage | Guessing the root cause from raw console logs | Analyzing historical failure data and interactive trace files |

Tier 1: Code-Level Resilience: Mastering Playwright’s Native Stability

A reliable test suite starts with stable code. So, first let’s get you comfortable with Playwright's core ideas and built-in features.

Instead of just learning a bunch of tricks, understand how the framework is designed to help you.

1. Stop Using waitForTimeout: Embrace Auto-Waiting and Actionability

One of the most common reasons for flaky tests is a simple mismatch in speed. Your test script might be moving faster than your web app can render.

A common but flawed solution is to add a fixed delay, like page.waitForTimeout(3000). This is a gamble. You either end up slowing down your tests for no reason or, worse, not waiting long enough when the app is slow, which just makes the test more unreliable.

Playwright has a much smarter way of handling this with its "auto-waiting" feature. When you tell it to do something like locator.click(), it doesn't just jump in. It first runs a series of checks to make sure the element is "actionable." It waits until the element is:

  • Attached to the DOM
  • Visible on the page
  • Stable (not animating)
  • Enabled (not disabled)
  • Receives Events (not covered by another element)

Playwright simulates real user interactions through the browser's actual input pipeline and pairs that with auto-waiting, so actions only run once elements are ready. Together, these two mechanisms remove most of the flaky failures caused by timing issues.

This built-in intelligence means you almost never need to add your own waits. Just trust the framework.

Example: Removing an Anti-Pattern

  • Before (Noisy and Unreliable):

TypeScript
await page.locator('#submit-button').click();
// Hope the spinner disappears in time
await page.waitForTimeout(2000);
await expect(page.locator('#success-message')).toBeVisible();

  • After (High-Signal and Resilient):

TypeScript
await page.locator('#submit-button').click();
// Playwright's assertion will auto-wait for the message
await expect(page.locator('#success-message')).toBeVisible();

2. Ditch Brittle Selectors: A Locator Strategy for Stable Tests

The second major source of flakiness is an unstable locator strategy. Tests that rely on brittle selectors, such as auto-generated CSS classes or fragile XPath expressions, break whenever developers refactor the UI.

A resilient strategy focuses on attributes that are meaningful to the end-user. Playwright’s locators encourage this practice by prioritizing user-facing selectors.

Playwright locators can target elements reliably even in complex scenarios such as shadow DOM or dynamic controls, which is essential for building stable test cases that can be maintained over time.

This hierarchy decouples the test from implementation details, making it far more stable.

| If the element is... | Use this Locator First | Example | Fallback Option |
| --- | --- | --- | --- |
| An interactive control (button, link, input) | page.getByRole(...) | getByRole('button', { name: 'Login' }) | page.getByTestId(...) |
| A non-interactive element (div, span, p) | page.getByText(...) | getByText('Welcome back, user!') | page.getByTestId(...) |
| A form field with a visible label | page.getByLabel(...) | getByLabel('Password') | page.getByTestId(...) |
| An element with no user-visible attributes | page.getByTestId(...) | getByTestId('main-content-wrapper') | CSS/XPath (last resort) |
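Applied to a hypothetical login form, the hierarchy above looks like this in a Playwright test. The URL, labels, and test id here are illustrative assumptions, not from a real app:

```typescript
import { test, expect } from '@playwright/test';

test('login with resilient, user-facing locators', async ({ page }) => {
  await page.goto('https://example.com/login'); // hypothetical URL

  // Form fields: locate by their visible labels
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('s3cret');

  // Interactive control: locate by role and accessible name
  await page.getByRole('button', { name: 'Login' }).click();

  // Non-interactive confirmation text: locate by its content
  await expect(page.getByText('Welcome back, user!')).toBeVisible();

  // Last resort for an element with no user-visible attributes
  await expect(page.getByTestId('main-content-wrapper')).toBeVisible();
});
```

Because none of these locators depend on CSS classes or DOM position, a visual refactor that preserves labels and roles leaves the test green.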

3. Use Retrying Assertions for Dynamic Content

Modern web applications are highly dynamic. A test might fail because it checks for an element's text before an API call has returned and populated it. Playwright's "web-first assertions" handle this scenario.

When you use an assertion like expect(locator), it doesn't just check once. It keeps retrying until the assertion passes or it times out (usually after 5 seconds). This is a great way to build stable tests for dynamic UIs.

Configuring a test retry strategy in Playwright further improves test stability by automatically re-running failed tests, helping to identify and diagnose test failures in dynamic environments and reducing flakiness.

Example: Handling Asynchronous Text Updates

TypeScript
// This assertion will automatically wait and retry for up to 5 seconds
// for the text to change, preventing a common race condition.
await expect(page.locator('#order-status')).toHaveText('Complete!');

Tier 2: Pipeline-Level Resilience: Configuring CI/CD for Signal

How you configure your CI says a lot about your team's testing maturity. Set up CI to surface signal fast: controlled retries, rich artifacts, and clean isolation.

1. The Smart Retry Strategy: When and How to Rerun Failed Tests

Automatic retries are a practical way to handle those random, one-off failures, like a quick network hiccup. While it's true that retries can sometimes hide bigger problems, a controlled retry strategy is a standard industry practice.

  • A good balance is to set retries: 2 in your playwright.config.ts file, specifically for your CI environment. This lets a failing test run up to three times.
  • When you enable retries, Playwright reports more detailed results: a test can pass (on the first try), fail (on all tries), be skipped, or be marked flaky (failed at first, then passed). That flaky status is a huge signal that a test is unreliable and needs a closer look, even if it didn't fail the build.
  • To further improve reliability, configure Playwright to automatically capture traces, videos, and screenshots on retries.

Capturing an execution trace provides detailed logs of each test run, making it easier to analyze and debug flaky tests.
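A minimal playwright.config.ts sketch of this strategy; the retry count is a common starting point, not a rule:

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry up to twice, but only in CI; a local failure should fail loudly
  retries: process.env.CI ? 2 : 0,
  use: {
    // Record a trace when a test is retried, so flaky runs are debuggable
    trace: 'on-first-retry',
  },
});
```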

2. Collect the Right Artifacts: Traces, Videos, and Screenshots on Failure

When a test fails in CI, console output is often not enough for debugging. It is essential to collect rich diagnostic artifacts to find the root cause. Playwright offers three key types:

  • Screenshots: A snapshot of the page right when the failure happened.
  • Videos: A recording of the entire test.
  • Traces: The most detailed data you can get, including a DOM snapshot for every action, console logs, network requests, and a step-by-step timeline.

In addition to these, execution logs and a screencast of the run provide step-by-step insight into test behavior, making failures easier to diagnose.

Keep in mind, the best strategy is to capture these only when needed to avoid performance overhead.

CI Configuration:

TypeScript
// in playwright.config.ts
use: {
  screenshot: 'only-on-failure',
  video: 'retain-on-failure',
  trace: 'on-first-retry',
},

3. Isolate Everything: Data, State, and Parallel Workers

Here's something that trips up teams: tests stepping on each other's toes. You run your tests individually and they're perfect. Run them together? Chaos. Let me walk you through how to prevent this mess.

1. Test Isolation:

Think of each test like it requires its own sandbox. No test should care what happened before it ran.

Playwright achieves full test isolation by creating a new browser context for each test. Each context works like a brand-new browser profile, so every test runs completely separated, with no state shared between tests.

I like using the beforeEach hook for setup - say, you need to log in a test user or reset some settings. Whatever it is, do it fresh for each test. Trust me, future you will thank present you when debugging.
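As a sketch, per-test setup with a beforeEach hook might look like this. The login flow, URL, and headings are hypothetical:

```typescript
import { test, expect } from '@playwright/test';

test.beforeEach(async ({ page }) => {
  // Fresh login for every test; nothing depends on a previous test
  await page.goto('https://example.com/login'); // hypothetical URL
  await page.getByLabel('Email').fill('test-user@example.com');
  await page.getByLabel('Password').fill('s3cret');
  await page.getByRole('button', { name: 'Login' }).click();
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});

test('user can open settings', async ({ page }) => {
  // Starts from a known, freshly logged-in state every time
  await page.getByRole('link', { name: 'Settings' }).click();
  await expect(page.getByRole('heading', { name: 'Settings' })).toBeVisible();
});
```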

2. Data Isolation:

Here's the deal - if your UI tests are hitting a real database, you're asking for trouble.

For most UI testing, I recommend mocking your API calls using page.route(). Why? Because then you control everything. No surprise data changes, no backend hiccups messing with your frontend tests.

Now, if you absolutely need that full end-to-end experience with real data, here's what works: set up your data fresh before each test run, then clean up after yourself. Every. Single. Time. Yeah, it's extra work, but it beats debugging random failures at 3 AM.
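A minimal sketch of API mocking with page.route(); the endpoint, payload, and selector are assumptions for illustration:

```typescript
import { test, expect } from '@playwright/test';

test('shows order status from a mocked API', async ({ page }) => {
  // Intercept the backend call and answer with deterministic data,
  // so the test never depends on a real database or network
  await page.route('**/api/orders/1', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ id: 1, status: 'Complete!' }),
    })
  );

  await page.goto('https://example.com/orders/1'); // hypothetical URL
  await expect(page.locator('#order-status')).toHaveText('Complete!');
});
```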

3. Environment Isolation:

You know that classic developer excuse - "but it works on my machine!" Yeah, we need to kill that. Playwright has this great Docker image that basically guarantees your tests run identically everywhere. Use it.

For parallel execution: start conservative. Set workers: 1 in your CI config at first. Get everything stable. Then gradually bump it up. Going from 1 to 4 workers might cut your test time in half, but if it introduces flakiness, was it worth it? (Spoiler: it wasn't.)
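In playwright.config.ts, that ramp-up can be expressed like this; the worker counts are a conservative starting point:

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // One worker in CI until the suite is stable, then raise gradually.
  // Locally, undefined lets Playwright pick a count from CPU cores.
  workers: process.env.CI ? 1 : undefined,
});
```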

Tier 3: Process-Level Resilience: From Triage to Long-Term Fixes

You can write perfect test code and have the best CI setup, but if your team doesn't have a game plan for dealing with test failures, you're still going to struggle.

I've seen too many teams where flaky tests become background noise. People start ignoring failures. The test suite becomes that check engine light everyone pretends isn't on. Don't let this happen to you.

Here's how to build a process that actually works.

1. Which Steps Add Signal Without Slowing the Pipeline?

Your team needs to ship features. Period. So how do you deal with flaky tests without grinding everything to a halt?

Quarantining:

Found a flaky test that's driving everyone nuts? Don't just disable it and forget about it. Put it in timeout instead.

Tag it with something like @flaky and exclude it from your main pipeline.

But here's the key - set up a separate CI job that still runs these quarantined tests. They won't block deployments, but you'll still see if they're getting worse or (hopefully) better. It's like putting them on probation.

Note that even when tests are quarantined, they keep running as part of your Playwright automation workflow. This keeps the process robust: you can monitor flaky cases without interrupting the main pipeline.
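One way to wire this up, assuming flaky tests carry an "@flaky" tag in their titles: exclude them from the main pipeline with grepInvert, and run only the quarantined tests in a separate, non-blocking CI job via npx playwright test --grep "@flaky".

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Main pipeline: skip anything whose title contains the @flaky tag.
  // A separate CI job runs only those tests with the --grep CLI flag.
  grepInvert: /@flaky/,
});
```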

Ownership:

This one's simple but often overlooked. Every problematic test needs a name attached to it. Not "the team" - an actual person. When everyone's responsible, nobody's responsible.

Ticket Linking:

Found a flaky test? Create a ticket. Right then. Not tomorrow, not after the sprint planning. Now.

I know it feels like busywork, but trust me - those tickets are how you prevent the same conversation six months later: "Wait, hasn't this test been flaky forever?" Yes, yes, it has. And now you have the receipts.

2. Beyond the Single Report: Analyzing Flake History and Trends

Your Playwright HTML report shows you what just broke. Cool. But here's what it can't tell you:

Has this test been secretly failing for the past month? Are we getting better or worse at writing stable tests? Which features always seem to have test problems? What's our most problematic test that we should fix yesterday?

These aren't optional questions. They mark the difference between chasing random test failures and making real, lasting improvements to your test suite.

This Is Where TestDino Helps

What makes the difference is having all your test runs in one place, building up a history you can actually learn from. You want to see patterns: which tests fail together, which ones fail on Mondays (yes, that's a thing), which ones are getting progressively slower.

With proper analytics, you stop reacting to individual fires and start preventing them. You can look at a dashboard and immediately know: "Our checkout flow tests are 3x flakier than everything else - let's focus there."

Tracking Playwright tests, monitoring individual Playwright test cases, and analyzing the test file history help you identify trends and improve test reliability. For teams serious about understanding their test health, tracking the right metrics makes all the difference.

Conclusion

Here's the thing about flaky tests - they're not some unsolvable mystery. They're just a puzzle that needs the right approach.

You start with your code (using Playwright's built-in features properly), move to your pipeline (setting up CI to catch and diagnose failures), and wrap it up with good team processes (so problems actually get fixed, not ignored).

Think of Playwright automation as more than a testing tool - it's your early warning system. But like any alarm system, it only works if people trust it won't cry wolf.

Want to see what your test suite is really telling you? Grab a free TestDino trial and find out which tests are actually causing you problems - not just today, but over time.

FAQs

When should you quarantine a flaky test? Quarantine it when it's intermittently failing, blocking your main CI/CD pipeline, and the root cause isn't immediately obvious.

By moving it to a separate “quarantine” run, you unblock deployments while creating a ticket to investigate the flake without pressure. For more strategies, see our guide on handling flaky tests in CI.
