Playwright automation checklist to reduce flaky tests

This checklist gives Playwright teams a clear path to reduce flaky tests fast, moving from code fixes to CI setup to team process. It focuses on removing noise, improving signal, and building stable, reliable test automation at every level.

Flaky tests can be a real headache. They’re not just a small annoyance; they can seriously slow down your development process and shake your confidence in deployments.

The key to a solid Playwright automation strategy is to cut through the noise and find the “signal.” You want clear, reliable feedback from your tests. This article gives you a straightforward, three-part checklist to get you there.

It’s a simple framework to build a strong Playwright automation system, starting with your code, moving to your pipeline, and finally, looking at your team’s processes.

What is Playwright?

Playwright is a framework for web automation and end-to-end testing. It runs tests against Chromium, Firefox, and WebKit, so a single suite can cover all three browser engines.

Quick list, at a glance

What you will get:

  • Code fixes that cut flakiness and raise the test signal
  • CI settings that surface the root cause fast
  • A simple process that keeps noise out

Code-level

  • Replace waitForTimeout(...) with auto-waits and web-first assertions
  • Prefer user-facing locators: getByRole, getByLabel, getByText
  • Avoid brittle CSS and XPath like div > span:nth-child(3)
  • Make each test independent, new browser context per test
  • Use retrying assertions: toBeVisible, toHaveText, toHaveURL
  • Mock network when full backend is not needed with page.route(...)

CI/CD-level

  • Set retries: 2 in CI only
  • Capture trace: on-first-retry, screenshots and video on failure
  • Start with workers: 1, raise slowly after the suite is stable
  • Use Docker or pinned browsers to avoid env drift
  • Keep artifacts for 7–14 days for fast triage

Process-level

  • Quarantine flaky tests with @flaky, run them in a separate nightly job
  • Assign an owner for each flaky test and open a ticket the same day
  • Track flake rate, mean time to triage, and slowest tests
  • Review trends weekly and delete dead tests
  • Block merges on critical failures, low pass rate, or big coverage drop

Targets to aim for:

  • Flake rate under 5%
  • Mean time to triage under 10 min
  • p95 slowest test under the agreed limit for your app

Do this now:

  • Set retries: 2, trace: on-first-retry, screenshot: only-on-failure, video: retain-on-failure
  • Replace fixed waits, switch to getByRole where possible
  • Add @flaky tag and a nightly quarantine run

Which Points Cut Playwright Flake the Fastest?

For teams looking to make an immediate impact, this checklist summarizes the most critical changes required to reduce flakiness. It contrasts common anti-patterns that create noise with the best practices that generate a clear signal.

Working through it gives teams more reliable test runs and a faster path from failure to root cause.

| Category | Common Anti-Pattern (Noise) | Best Practice (Signal) |
|---|---|---|
| Waits | Arbitrary fixed delays: page.waitForTimeout(5000) | Actionability-based assertions: expect(locator).toBeVisible() |
| Selectors | Brittle, implementation-tied locators: div > span:nth-child(3) | User-facing, resilient locators: page.getByRole('button', { name: 'Submit' }) |
| Data | Shared state and hardcoded values across tests | Isolated, idempotent data seeded or mocked for each test |
| CI/CD | Ignoring failures, manual reruns, and minimal artifacts | Configured auto-retries, rich artifact collection, and test quarantining for improved test execution |
| Triage | Guessing the root cause from raw console logs | Analyzing historical failure data and interactive trace files |

Tier 1: Code-Level Resilience: Mastering Playwright’s Native Stability

A reliable test suite starts with stable code. So, first let’s get you comfortable with Playwright's core ideas and built-in features. 

Instead of just learning a bunch of tricks, understand how the framework is designed to help you.

1. Stop Using waitForTimeout: Embrace Auto-Waiting and Actionability

One of the most common reasons for flaky tests is a simple speed mismatch. Your test script might be moving faster than your web app can render. 

A common but flawed solution is to add a fixed delay, like page.waitForTimeout(3000). This is a gamble. You either end up slowing down your tests for no reason or, worse, not waiting long enough when the app is slow, which just makes the test more unreliable.

Playwright handles this much more intelligently with its "auto-waiting" feature. When you tell it to do something like locator.click(), it doesn't just jump in. It first runs a series of checks to make sure the element is "actionable." It waits until the element is:

  • Attached to the DOM
  • Visible on the page
  • Stable (not animating)
  • Enabled (not disabled)
  • Receives Events (not covered by another element)

Playwright drives the real browser input pipeline, so its actions behave like real user interactions, and it waits for each element to be ready before acting. Together these remove most of the flakiness caused by timing issues.

This built-in intelligence means you almost never need to add your own waits. Just trust the framework.

Example: Removing an Anti-Pattern

  • Before (Noisy and Unreliable):
    TypeScript
    await page.locator('#submit-button').click();
    // Hope the spinner disappears in time
    await page.waitForTimeout(2000);
    expect(await page.locator('#success-message').isVisible()).toBe(true);
    
  • After (High-Signal and Resilient):
    TypeScript
    await page.locator('#submit-button').click();
    // Playwright's assertion will auto-wait for the message
    await expect(page.locator('#success-message')).toBeVisible();
    

2. Ditch Brittle Selectors: A Locator Strategy for Stable Tests

The second major source of flakiness is an unstable locator strategy. Tests that rely on brittle selectors, such as auto-generated CSS classes or fragile XPath expressions, break whenever developers refactor the UI.

A resilient strategy focuses on attributes meaningful to the end-user. Playwright’s locators encourage this practice by prioritizing user-facing selectors. 

Playwright locators also handle harder cases such as shadow DOM and dynamic controls, which matters when you need test cases that stay stable and maintainable.

The hierarchy below decouples the test from implementation details, making it far more stable.

| If the element is… | Use this Locator First | Example | Fallback Option |
|---|---|---|---|
| An interactive control (button, link, input) | page.getByRole(...) | getByRole('button', { name: 'Login' }) | page.getByTestId(...) |
| A non-interactive element (div, span, p) | page.getByText(...) | getByText('Welcome back, user!') | page.getByTestId(...) |
| A form field associated with a visible label | page.getByLabel(...) | getByLabel('Password') | page.getByTestId(...) |
| An element with no user-visible attributes | page.getByTestId(...) | getByTestId('main-content-wrapper') | (Last resort) CSS/XPath |
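
As a rough illustration of this hierarchy, the sketch below wraps a login flow in a helper that uses only user-facing locators. The LocatorLike and LoginPage interfaces are hypothetical, minimal stand-ins shaped after Playwright's page.getByLabel(...) and page.getByRole(...) so the snippet runs on its own; in a real suite you would pass Playwright's page fixture instead. The 'Email' label is an assumption about the app under test.

```typescript
// Minimal stand-ins for the parts of Playwright's Locator and Page used here
// (hypothetical local types, declared so the sketch is self-contained).
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}

interface LoginPage {
  getByLabel(text: string): LocatorLike;
  getByRole(role: 'button' | 'link', options?: { name?: string }): LocatorLike;
}

// Logs in through labels and roles a user can see, never through CSS classes.
// The 'Email' label is an assumed field name; adjust it to your app.
async function login(page: LoginPage, email: string, password: string): Promise<void> {
  await page.getByLabel('Email').fill(email);
  await page.getByLabel('Password').fill(password);
  await page.getByRole('button', { name: 'Login' }).click();
}
```

Called from a test or a beforeEach hook as await login(page, ...), a helper like this keeps working when markup or class names change, as long as the visible labels and the button name stay the same.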

3. Use Retrying Assertions for Dynamic Content

Modern web applications are highly dynamic. A test might fail because it checks for an element's text before an API call has returned and populated it. Playwright's "web-first assertions" handle this scenario.

When you use a web-first assertion such as expect(locator).toHaveText(...), it doesn't just check once. It keeps retrying until the assertion passes or the timeout expires (5 seconds by default). This is a great way to build stable tests for dynamic UIs.

Configuring a test retry strategy further improves stability: Playwright automatically reruns failed tests, which helps you identify and diagnose intermittent failures in dynamic environments.

Example: Handling Asynchronous Text Updates

TypeScript
// This assertion will automatically wait and retry for up to 5 seconds
// for the text to change, preventing a common race condition.
await expect(page.locator('#order-status')).toHaveText('Complete!');
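
If a backend is legitimately slow, the retry window can be widened instead of reintroducing fixed waits. A minimal config sketch, assuming 10 seconds suits your app (the value is illustrative, not a recommendation):

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Applies to every web-first assertion in the suite (the default is 5 seconds).
  // 10_000 ms is an illustrative value; pick one that matches your app.
  expect: { timeout: 10_000 },
});
```

For a single slow check, the same option can be passed inline: await expect(locator).toHaveText('Complete!', { timeout: 10_000 }).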

Tier 2: Pipeline-Level Resilience: Configuring CI/CD for Signal

How you configure your CI says a lot about your team's testing maturity. Set up CI to surface signal fast: controlled retries, rich artifacts, clean isolation.

1. The Smart Retry Strategy: When and How to Rerun Failed Tests

Automatic retries are a practical way to handle those random, one-off failures, like a quick network hiccup. While it's true that retries can sometimes hide bigger problems, a controlled retry strategy is a standard industry practice.

  • A good balance is to set retries: 2 in your playwright.config.ts file, specifically for your CI environment. This lets a failing test run up to three times.
  • When you enable retries, Playwright gives you more detailed results. It will tell you if a test passed (on the first try), failed (on all tries), was skipped, or was flaky (failed at first but then passed). That flaky status is a huge signal that a test is unreliable and needs a closer look, even if it didn't fail the build.
  • To further improve reliability, you can configure the test retry strategy in Playwright to automatically capture execution trace, videos, and screenshots on test retries.

Capturing an execution trace provides detailed logs of each test run, making it easier to analyze and debug flaky tests.
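
The retry advice above could look roughly like this in a config file. The sketch assumes your CI provider sets the CI environment variable, which most do but is worth checking:

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry only in CI (assumes the CI environment variable is set there).
  // Locally, a failure stays a failure so flakiness is noticed right away.
  retries: process.env.CI ? 2 : 0,
  use: {
    // Record a trace only when a failed test is retried for the first time.
    trace: 'on-first-retry',
  },
});
```

Keeping retries at 0 locally is a design choice: it stops retries from hiding a timing problem while the test is still being written.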

2. Collect the Right Artifacts: Traces, Videos, and Screenshots on Failure

When a test fails in CI, console output is often not enough for debugging. It is essential to collect rich diagnostic artifacts to find the root cause. Playwright offers three key types:

  • Screenshots: A snapshot of the page right when the failure happened.
  • Videos: A recording of the entire test.
  • Traces: The most detailed data you can get, including a DOM snapshot for every action, console logs, network requests, and a step-by-step timeline.

Traces also carry execution logs and a screencast of the run. Exploring them shows the step-by-step actions and test behavior, which makes issues easier to diagnose.

Keep in mind, the best strategy is to capture these only when needed to avoid performance overhead.

CI Configuration:

TypeScript
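// in playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    screenshot: 'only-on-failure', // capture a screenshot only when a test fails
    video: 'retain-on-failure',    // record every test, keep the video only for failures
    trace: 'on-first-retry',       // record a trace when a failed test is first retried
  },
});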

3. Isolate Everything: Data, State, and Parallel Workers

Here's something that trips up teams: tests stepping on each other's toes. You run your tests individually and they're perfect. Run them together? Chaos. Let me walk you through how to prevent this mess.

1. Test Isolation

Think of each test like it requires its own sandbox. No test should care what happened before it ran.

Playwright achieves full test isolation by creating a new browser context for each test. Each context behaves like a brand new browser profile, so tests stay independent and share no state such as cookies or local storage.

I like using the beforeEach hook for setup - say, you need to log in a test user or reset some settings. Whatever it is, do it fresh for each test. Trust me, future you will thank present you when debugging.

2. Data Isolation

Here's the deal - if your UI tests hit a real database, you're asking for trouble.

For most UI testing, I recommend mocking your API calls using page.route(). Why? Because then you control everything. No surprise data changes, no backend hiccups messing with your frontend tests.

Now, if you absolutely need that full end-to-end experience with real data, here's what works: set up your data fresh before each test run, then clean up after yourself. Every. Single. Time. Yeah, it's extra work, but it beats debugging random failures at 3 AM.
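
To make the mocking option concrete, here is a minimal sketch of a helper built on page.route(). RouteLike and RoutablePage are hypothetical local types shaped after the parts of Playwright's Route and Page that the helper touches, so the snippet runs on its own; in a real suite you would pass the page fixture. The '**/api/orders' pattern and the payload are invented for illustration.

```typescript
// Minimal stand-ins for the parts of Playwright's Route and Page used here
// (hypothetical local types, declared so the sketch is self-contained).
interface RouteLike {
  fulfill(response: { status?: number; json?: unknown }): Promise<void>;
}

interface RoutablePage {
  route(url: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}

// Hypothetical endpoint pattern and payload, for illustration only.
const ORDERS_URL = '**/api/orders';
const FAKE_ORDERS = [{ id: 1, status: 'Complete!' }];

// Registers a handler that answers every matching request with fixed data,
// so the UI under test never depends on the state of a real backend.
async function mockOrders(page: RoutablePage): Promise<void> {
  await page.route(ORDERS_URL, async (route) => {
    await route.fulfill({ status: 200, json: FAKE_ORDERS });
  });
}
```

Call the helper before navigating, so the route is already in place when the page fires its first request.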

3. Environment Isolation

You know that classic developer excuse - "but it works on my machine!" Yeah, we need to kill that. Playwright has this great Docker image that basically guarantees your tests run identically everywhere. Use it.

For parallel execution: start conservative. Set workers: 1 in your CI at first. Get everything stable. Then gradually bump it up. Going from 1 to 4 workers might cut your test time in half, but if it introduces flakiness, was it worth it? (Spoiler: it wasn't.)
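
A conservative starting point for workers might look like the sketch below, again assuming the CI environment variable is set by your CI provider:

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // One worker in CI until the suite is stable. undefined keeps Playwright's
  // default worker count for local runs.
  workers: process.env.CI ? 1 : undefined,
});
```

Once the suite has been green for a while, raise the CI value one step at a time and watch whether flaky results start to appear.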

Tier 3: Process-Level Resilience: From Triage to Long-Term Fixes

You can write perfect test code and have the best CI setup, but if your team doesn't have a game plan for dealing with test failures, you're still going to struggle.

I've seen too many teams where flaky tests become background noise. People start ignoring failures. The test suite becomes that check engine light everyone pretends isn't on. Don't let this happen to you.

Here's how to build a process that actually works.

1. Which Steps Add Signal Without Slowing the Pipeline?

Your team needs to ship features. Period. So how do you deal with flaky tests without grinding everything to a halt?

Quarantining

Found a flaky test that's driving everyone nuts? Don't just disable it and forget about it. Put it in quarantine instead.

Tag it with something like @flaky and exclude it from your main pipeline. 

But here's the key - set up a separate CI job that still runs these quarantined tests. They won't block deployments, but you'll still see if they're getting worse or (hopefully) better. It's like putting them on probation.

Quarantined tests keep running as part of your Playwright automation workflow, just outside the main pipeline, so you can monitor flaky cases without blocking anyone.
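
One possible way to wire this up is with the grep and grepInvert config options. In the sketch below, QUARANTINE is a hypothetical environment variable that only the nightly job would set; the name is an assumption, not a Playwright convention.

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

// QUARANTINE is a hypothetical environment variable set only by the nightly job.
const quarantineRun = process.env.QUARANTINE === '1';

export default defineConfig({
  // Nightly job: run only tests whose title contains @flaky.
  grep: quarantineRun ? /@flaky/ : undefined,
  // Main pipeline: run everything except those tests.
  grepInvert: quarantineRun ? undefined : /@flaky/,
});
```

A test joins quarantine by adding the tag to its title, for example test('applies coupon @flaky', ...), and leaves it again when the tag is removed.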

Ownership

This one's simple but often overlooked. Every problematic test needs a name attached to it. Not "the team" - an actual person. When everyone's responsible, nobody's responsible.

Ticket Linking

Found a flaky test? Create a ticket. Right then. Not tomorrow, not after the sprint planning. Now.

I know it feels like busywork, but trust me, those tickets are how you prevent the same conversation six months later: "Wait, hasn't this test been flaky forever?" Yes, yes, it has. And now you have the receipts.

2. Beyond the Single Report: Analyzing Flake History and Trends

Your Playwright HTML report shows you what just broke. Cool. But here's what it can't tell you:

  • Has this test been secretly failing for the past month?
  • Are we getting better or worse at writing stable tests?
  • Which features always seem to have test problems?
  • What's our most problematic test that we should fix yesterday?

These aren't optional questions. They mark the difference between chasing random test failures and making real, lasting improvements to your test suite.
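
To show the kind of arithmetic behind a trend metric, here is a small sketch that computes a flake rate from recorded attempts. It assumes flake rate means the share of test runs that passed only after at least one failed attempt; that definition, the TestRun shape, and the sample history are all illustrative assumptions.

```typescript
type Attempt = 'passed' | 'failed';
type Outcome = 'passed' | 'flaky' | 'failed';

// One test in one CI run, with the result of each attempt in order.
interface TestRun {
  title: string;
  attempts: Attempt[];
}

// Passed on the first try -> 'passed'; failed at least once and then passed -> 'flaky';
// final attempt failed -> 'failed'.
function classify(attempts: Attempt[]): Outcome {
  if (attempts.length === 0) throw new Error('no attempts recorded');
  if (attempts[attempts.length - 1] === 'failed') return 'failed';
  return attempts.some((attempt) => attempt === 'failed') ? 'flaky' : 'passed';
}

// Flake rate = flaky test runs / all test runs, as a fraction between 0 and 1.
function flakeRate(runs: TestRun[]): number {
  if (runs.length === 0) return 0;
  const flaky = runs.filter((run) => classify(run.attempts) === 'flaky').length;
  return flaky / runs.length;
}

// Hypothetical history: four recorded runs, one of which needed a retry to pass.
const history: TestRun[] = [
  { title: 'login', attempts: ['passed'] },
  { title: 'checkout', attempts: ['failed', 'passed'] },
  { title: 'search', attempts: ['failed', 'failed', 'failed'] },
  { title: 'login', attempts: ['passed'] },
];

console.log(flakeRate(history)); // 0.25, well above a 5% target
```

Computed per week or per branch, the same number shows whether the suite is getting more or less stable over time.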

This Is Where TestDino Helps
  • Centralizes all Playwright runs, test cases, and file history.
  • Gives you dashboards that show where test health is degrading so you can prioritize fixes.
  • Cuts triage time by classifying failures and flaky tests so you focus on real bugs instead of noise.
  • Lets you rerun only failed tests in any CI system, which shortens feedback loops and saves pipeline time.
  • Connects test health to branches, environments, and pull requests so you see the impact of each change.
  • Helps serious teams track the right metrics and steadily improve reliability.

Conclusion

Here's the thing about flaky tests 👉 they're not some unsolvable mystery. They're just puzzles that need the right approach.

You start with your code (using Playwright's built-in features properly), move to your pipeline (setting up CI to catch and diagnose failures), and wrap it up with good team processes (so problems actually get fixed, not ignored).

Think of Playwright automation as more than a testing tool; it's your early warning system. But like any alarm system, it only works if people trust it won't cry wolf.

Want to see what your test suite is really telling you? Contact us to get TestDino’s trial and find out which tests are actually causing you problems - not just today, but over time.

FAQs

    1. When should I quarantine a failing test instead of fixing it immediately?

    Quarantine a test when it’s intermittently failing, blocking your main CI/CD pipeline, and the root cause isn’t immediately obvious.

    By moving it to a separate “quarantine” run, you unblock deployments while creating a ticket to investigate the flake without pressure. For more strategies, see our guide on handling flaky tests in CI.

    2. How do I run my first Playwright test case?

    To run your first Playwright test, create a test file (e.g., example.spec.ts) and write a simple test scenario. You can use Visual Studio Code for editing and running your tests, as it provides great integration with Playwright.

    Use the Playwright Inspector to debug and step through your first Playwright test case visually. Run your test with npx playwright test.

    3. Does Playwright support API testing, and how do I perform API testing?

    Yes, Playwright supports API testing. You can perform API testing by using Playwright's APIRequestContext to send HTTP requests and validate responses directly within your test scripts.

    This allows you to perform API testing alongside browser automation, making it easy to test REST APIs and web UI in a single workflow.

    4. How do I create complex test scenarios with Playwright?

    You can create scenarios that span multiple tabs, involve multiple users, and interact with iframes.

    These scenarios simulate real-world interactions such as switching between tabs, handling multiple user sessions, and navigating frames or shadow DOM elements for comprehensive test coverage.

    5. What browsers does Playwright support, and how does it mimic real user interactions?

    Playwright supports Chromium, Firefox, and WebKit, and can run the same tests across all three to check cross-browser compatibility.

    It mimics real user interactions by using the browser's native input pipeline, producing events indistinguishable from those generated by a real user.

    Pratik Patel

    Founder & CEO

    Pratik Patel is the founder of TestDino, a Playwright-focused observability and CI optimization platform that helps engineering and QA teams gain clear visibility into automated test results, flaky failures, and CI pipeline health. With 12+ years of QA automation experience, he has worked closely with startups and enterprise organizations to build and scale high-performing QA teams, including companies such as Scotts Miracle-Gro, Avenue One, and Huma.

    Pratik is an active contributor to the open-source community and a member of the Test Tribe community. He previously authored Make the Move to Automation with Appium and has supported many QA engineers with practical tools, consulting, and educational resources. He regularly writes about modern testing practices, Playwright, and developer productivity.
