How is the flaky rate calculated?

The flaky rate is the percentage of recent runs where a test showed flaky behavior, meaning it failed at least once but eventually passed, or produced inconsistent results across runs without code changes.

Can I set alerts for new flaky tests?

Yes. TestDino can notify your Slack channel when a previously stable test starts showing flaky behavior. You can also set alerts when your overall suite flaky rate exceeds a threshold you define.

Does TestDino work with Playwright's built-in retry configuration?

Yes. TestDino reads Playwright's retry data directly. If you have retries configured, TestDino analyzes all attempts to determine whether the test is flaky. You don't need to change your retry settings.

How quickly does TestDino detect a new flaky test?

A test can be flagged as flaky from a single run if it fails then passes on retry. For cross-run detection, TestDino typically identifies a pattern within 3-5 runs. The more runs you have, the more confident the classification becomes.

Flaky Tests

Stop guessing which tests are flaky.

Q: What root causes does TestDino classify?

TestDino classifies flaky tests into five categories: Timing Related, Environment Dependent, Network Dependent, Assertion Intermittent, and Other. Classification is based on failure patterns, error messages, and timing data across retry attempts.

TestDino detects flaky Playwright tests automatically through retry analysis and cross-run patterns, then tracks stability so you fix what matters first.

TestDino flaky test detection and tracking dashboard

Playwright-native reporting

AI-powered test analysis

Get started

Trusted by teams at

Case Study

Flaky tests are eating your
pipeline time and team trust

Half the team ignores red builds and nobody knows which tests are actually unreliable.

Nobody knows which tests are actually flaky

Some tests fail intermittently but there is no definitive list. Someone added a test.skip annotation six months ago with a TODO that never got addressed. The suite has a trust problem with no data.

CI reruns are your unofficial flaky test strategy

Your pipeline retries failed tests two or three times. If it passes on retry, everyone moves on. Those retries cost CI minutes, slow the pipeline, and mask real failures.

Real failures get lost in flaky noise

A genuine regression fails a test, but the team assumes it's flaky because that test has failed before. The PR gets merged. The bug makes it to production.

No visibility into whether flaky tests improve

You fixed a flaky test last week. Is it still stable? There is no trend line, no stability score, no way to confirm your fix actually worked beyond hoping and manually watching CI.

How flaky test detection works

TestDino analyzes every test run for retry patterns, cross-run inconsistencies, and failure-to-pass transitions. Flaky detection starts from your very first run.

Add the TestDino reporter

One line in your Playwright config. Test results, retry data, and timing information flow to TestDino after every run. No wrappers, no new dependencies.

View setup guide

playwright.config.ts

reporter: [
  ['html', { outputDir: './playwright-report' }],
  ['json', { outputFile: './playwright-report/report.json' }],
]

Upload after tests run

npx tdpw upload <report-dir> --token=$TESTDINO_API_KEY

Retry patterns are analyzed

When a test fails then passes on retry, TestDino flags it as flaky. Across multiple runs, TestDino builds a flaky rate for every test - the percentage of runs where it exhibited flaky behavior.

Review flaky tests by root cause

TestDino classifies likely root causes into five categories: timing related, environment dependent, network dependent, assertion intermittent, and other. See which tests are most flaky and what category they fall into.

Let your AI agent fix flaky tests

Connect TestDino's MCP server to Cursor, Claude Code, or Copilot. Your AI agent finds the flakiest tests, looks at their failure history, and suggests targeted fixes without you switching to the dashboard.

Auto-detect flaky tests across runs.

Without TestDino

With TestDino

Some tests keep flipping.

5 Runs . Same Code

main branch · no code changes

TestRun 1Run 2Run 3Run 4Run 5

✓

Apply discount

✓

Submit order

✓

Card payment

✓

Search filter

✓

Profile update

✓

Mixed rows = flaky. But which ones? And why?

87min reviewing

With TestDino

Flaky tests detected with root cause

TestDino

Auto-Detect

Flaky Detection · Automatic

Flaky Found

45%

Worst Rate

82%

Suite Stable

Classified by root cause

Card payment45%Timing

Apply discount38%Network

Search filter22%Assertion

3flaky detected82%suite stable

Teams love what we built

See why developers choose TestDino to ship faster and debug smarter

Over 30 flaky tests and no structured way to track them, just CI artifacts and morning guesswork. TestDino's "Most Flaky Tests" feature broke this pattern. We can see failure trends now and pull up video recordings of exactly what went wrong. The TestDino MCP server is the magic piece on top, I ask my Claude agent about a failure and it pulls full context from TestDino without switching tabs. We went from 30-something flaky tests down to 3 or 4.

70%

Fewer flaky test reruns

Faster failure triage

Shrinath Rao

Lead Quality Assurance Engineer

Migrating to TestDino from Currents was an easy decision. The features are stronger, the cost is lower, and the interface makes debugging far less painful. Flaky test detection and AI failure classification have simplified debugging and reduced our CI costs by cutting down reruns and noisy failures.

40%

Reduction in CI costs

90%

Less time triaging failures

Johan Frølich

CTO & Co-founder

Automated detection vs
manual flaky tracking

Automatic detection from retry data

Flaky tests identified automatically from retry data and cross-run patterns. No manual annotations needed.

Root cause classification

Each flaky test is classified into one of five categories: Timing Related, Environment Dependent, Network Dependent, Assertion Intermittent, or Other. A starting point for fixes.

Flaky rate tracking over time

After fixing a flaky test, track its stability across subsequent runs to confirm the fix held.

Prioritized flaky list by impact

Flaky tests ranked by frequency, wasted CI minutes, and pipeline blocks. Worst offenders surface first.

Cross-run pattern analysis

Tests that pass within a run but fail across runs are caught, covering both within-run and cross-run inconsistencies.

Team-wide flaky visibility

One shared dashboard so everyone sees the same flaky test data. New team members know what is unreliable from day one.

What you get with flaky detection

A complete view of every unreliable test with root cause classifications and stability trends.

Root cause classification and failure grouping

Flaky tests are grouped by likely root cause. Timing-related flakes are separated from data dependency issues, which are separated from environment instability. This helps you batch similar fixes together instead of chasing individual failures.

Automatic flaky test identification

Every test run is analyzed for retry patterns and cross-run inconsistencies. Tests that fail then pass on retry are flagged. Tests with different results across runs without code changes are flagged too. You get a complete, always-current list.

Stability trend tracking with flaky rate history

Every flaky test has a trend line showing its flaky rate over time. See whether a test is getting more unstable or stabilizing after a fix. Turn flaky management from reactive firefighting into measurable improvement.

Works with your favourite tools

Connect seamlessly with Jira, Slack, GitHub, Linear, Azure DevOps, Asana, and monday to keep your workflow smooth and your team aligned.

TestDino MCP

Query test results, debug failures, and manage test cases from your IDE via MCP.

GitHub

Get AI-generated test summaries on pull requests and CI status checks on every commit.

Jira

Create Jira issues from test failures with full error context, screenshots, and links.

Slack

Route test alerts by environment and notify specific users when annotated tests fail.

Linear

File structured bug reports from test failures with failure history and code context.

GitLab

Post AI-generated test summaries on merge requests and sync MR state with TestDino.

Azure DevOps

View Playwright test runs, failure trends, and flaky tests inside Azure DevOps.

Asana

Create Asana tasks from test failures with error details, test steps, and screenshots.

monday

Create monday.com items from test failures and view test run data in dashboards.

See all integrations

FAQs

TestDino uses two detection methods. Within-run retry analysis: if a test fails on one attempt and passes on a subsequent retry, it is flagged as flaky. Cross-run pattern analysis: if a test produces different results across runs without code changes, it is identified as flaky. Both work automatically from your first run.