March 2026
What we measure when we measure QA
QA teams are routinely measured against the wrong metrics. The default dashboard for most QA organizations counts activity. The columns are test cases written, tickets touched, and bugs logged. None of these numbers describe quality. They describe how busy a team is, and busyness is not the goal.
The framework I run my team against is built around two metrics that describe outcome rather than activity: catch rate and escape rate.
Catch rate is the percentage of defects caught in QA before work reaches production. Escape rate is the percentage that reached production despite QA review. The two numbers should move in opposite directions, and over time they should trend toward higher catch and lower escape. When they do not, the dashboard has caught a problem that activity metrics would have hidden: a team that is touching plenty of tickets without finding what matters.
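As a minimal sketch of the two definitions, assuming both rates share one denominator (every defect found in a period, whether in QA or in production): the function names and the counts below are illustrative, not taken from a real dashboard. Under that assumption the two rates sum to 100, so a team that counts escapes against a different base (per release, for example) would need a different calculation.

```python
def catch_rate(caught_in_qa: int, escaped_to_prod: int) -> float:
    """Percentage of known defects that QA caught before release."""
    total = caught_in_qa + escaped_to_prod
    if total == 0:
        return 0.0  # assumption: report 0 when no defects are known yet
    return 100.0 * caught_in_qa / total


def escape_rate(caught_in_qa: int, escaped_to_prod: int) -> float:
    """Percentage of known defects that reached production despite QA review."""
    total = caught_in_qa + escaped_to_prod
    if total == 0:
        return 0.0  # assumption: same convention as catch_rate
    return 100.0 * escaped_to_prod / total


# Hypothetical period: 45 defects caught in QA, 5 found in production.
print(catch_rate(45, 5))   # 90.0
print(escape_rate(45, 5))  # 10.0
```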
These two metrics force a conversation that activity metrics let leadership avoid. If escape rate is rising while ticket throughput holds steady, the team is busy but not effective, and the cause is usually a coverage or process issue rather than a staffing one. If catch rate is rising and escape rate is falling, the team is improving regardless of what the throughput numbers say. The metrics also let QA leadership argue for investment in test infrastructure with a defensible return rather than a vague appeal to coverage.
The third metric I track is first-time-right (FTR), but I track it deliberately as a developer submission health metric rather than a QA performance metric. FTR measures the percentage of tickets that pass QA on the first review without kickback. A low FTR means developers are submitting work that is not ready for QA, which costs QA time and lengthens cycle time. A high FTR means submissions are clean. The mistake most QA dashboards make is treating FTR as something QA can move directly. QA does not control what arrives in the queue. QA only controls how it responds. Reporting FTR as a QA performance metric punishes the team for upstream behavior, and the only way for QA to game it is to kick back less by redefining what counts as a kickback, which damages quality.
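The FTR definition above can be sketched as follows. The `Ticket` record and its `kickbacks` field are hypothetical stand-ins for whatever the ticketing system exports; the only logic taken from the text is "passed QA on the first review without kickback".

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: str
    kickbacks: int  # hypothetical field: times QA returned the ticket


def first_time_right(tickets: list[Ticket]) -> float:
    """Percentage of tickets that passed QA with zero kickbacks."""
    if not tickets:
        return 0.0  # assumption: report 0 for an empty period
    clean = sum(1 for t in tickets if t.kickbacks == 0)
    return 100.0 * clean / len(tickets)


# Hypothetical sample: three clean submissions and one returned twice.
sample = [
    Ticket("T-1", 0),
    Ticket("T-2", 0),
    Ticket("T-3", 2),
    Ticket("T-4", 0),
]
print(first_time_right(sample))  # 75.0
```

Grouping the same calculation by submitting developer or product owner, rather than by QA reviewer, keeps the number pointed at submission health.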
The fourth thing I track is the dollar value of rework. Every ticket that gets kicked back consumes both QA time and developer time, and multiplying those hours by a defensible loaded hourly rate produces a number worth showing to executives. When I ran the analysis on our most recent state rollout, rework on a single product owner's submissions accounted for an order of magnitude more cost than any other source of friction in the pipeline. That number changed the conversation. Without the dollar translation the same data read as a process complaint. With it, the same data became a budget item.
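One way to make the dollar translation concrete is sketched below, assuming each kickback is logged with its source and the QA and developer hours it consumed. The record layout, the source labels, and the hourly rates are all hypothetical placeholders, not figures from the rollout described above.

```python
from collections import defaultdict


def rework_cost_by_source(
    kickbacks: list[tuple[str, float, float]],
    qa_rate: float,
    dev_rate: float,
) -> dict[str, float]:
    """Sum rework cost per source.

    Each kickback is (source, qa_hours, dev_hours); rates are loaded
    hourly costs in dollars.
    """
    totals: dict[str, float] = defaultdict(float)
    for source, qa_hours, dev_hours in kickbacks:
        totals[source] += qa_hours * qa_rate + dev_hours * dev_rate
    return dict(totals)


# Hypothetical log: two kickbacks from one source, one from another.
log = [
    ("owner_a", 2.0, 4.0),
    ("owner_a", 1.0, 3.0),
    ("owner_b", 0.5, 1.0),
]
costs = rework_cost_by_source(log, qa_rate=60.0, dev_rate=90.0)
print(costs)  # {'owner_a': 810.0, 'owner_b': 120.0}
```

Sorting the result by cost is what surfaces a single dominant source of rework.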
The framework produces results that are visible at the team level and at the portfolio level. Over the most recent measurement period my team compressed the QA review cycle from seven workdays to two across our non-standard auto portfolio. Coverage expanded over the same period, which inverts the tradeoff people usually expect when cycle time falls. Catch rate held while escape rate dropped. The compression came from measuring the right things and then restructuring the work around what the measurements showed.
Each activity-metric distortion follows a predictable pattern. When the metric is tickets touched, reviews shallow out. When the metric is bug count, defects get fragmented into smaller pieces to inflate the number. When the metric is test library size, redundant test cases pile up. And FTR treated as a QA performance number drives the team to redefine what counts as a kickback, because that is the only available lever. These failure modes are real and recurring, and they are invisible on activity dashboards.
The final argument for outcome metrics is that they survive contact with executive leadership. A CFO does not have time to interpret ticket throughput. A CFO will read catch rate, escape rate, and rework cost in dollars without needing translation. The dashboards QA builds for executives are the same dashboards QA should be running against day to day.