From Code to Customer: Measuring Software Quality Before Release



 

“When a metric becomes a target, it ceases to be a good metric.” - economist Charles Goodhart

I feel that every discussion about metrics should start with the above word of caution, known as Goodhart’s law. Metrics should inform decisions, not drive behavior in isolation. Without context, they can easily be gamed or give a false sense of confidence.

Alright! With that disclaimer out of the way, let’s talk about Quality Metrics for production readiness.

What you’ll find here comes from the trenches — lessons from things that worked, things that didn’t, ideas that sounded smart but fell apart at scale, and blind spots I didn’t see until they hit production.

I’ve owned software quality across industries like e-commerce, fintech, streaming, and SaaS — in startups, scaleups, and big tech. Your context may vary, but these insights should hit home for most teams shipping software.


Why Should We Even Measure Quality?

I believe there are three reasons to measure software quality — and they all come back to one thing: customers.

  • Before release: Will what we’ve built actually make customers happy?

  • After release: Are customers happy with what we shipped?

  • Long-term: Will they stay happy as they keep using it?

If you can’t answer all three, you can’t truly claim your product is high quality. And if you care about building something customers love — you should.

In this post, I’m focusing on the first: how to measure quality before release.


Quality Metrics for Product Under Development

Let’s go over various metrics and discuss what they’re good for — and where they fall short.


📏 Code Coverage

If you were to ask someone to name a quality metric, code coverage would probably be the most common answer. It measures the effectiveness of your test suite, i.e. what percentage of your code gets exercised by your automated tests. There are various flavors of coverage, such as coverage by line, statement, branch, condition, and function. Code coverage can be a good metric for evaluating code quality, but I don’t think it is a powerful metric for measuring product quality. The code can be well written and efficient, yet implement the business logic incorrectly or miss a piece of business logic entirely. In that case your coverage may be 100% and the feature will still cause an incident in production (the sketch after the list below makes this concrete).

  • ✅ Good for: Evaluating code quality and test suite effectiveness. Also helps in enforcing engineering best practices.

  • ❌ Not enough for: Assessing product quality. A system can have 100% coverage but still implement the wrong logic or miss entire use cases.
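
To make that concrete, here is a minimal Python sketch (the file, function, and dollar amounts are all made up): the tests execute every line, so coverage reports 100%, yet the business rule is implemented incorrectly and the boundary bug ships.

```python
# discount.py -- hypothetical business rule: orders OVER $100 get 10% off
def apply_discount(total):
    # BUG: the spec says strictly over $100, but this also discounts
    # an order of exactly $100.
    if total >= 100:
        return total * 0.9
    return total


# test_discount.py -- both branches run, so line coverage is 100%
def test_discount_applied():
    assert apply_discount(200) == 180.0

def test_no_discount():
    assert apply_discount(50) == 50
    # The boundary case, apply_discount(100), is never asserted,
    # so the incorrect business logic sails through with full coverage.
```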


🧮 Cyclomatic Complexity

In a similar vein to code coverage, we also have Cyclomatic Complexity, which measures the number of independent decision paths in the code. The idea: simpler code is easier to maintain and less error-prone. (A small example follows the list below.)

  • ✅ Encourages clean, maintainable design.

  • ❌ Doesn’t guarantee correctness. You can have elegant code that completely misses key user scenarios. Also, keeping complexity low can be impractical at scale.
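
For illustration, here is a hypothetical pair of functions with identical behavior but different complexity (tools such as radon, for Python, report a per-function score along these lines):

```python
# Cyclomatic complexity around 5: each elif adds a decision path.
def shipping_cost_branchy(region):
    if region == "US":
        return 5
    elif region == "EU":
        return 8
    elif region == "APAC":
        return 12
    elif region == "LATAM":
        return 10
    else:
        raise ValueError(f"unknown region: {region}")


# Cyclomatic complexity around 2: branching replaced by a table lookup.
_SHIPPING_COSTS = {"US": 5, "EU": 8, "APAC": 12, "LATAM": 10}

def shipping_cost_table(region):
    try:
        return _SHIPPING_COSTS[region]
    except KeyError:
        raise ValueError(f"unknown region: {region}")
```

Note that both versions can still charge the wrong cost for a region the business cares about, which is exactly the limitation noted above.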


While I believe that metrics like code coverage and cyclomatic complexity are good for assessing code quality and establishing engineering best practices in teams, I wouldn’t use them to decide whether to launch a feature in production. Code quality and product quality are very different: your code quality can be high while your product quality falls short.


Now, let’s talk about some other metrics. What about measuring some metrics around bugs?

🐞 Bug Count

Simple metric — number of bugs logged for a feature.

  • ✅ Might indicate quality in some cases.

  • ❌ Can be misleading. Teams that don’t log bugs diligently will show a low count. Some bugs may be minor, while others critical ones go undiscovered. The raw count doesn’t tell the full story.


What if we tweak the bug count metric and measure it against the amount of code written? Would that be a useful metric? That would be bug density.

🐛 Bug Density

Number of bugs per 1,000 lines of code.

  • ❌ Same limitations as above. Depends heavily on logging practices and doesn't reflect severity or user impact.
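
The formula itself is trivial; a quick sketch with made-up numbers:

```python
def bug_density(bug_count, lines_of_code):
    """Bugs per 1,000 lines of code (KLOC)."""
    return bug_count / (lines_of_code / 1000)

# 12 logged bugs in a 40,000-line service -> 0.3 bugs/KLOC.
# The number says nothing about whether those 12 bugs are typos
# or data-loss incidents, which is the core weakness noted above.
print(bug_density(12, 40_000))  # 0.3
```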


Okay, so we haven’t yet arrived at metrics that give us a good indication of product quality before release, but we have eliminated some that seem valuable to track yet don’t in fact indicate product quality. We are making progress. What if we were to measure coverage of product use cases instead of code coverage? What would that look like?

✅ Use Case Test Coverage

This is one of my favorite metrics. It answers: “What percentage of customer-facing use cases have been tested?”

  • ✅ Focuses on product behavior, not just code.

  • ✅ Easy to communicate to both technical and business teams.

  • ✅ Helps prioritize issues based on user impact.

  • ⚠️ Depends on having well-written product requirements (but if that’s missing, quality isn’t your biggest problem).
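
Here is a minimal sketch of how a team might track this. In practice the use case list would come from your requirements doc or test management tool; the entries below are invented:

```python
# Map each customer-facing use case to whether it has been tested.
use_cases = {
    "guest checkout":         True,
    "saved-card checkout":    True,
    "apply promo code":       True,
    "refund within 30 days":  False,  # not yet tested
    "partial refund":         False,  # not yet tested
}

tested = sum(use_cases.values())
coverage = 100 * tested / len(use_cases)
print(f"Use case test coverage: {coverage:.0f}% ({tested}/{len(use_cases)})")
# -> Use case test coverage: 60% (3/5)
```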


⏱ Defect-Free Duration

Ever finished testing, only to find a launch-blocking bug the next day? In complex systems, some issues only surface after continuous, real-world-style testing. That’s where Defect-Free Duration comes in — it tracks how long you’ve been actively testing without finding new bugs.

It's a signal that your product has settled. But how long is “long enough”? It depends. A few days might cut it for low-impact changes. For a big, high-stakes launch? You’ll want weeks. Tailor the duration to the blast radius.

  • ✅ Gives a pulse on product stability.

  • ✅ Builds confidence, especially if testing volume is high (consider adding beta users, friendlies, etc.).

  • ❌ Not meaningful if testing isn’t happening regularly.
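
Computing the metric is simple once you log when new defects are found; a sketch with hypothetical dates:

```python
from datetime import date

# Dates on which NEW defects were found during active testing (made up).
defect_found_dates = [date(2024, 5, 2), date(2024, 5, 9), date(2024, 5, 11)]
latest_test_run = date(2024, 5, 25)

# Only meaningful if testing actually continued through this window.
defect_free_days = (latest_test_run - max(defect_found_dates)).days
print(f"{defect_free_days} days of active testing with no new defects")  # -> 14
```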


No metric is perfect — and none should be treated as a silver bullet. But in my experience, Use Case Test Coverage and Defect-Free Duration are two solid proxies for gauging product quality and deciding if you’re truly launch-ready.

So, if these metrics are green, are we good to go live? Not so fast!

While the two metrics above help us establish product readiness from a functional point of view in a controlled user group, we also need to know that the product is going to work in the wild for everyone. This is where non-functional quality metrics come in, and depending on factors such as expected traffic and the sensitivity of the use cases, most of these metrics are non-negotiable for a launch.

⚙️ Non-Functional Metrics (Performance, Stability, Security)

When it comes to non-functional metrics, the key is defining what "good" really looks like. These metrics aren’t one-size-fits-all; they depend on your product’s maturity, user base, and competition. Setting the right thresholds will require some in-depth conversations with stakeholders.

Also, remember to measure across a range of user conditions and bucket the results by the 50th, 95th, and 99th percentiles to truly understand the severity of any issues.

Now, let’s dive into the critical non-functional metrics.

🔁 Latency / Response Times

This measures how fast your application responds. It can be measured for the frontend and the backend. In addition, we can measure the end user experience with respect to latency, for example, how fast page loads are for different use cases. (A measurement sketch follows the list below.)

  • ✅ Essential for user experience.

  • ✅ Should be measured at frontend, backend, and end-user levels.

  • ✅ Use industry guidelines (e.g., Google Core Web Vitals: LCP, INP, CLS).

  • ⚠️ Test under varied conditions and evaluate by P50/P95/P99 to catch edge cases.
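
As a rough standard-library sketch of sampling latency and bucketing it by percentile (the URL and sample count are placeholders; a real setup would use a proper load-testing tool and varied network conditions):

```python
import statistics
import time
import urllib.request

URL = "https://example.com/health"  # placeholder endpoint

# Sample response times in milliseconds.
samples = []
for _ in range(200):
    start = time.perf_counter()
    urllib.request.urlopen(URL).read()
    samples.append((time.perf_counter() - start) * 1000)

# quantiles(n=100) returns 99 cut points; indexes 49/94/98 are P50/P95/P99.
cuts = statistics.quantiles(samples, n=100)
print(f"P50={cuts[49]:.0f}ms  P95={cuts[94]:.0f}ms  P99={cuts[98]:.0f}ms")
```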

📈 Throughput

Measures how many requests the system can handle per minute. It can help identify bottlenecks, limitations, cost inefficiencies, and poor design choices in your system. If serving the same amount of traffic requires twice the servers because of low throughput, you have suddenly doubled your infrastructure cost. Throughput does not scale linearly, especially when your app has a database or third-party API connections. This is a valuable non-functional metric to track for new products and feature releases. (The capacity math after the list below illustrates the cost point.)

  • ✅ Helps identify bottlenecks and cost inefficiencies.

  • ✅ Useful when evaluating scalability of new features.

  • ⚠️ Low throughput = higher infra cost. Doesn’t always scale linearly due to databases or 3rd-party APIs.
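
Here is the back-of-the-envelope capacity math behind that cost point, with purely illustrative numbers:

```python
import math

peak_load_rpm = 120_000  # expected peak traffic, requests per minute

def servers_needed(per_server_rpm):
    """How many servers it takes to absorb peak load at a given throughput."""
    return math.ceil(peak_load_rpm / per_server_rpm)

print(servers_needed(6_000))  # 20 servers at 6,000 req/min per server
print(servers_needed(3_000))  # 40 servers if per-server throughput halves:
                              # the same traffic now costs twice as much
```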

💥 Crash Rate

This measures the percentage of sessions that ended with the application not responding or closing unexpectedly. The metric is especially useful for mobile, desktop, or device-based applications. The best applications aim for 99.9%+ crash-free sessions, so deciding the right target for your app will need some analysis.

  • ✅ Especially important for mobile, desktop, or device-based apps.

  • ✅ Measured as % of sessions that crash.

  • ✅ Top-tier apps aim for 99.9%+ crash-free sessions.
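
The arithmetic, with invented analytics counts:

```python
total_sessions = 1_250_000   # sessions observed during the test window
crashed_sessions = 900       # sessions that ended in a crash or hang

crash_free_pct = 100 * (1 - crashed_sessions / total_sessions)
print(f"Crash-free sessions: {crash_free_pct:.2f}%")  # -> 99.93%, above a 99.9% bar
```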


🔒 Security Metrics

🚨 Open Critical Vulnerabilities

  • ✅ All critical issues should be addressed before release.

🛡 Penetration Test Findings

  • ✅ External vendors bring objectivity.

  • ✅ High-impact way to catch overlooked security risks.

🔐 AuthN/AuthZ Tests

  • ✅ Validate secure access controls.

🧱 Rate Limiting & Abuse Protection

  • ✅ Must-have in today’s world of bots and AI-powered attacks.

  • ✅ Implement protections against denial-of-service or scraping attempts.
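
As one illustration, a token bucket is a common way to implement per-client rate limiting. This is a minimal in-process sketch; production systems usually enforce limits at the gateway or against a shared store such as Redis:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: refill `rate` tokens per second,
    hold at most `capacity`, and spend one token per request."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # serve the request
        return False      # throttle: reject or queue it

# Allow bursts of 10 and a sustained 5 requests/second per client.
limiter = TokenBucket(rate=5, capacity=10)
```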


Wrapping up

We discussed a wide variety of quality metrics that can be used to decide whether a feature is ready for production release. There are no perfect answers here! What works for one product and team may not work for another. The key is to choose the right set of metrics that give a strong quality signal for your specific product — and to keep revisiting them as your product evolves.

There’s no single metric that can tell you, “Yes, this is ready.” But taken together, the right signals — from use case coverage to crash rates, latency to open vulnerabilities — can give you a clear and confident picture.

Quality is a mindset, not a milestone. And like all good software, our approach to measuring it should be just as thoughtful, iterative, and user-focused.

So, what are your favorite quality metrics to track before releasing software to production?
