Most teams pick metrics that sound smart… but under the hood, they're just noisy, slow, misleading, or biased. Today, I'm giving you a framework to avoid that trap. It's called STEDII, and it's how to choose metrics you can actually trust:

—

ONE: S — Sensitivity

Your metric should be able to detect small but meaningful changes. Most good features don't move numbers by 50%. They move them by 2–5%. If your metric can't pick up those subtle shifts, you'll miss real wins.

Rule of thumb:
- Basic metrics detect 10% changes
- Good ones detect 5%
- Great ones? 2%

The better your metric, the smaller the lift it can detect. But that also means needing more users and better experimental design.

—

TWO: T — Trustworthiness

Ever launch a clearly better feature… but the metric goes down? Happens all the time.

Users find what they need faster → Time on site drops
Checkout becomes smoother → Session length declines

A good metric should reflect actual product value, not just surface-level activity. If metrics move in the opposite direction of user experience, they're not trustworthy.

—

THREE: E — Efficiency

In experimentation, speed of learning = speed of shipping. Some metrics take months to show signal (LTV, retention curves). Others, like Day 2 retention or funnel completion, give you insight within days. If your team is waiting weeks to know whether something worked, you're already behind. Use CUPED or proxy metrics to shorten testing windows without sacrificing signal (see the sketch after this post).

—

FOUR: D — Debuggability

A number that moves is nice. A number that explains why something worked? That's gold.

Break down conversion into funnel steps. Segment by user type, device, geography. A 5% drop means nothing if you don't know whether it's:
→ A mobile bug
→ A pricing issue
→ Or just one country behaving differently

Debuggability turns your metrics into actual insight.

—

FIVE: I — Interpretability

Your whole team should know what your metric means... and what to do when it changes.

If your metric looks like this:
Engagement Score = (0.3×PageViews + 0.2×Clicks - 0.1×Bounces + 0.25×ReturnRate)^0.5
you're not driving action. You're driving confusion.

Keep it simple:
Conversion drops → Check checkout flow
Bounce rate spikes → Review messaging or speed
Retention dips → Fix the week-one experience

—

SIX: I — Inclusivity

Averages lie. Segments tell the truth.

A metric that's "up 5%" could still be hiding this:
→ Power users: +30%
→ New users (60% of base): -5%
→ Mobile users: -10%

Look for Simpson's Paradox. Make sure your "win" isn't actually a loss for the majority.

—

To learn all the details, check out my deep dive with Ronny Kohavi, the legend himself: https://lnkd.in/eDWT5bDN
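The Efficiency section above mentions CUPED without defining it. Here is a minimal sketch of the idea in Python, assuming each user has a pre-experiment measurement of the same metric to use as the covariate; the function name `cuped_adjust` and the simulated numbers are illustrative, not from the post.

```python
# Minimal CUPED sketch: reduce metric variance using a pre-experiment covariate.
import numpy as np

def cuped_adjust(y: np.ndarray, x_pre: np.ndarray) -> np.ndarray:
    """Return variance-reduced metric values.

    y     : metric measured during the experiment (one value per user)
    x_pre : the same metric measured before the experiment started
    """
    theta = np.cov(x_pre, y)[0, 1] / np.var(x_pre, ddof=1)
    return y - theta * (x_pre - x_pre.mean())

# Toy usage: in-experiment values correlate with pre-experiment behaviour,
# so CUPED strips out much of that between-user noise.
rng = np.random.default_rng(0)
x_pre = rng.normal(10, 3, size=5_000)
y = 0.8 * x_pre + rng.normal(0, 1, size=5_000)

y_adj = cuped_adjust(y, x_pre)
print(f"variance before: {y.var():.2f}, after CUPED: {y_adj.var():.2f}")
```

Lower variance means the same experiment can detect a smaller lift, which is exactly the efficiency gain the post is pointing at.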
Tracking Usability Metrics Throughout The Design Process
Explore top LinkedIn content from expert professionals.
Summary
Tracking usability metrics throughout the design process means monitoring and analyzing specific data points to evaluate and improve how users interact with a product. It ensures that design decisions are driven by insights into user behavior, enabling a better overall experience.
- Define meaningful metrics: Choose metrics that are sensitive, trustworthy, and aligned with your product goals to ensure they reflect real user experiences and outcomes.
- Prioritize user perspective: Focus on understanding why users behave in certain ways by blending behavioral, attitudinal, and performance metrics for a more complete picture.
- Plan before testing: Establish clear objectives, define the minimum detectable effect (MDE, the smallest change worth acting on), and determine the sample size needed to detect it, so your testing uncovers actionable insights.
AI changes how we measure UX. We've been thinking and iterating on how we track user experiences with AI. In our open Glare framework, we use a mix of attitudinal, behavioral, and performance metrics. AI tools open the door to customizing metrics based on how people use each experience. I'd love to hear who else is exploring this.

To measure UX in AI tools, it helps to follow the user journey and match the right metrics to each step. Here's a simple way to break it down:

1. Before using the tool
Start by understanding what users expect and how confident they feel. This gives you a sense of their goals and trust levels.

2. While prompting
Track how easily users explain what they want. Look at how much effort it takes and whether the first result is useful.

3. While refining the output
Measure how smoothly users improve or adjust the results. Count retries, check how well they understand the output, and watch for moments when the tool really surprises or delights them.

4. After seeing the results
Check if the result is actually helpful. Time-to-value and satisfaction ratings show whether the tool delivered on its promise.

5. After the session ends
See what users do next. Do they leave, return, or keep using it? This helps you understand the lasting value of the experience.

We need sharper ways to measure how people use AI. Clicks can't tell the whole story. But getting this data is not easy. What matters is whether the experience builds trust, sparks creativity, and delivers something users feel good about. These are the signals that show us if the tool is working, not just technically, but emotionally and practically.

How are you thinking about this?

#productdesign #uxmetrics #productdiscovery #uxresearch
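To make the stage-to-metric matching concrete, here is a small illustrative sketch in Python. The stage labels follow the post, but the specific metric names are assumptions and not an official part of the Glare framework.

```python
# Illustrative mapping of AI-UX journey stages to candidate metrics.
# Stage names follow the post; the metric choices are examples only.
AI_UX_METRICS: dict[str, list[str]] = {
    "before_use": ["expectation_rating", "self_reported_confidence"],
    "while_prompting": ["prompt_attempts", "perceived_effort", "first_result_useful"],
    "while_refining": ["retry_count", "output_comprehension", "delight_moments"],
    "after_results": ["time_to_value_seconds", "satisfaction_score"],
    "after_session": ["return_within_7_days", "continued_usage"],
}

def metrics_for_stage(stage: str) -> list[str]:
    """Look up which metrics to log for a given journey stage."""
    return AI_UX_METRICS.get(stage, [])

if __name__ == "__main__":
    for stage, metrics in AI_UX_METRICS.items():
        print(f"{stage}: {', '.join(metrics)}")
```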
-
Traditional UX analytics tell us what happened: users clicked here, spent X minutes, and dropped off somewhere along the way. But they do not tell us why. Why did a user abandon a process? Why did they hesitate before completing an action?

This is where the hidden Markov model (HMM) comes in. Instead of tracking only surface-level metrics, HMMs infer hidden user states, showing how people transition between engagement, hesitation, and frustration. With this, we can predict drop-off before it happens, a game changer for UX optimization.

Take a health-tracking app. Standard analytics may show:
- Some users log data smoothly.
- Others browse without completing tasks.
- Others repeat the same steps again and again before abandoning.

Standard metrics cannot tell us what those users are experiencing. HMMs fill the gap by showing how users transition between states over time. By modeling sessions, clicks, and drop-offs, an HMM can classify users as:
- Engaged → moving smoothly through tasks.
- Exploring → clicking around but not completing actions.
- Frustrated → hesitating, repeating steps, likely to leave.

Instead of reacting to drop-off, teams can see the early signals of frustration and intervene.

HMMs predict behavior, making UX research proactive:
- Personalized onboarding → detects which users need help.
- Smarter A/B tests → explains why a design works better.
- Preemptive UI fixes → identifies friction before users leave.

Blending qualitative insights with HMM-driven modeling gives a fuller picture of user experience. Traditional UX research reacts to problems after they surface. HMMs anticipate issues, helping teams adjust experiences before frustration sets in. As UX becomes more complex, tracking clicks is not enough; we need to understand behavior patterns.
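As a minimal sketch of the decoding step, the snippet below runs the Viterbi algorithm over a toy session with hand-specified probabilities. In practice the transition and emission matrices would be estimated from logged sessions (for example with an HMM library); the state names, event categories, and numbers here are illustrative assumptions.

```python
# Decode hidden user states (engaged / exploring / frustrated) from a
# sequence of observed session events using a hand-specified HMM.
import numpy as np

states = ["engaged", "exploring", "frustrated"]
events = ["task_completed", "page_browse", "step_repeated"]

# P(next state | current state)
trans = np.array([
    [0.80, 0.15, 0.05],   # from engaged
    [0.30, 0.50, 0.20],   # from exploring
    [0.10, 0.20, 0.70],   # from frustrated
])
# P(observed event | state)
emit = np.array([
    [0.70, 0.25, 0.05],   # engaged
    [0.15, 0.70, 0.15],   # exploring
    [0.05, 0.25, 0.70],   # frustrated
])
start = np.array([0.6, 0.3, 0.1])

def viterbi(obs_idx: list[int]) -> list[str]:
    """Return the most likely hidden-state sequence for observed events."""
    n, T = len(states), len(obs_idx)
    log_delta = np.full((T, n), -np.inf)
    backptr = np.zeros((T, n), dtype=int)
    log_delta[0] = np.log(start) + np.log(emit[:, obs_idx[0]])
    for t in range(1, T):
        scores = log_delta[t - 1][:, None] + np.log(trans)   # (from, to)
        backptr[t] = scores.argmax(axis=0)
        log_delta[t] = scores.max(axis=0) + np.log(emit[:, obs_idx[t]])
    path = [int(log_delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t][path[-1]]))
    return [states[i] for i in reversed(path)]

# A session that drifts from completing tasks into repeating steps.
session = ["task_completed", "page_browse", "step_repeated", "step_repeated"]
print(viterbi([events.index(e) for e in session]))
```

The per-step state estimates are what make the approach proactive: a run of "frustrated" labels can trigger help or a UI fix before the user actually leaves.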
-
I've found it largely to be the case that people who work on products (PMs, Designers, Researchers, Devs, Content, etc.) want more clarity on metrics but struggle to get it. What's helped me over time is to think about metrics like individual pieces of a puzzle. My goal is to figure out how these pieces fit together, but frequently no one really knows what the finished puzzle looks like, especially "the business" or executives.

What's also helped me is to work backwards. Business metrics are typically pieces that already exist, so part of the puzzle is there. I work with my cross-functional partners to 1. create metrics we can directly influence as the makers of the products, and 2. figure out if/how they connect to those business metrics. It's finding the "fits" of the puzzle pieces.

At the start, it often looks like a bunch of random puzzle pieces. Some of the pieces are metrics: CSAT, NPV, ARPU, Conversion Rate, Time on Task, Error Rates, Alt Tag %, Avg Time Spent In App, Trial User Rate, etc. Some of the pieces are goals and actions: "A new Design System component," "Improve Product Accessibility," "Fix Bugs," "Raise Quality," etc. Members of the product team have puzzle pieces but struggle to understand how they fit together. It's a classic chicken-and-egg problem.

When we start to map the relationships between metrics (thinking about cause and effect), the post-mapping view looks something like this:

Improve Accessibility (goal) -> Create New Design System Component (action) -> Alt Tag % (metric) -> Error Rate (metric) -> Time on Task (metric) -> Trial User Rate (metric) -> Time Spent in App (metric) -> Conversion Rate (metric) -> NPV (metric) -> ARPU (metric) -> Revenue (metric).

After seeing how the pieces might fit together, basic statistical analyses like correlation or linear regression help us check whether there are, in fact, relationships between metrics (a small sketch follows this post).

IMO, the hard part is explaining this in a way that 1. makes sense to a wide range of individuals, and 2. compels them to do it. What's helped me with this hard part is having a partially filled-out map of the metrics people care about and completing more of it with product partners, so we're all on the same page. Once we complete that map and run some basic statistical analyses, we have more credible arguments for if/how the work that goes into making products translates to business goals.

Truthfully, not every exec is convinced, but at least we know we're making more credible decisions as a team. If an exec loves NPS no matter what the data says, it's not on us to make that call for them. We can hold our heads high and know we're doing a good job.
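As a quick illustration of checking whether two adjacent pieces of the map actually relate, here is a small sketch assuming you have weekly values for two of the metrics (Time on Task and Trial User Rate); the numbers and variable names are fabricated for illustration.

```python
# Sketch: test whether two adjacent metrics in the map actually move together.
# The weekly values below are fabricated for illustration.
import numpy as np
from scipy import stats

time_on_task = np.array([48, 45, 44, 41, 40, 38, 37, 35])              # seconds, weekly avg
trial_user_rate = np.array([2.1, 2.3, 2.2, 2.6, 2.7, 2.9, 3.0, 3.2])   # % of visitors

# Correlation: do the two metrics move together at all?
r, p_value = stats.pearsonr(time_on_task, trial_user_rate)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")

# Simple linear regression: how much does trial rate shift per second saved?
result = stats.linregress(time_on_task, trial_user_rate)
print(f"slope = {result.slope:.3f} pct-points per second, R^2 = {result.rvalue**2:.2f}")
```

Correlation here is only a sanity check on the map; it does not prove causation, which is why the post pairs it with the explicit cause-and-effect mapping exercise.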
-
Recently, someone shared results from a UX test they were proud of. A new onboarding flow had reduced task time, based on a very small handful of users per variant. The result wasn't statistically significant, but they were already drafting rollout plans and asked what I thought of their "victory." I wasn't sure whether to critique the method or send flowers for the funeral of statistical rigor.

Here's the issue. With such a small sample, the numbers are swimming in noise. A couple of fast users, one slow device, someone who clicked through by accident... any of these can distort the outcome. Sampling variability means each group tells a slightly different story. That's normal. But basing decisions on a single, underpowered test skips an important step: asking whether the effect is strong enough to trust.

This is where statistical significance comes in. It helps you judge whether a difference is likely to reflect something real or whether it could have happened by chance. But even before that, there's a more basic question to ask: does the difference matter?

This is the role of Minimum Detectable Effect, or MDE. MDE is the smallest change you would consider meaningful, something worth acting on. It draws the line between what is interesting and what is useful. If a design change reduces task time by half a second but has no impact on satisfaction or behavior, then it does not meet that bar. If it noticeably improves user experience or moves key metrics, it might. Defining your MDE before running the test ensures that your study is built to detect changes that actually matter.

MDE also helps you plan your sample size. Small effects require more data. If you skip this step, you risk running a study that cannot answer the question you care about, no matter how clean the execution looks.

If you are running UX tests, begin with clarity. Define what kind of difference would justify action. Set your MDE. Plan your sample size accordingly. When the test is done, report the effect size, the uncertainty, and whether the result is both statistically and practically meaningful. And if it is not, accept that. Call it a maybe, not a win. Then refine your approach and try again with sharper focus.
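To make the "set your MDE, then plan your sample size" step concrete, here is a minimal sketch using the standard two-sample normal approximation for a difference in means; the baseline standard deviation and MDE values below are invented for illustration.

```python
# Sketch: users per variant needed to detect a chosen MDE in mean task time,
# using the two-sample normal approximation. Numbers are illustrative.
import math
from scipy.stats import norm

def sample_size_per_group(mde: float, sd: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Users per variant to detect a difference of `mde` (same units as `sd`)."""
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided test
    z_beta = norm.ppf(power)
    n = 2 * ((z_alpha + z_beta) * sd / mde) ** 2
    return math.ceil(n)                 # round up to whole users

# Example: task time has a standard deviation of about 20 seconds, and only
# an improvement of 5 seconds or more would justify a rollout.
print(sample_size_per_group(mde=5.0, sd=20.0))   # roughly 252 users per variant
```

Halving the MDE roughly quadruples the required sample, which is why the effect size worth acting on has to be pinned down before the test, not after a handful of users happen to look faster.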