Understanding Statistical Significance Beyond P-Values


Summary

Understanding statistical significance goes beyond focusing solely on p-values. A p-value only tells you how surprising a result would be if there were no true effect; it does not reveal whether the effect is meaningful in real-world contexts. That is why practical and substantive significance need to be considered alongside statistical measures.

  • Examine effect size: Go beyond p-values by quantifying the magnitude of an effect (for example, Cohen's d or a raw mean difference with its confidence interval) to assess its practical relevance in real-world scenarios; see the sketch after this list.
  • Focus on meaningful outcomes: Evaluate whether the observed results lead to impactful changes or align with your objectives, rather than relying solely on statistical thresholds like p < 0.05.
  • Ensure transparent reporting: Include study design details, effect sizes, confidence intervals, and data quality to enable a comprehensive evaluation of research findings.
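
To make the first bullet concrete, here is a minimal Python sketch (assuming numpy and scipy are available) of reporting an effect size and a confidence interval alongside the p-value; the data and group sizes are invented for illustration.

```python
# Minimal sketch: report an effect size and a confidence interval next to the p-value.
# The two groups below are simulated; in practice they would be your treatment/control data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treatment = rng.normal(loc=10.3, scale=2.0, size=120)  # hypothetical measurements
control = rng.normal(loc=10.0, scale=2.0, size=120)

t_stat, p_value = stats.ttest_ind(treatment, control)

# Cohen's d: mean difference scaled by the pooled standard deviation.
n1, n2 = len(treatment), len(control)
pooled_var = ((n1 - 1) * treatment.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
pooled_sd = np.sqrt(pooled_var)
cohens_d = (treatment.mean() - control.mean()) / pooled_sd

# 95% confidence interval for the raw mean difference.
diff = treatment.mean() - control.mean()
se = pooled_sd * np.sqrt(1 / n1 + 1 / n2)
margin = stats.t.ppf(0.975, n1 + n2 - 2) * se

print(f"p = {p_value:.3f}, Cohen's d = {cohens_d:.2f}, "
      f"mean difference = {diff:.2f} (95% CI {diff - margin:.2f} to {diff + margin:.2f})")
```
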
  • View profile for Jan-Benedict Steenkamp

    Massey Distinguished Professor | Editor in Chief Journal of Marketing | Award-winning author | Top 0.02% scientist worldwide | Creator of the 4-factor Grit Scale

    26,705 followers

    STATISTICAL V. SUBSTANTIVE SIGNIFICANCE

    Take a moment to consider the following scenario. One study with n = 100 reports a focal effect with an associated p-value of 0.02. Another study with n = 1000 reports a focal effect with an associated p-value of 0.02. Which study presents the stronger evidence that the effect is really there?

    This scenario is adapted from Bakan (Psych. Bull. 1966). Many scholars chose the second study. They are wrong. I quote Bakan (p. 429): “The rejection of the null hypothesis when the number of cases is small speaks for a more dramatic effect in the population [larger effect size]; and if the p-value is the same, the probability of committing a Type I error remains the same.”

    Many papers implicitly or explicitly equate statistical significance with substantive significance. Yet a p-value does not tell you whether the effect has any real-world meaning. Ralph Tyler (Educ. Res. Bulletin, 1931) already wrote that a statistically significant difference is not necessarily an important difference, and a difference that is not statistically significant may be an important difference. Unfortunately, we are still making the same mistake 90 years later. A statistically significant result may be substantively nonsignificant (trivial). But a statistically nonsignificant result may also be substantively significant. I see so many studies in our field reporting regression coefficients with *** and I have no idea how large the effect is.

    This tendency to equate statistical with substantive significance persists to the extent that the prestigious American Statistical Association (not exactly an organization afraid of advanced statistics) came out with a formal statement on p-values, the “ASA Statement on Statistical Significance and P-Values,” cautioning researchers: “Statistical significance is not equivalent to scientific, human, or economic significance. Smaller p-values do not necessarily imply the presence of larger or more important effects, and larger p-values do not imply a lack of importance or even lack of effect. Any effect, no matter how tiny, can produce a small p-value if the sample size or measurement precision is high enough, and large effects may produce unimpressive p-values if the sample size is small or measurements are imprecise.”

    I am not arguing against statistical significance, but rather that articles should report statistical AND substantive significance. In my view, our primary task is uncovering factors that make a meaningful difference. That calls for effect sizes. As a nice “bonus,” substantive significance is less amenable to p-hacking than statistical significance.

    If you enjoyed this, share it with others and follow me, Jan-Benedict Steenkamp, for more writing. Journal of Marketing
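
A quick back-of-the-envelope check of Bakan's point above: hold the p-value at 0.02 and ask what effect size each study implies. This assumes a two-sided, two-sample t-test with equal group sizes, which is an illustrative assumption rather than anything stated in the post.

```python
# Effect size implied by p = 0.02 at different sample sizes, assuming a two-sided,
# two-sample t-test with equal group sizes (an illustrative assumption).
import numpy as np
from scipy import stats

def implied_cohens_d(p_value, n_per_group):
    """Cohen's d that would yield the given two-sided p-value in this design."""
    df = 2 * n_per_group - 2
    t_abs = stats.t.ppf(1 - p_value / 2, df)   # |t| that produces this p-value
    return t_abs * np.sqrt(2 / n_per_group)    # d = t * sqrt(1/n1 + 1/n2)

for n in (50, 500):  # total n = 100 vs. 1000, split evenly across two groups
    print(f"n per group = {n:>3}: p = 0.02 implies d ≈ {implied_cohens_d(0.02, n):.2f}")
# The smaller study needs a noticeably larger effect to reach the same p-value.
```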

  • View profile for Pritul Patel

    Analytics Manager

    6,388 followers

    🟠 "How do you explain p-value to a non-technical person?" This question should be eliminated from interview rounds. Here is why... Every time I hear this interview question, I feel a knot in my stomach. Not because I don't understand p-values but because I am not sure whether the interviewer truly understands them! Most people can recite the technical definition, "The probability of observing results at least as extreme as ours, assuming the null hypothesis is true." But explaining this concept to non-technical people reveals the heart of the problem. The internet is overflowing with misleading "layman" explanations. - "It is the probability the null hypothesis is true" - "A p-value below 0.05 means your treatment has a 95% chance of being effective" - "If p < 0.05, it means your findings are true 95% of the time" - "The p-value is the probability that your observed difference is real" All of these explanations are fundamentally incorrect. And there is no straight-forward way to explain p-value to a non-technical person unless you walk them through a specific scenario, which ends up becoming a very long answer. If you want someone to explain p-value in one-line then the only explanation is the technical definition. When I give the correct long explanation, I often get confused looks, even from technical interviewers. This reveals a deeper issue. If p-values are so widely misunderstood, should we be really using them: 1. In interviews assessing candidate's statistical understanding? 2. As the central measure in experimentation and decision-making? Instead of fixating on p-values, interviewers should ask candidates about. - Practical significance vs. statistical significance - Sample size determination and its relationship to experiment power - Minimum detectable effects (MDE) and how to interpret them - Confidence intervals and what they tell us about effect sizes - How to communicate experimental results to stakeholders A non-technical person should not be concerned about p-values in experiment analysis. A good experiment design DOES NOT even require p-values for decision making. Sample size, MDE, observed lift, and Confidence Intervals are enough to make a ship or no-ship decision. If you are an interviewer testing statistical literacy, the p-value question is a poor proxy. If you are designing high-powered experiments that lead to actionable business conclusions, p-values are just one small (and, honestly, optional) part of a more comprehensive analysis framework. Strong data scientists know when to use p-values and when not to rely on them exclusively.

  • View profile for Jason Thatcher

    Parent to a College Student | Tandean Rustandy Esteemed Endowed Chair, University of Colorado-Boulder | PhD Project PAC 15 Member | Professor, Alliance Manchester Business School | TUM Ambassador

    75,660 followers

    On statistical significance, null hypotheses, and evaluating research (or how we really should report results + access to a helpful article).

    How to evaluate & report findings has become a hot-button issue across disciplines. Many have posted & written about how estimates of statistical significance don't mean much. The challenge with that critique, for many, has been the absence of pragmatic guidance for what to do next.

    Blakely B. McShane (Northwestern University), Eric Bradlow (University of Pennsylvania), John Lynch (University of Colorado Boulder - Leeds School of Business), & Robert J Meyer (University of Pennsylvania) recently published a paper in the Journal of Marketing that provides that guidance. They contend that "the aim of studies should be to report results in an unfiltered manner so that they can later be used to make more general conclusions based on the cumulative evidence from multiple studies." Importantly, they argue for two things:
    1. All studies should be published in some form or another, & reporting should focus on quantifying study results via point & interval estimates (e.g., look at more than "p").
    2. General conclusions should be made based on the cumulative evidence from multiple studies (e.g., replication & triangulation).

    So what to do? In addition to reporting "p," researchers need to more transparently report "plausibility of mechanism, study design, data quality, & others that vary by research domain." If researchers do so, it becomes easier to evaluate the paper & the stream of papers it contributes to.

    I found the paper clear, compelling, & worth reading. Give it a look!

    The Reference: McShane, B. B., Bradlow, E. T., Lynch, J. G., & Meyer, R. J. (2024). “Statistical Significance” and Statistical Reporting: Moving Beyond Binary. Journal of Marketing, 88(3), 1-19. https://lnkd.in/ecG6hRsc

    The Downloadable Citation: https://lnkd.in/eeF3WJPb

    The Abstract: Null hypothesis significance testing (NHST) is the default approach to statistical analysis & reporting in marketing & the biomedical & social sciences more broadly. Despite its default role, NHST has long been criticized by both statisticians & applied researchers, including those within marketing. Therefore, the authors propose a major transition in statistical analysis & reporting. Specifically, they propose moving beyond binary: abandoning NHST as the default approach to statistical analysis & reporting. To facilitate this, they briefly review some of the principal problems associated with NHST. They next discuss some principles that they believe should underlie statistical analysis & reporting. They then use these principles to motivate some guidelines for statistical analysis & reporting. They next provide some examples that illustrate statistical analysis & reporting that adheres to their principles & guidelines. They conclude with a brief discussion.

    #researchmethods
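
To illustrate the "point & interval estimates plus cumulative evidence" idea in generic terms, here is a standard fixed-effect (inverse-variance) pooling sketch; the study estimates are hypothetical, and this is a textbook calculation used for illustration, not a procedure taken from McShane et al. (2024).

```python
# Report each study's point and interval estimate, then pool them with
# inverse-variance weights (a standard fixed-effect meta-analysis).
import numpy as np
from scipy import stats

# (estimate, standard error) per study -- hypothetical numbers
studies = [(0.30, 0.15), (0.12, 0.08), (0.22, 0.10)]

z = stats.norm.ppf(0.975)
for i, (est, se) in enumerate(studies, start=1):
    print(f"Study {i}: estimate = {est:.2f}, 95% CI [{est - z * se:.2f}, {est + z * se:.2f}]")

weights = np.array([1 / se**2 for _, se in studies])  # inverse-variance weights
estimates = np.array([est for est, _ in studies])
pooled = np.sum(weights * estimates) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))
print(f"Pooled: estimate = {pooled:.2f}, "
      f"95% CI [{pooled - z * pooled_se:.2f}, {pooled + z * pooled_se:.2f}]")
```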

  • View profile for Susanne Mitschke

    CEO & Founder @ Citruslabs | Harvard MPH | Forbes 30 Under 30 | 40 Under 40 | INC F500 | UofG💘 | Techstars

    8,145 followers

    “Statistically Significant.” But Does It Matter?

    In a clinical trial with 10,000 participants, a weight loss of just 0.5 kg in the treatment group might show up as statistically significant. But, does half a kilo actually change anyone’s health trajectory? Not really.

    Here’s what statistical significance doesn’t tell us: A p-value under 0.05 simply means the result is unlikely due to chance. It doesn’t mean the outcome is meaningful, impactful, or even helpful in the real world.

    Clinical vs. Statistical Significance
    The most important question we should be asking isn’t “Is it significant?” It’s: Does it change how people feel or function? A result can be statistically significant and still clinically irrelevant. And the reverse is true, too, particularly in underserved populations or studies of rare conditions where large samples just aren’t feasible.

    Why Bigger Isn’t Always Better
    Large trials can be statistically overpowered. With enough participants, even the tiniest effect will hit that magical p-value threshold. But that doesn’t mean the effect actually matters to people’s health, or that it’s worth building a product claim around. It’s not just about the numbers. It’s about meaning.

    When Stats Get Sketchy
    Some studies run dozens of tests, hoping something lands under 0.05 (a practice called p-hacking). Others retroactively hunt for outcomes that show an effect, even if they weren’t part of the original study design. These approaches may generate headlines, but they erode trust. They create marketing fluff, not real-world impact. And consumers (and regulators) are paying attention.

    What Credible Research Looks Like
    If you’re building a brand with staying power, good research is a differentiator. Here’s what that looks like:
    - Pre-register your outcomes and analysis plans. No cherry-picking.
    - Lead with effect size and clinical relevance, not just p-values.
    - Report everything, even the “non-significant” results. Honesty builds authority.

    How you design, analyze, and report your trial matters as much as the outcome. If you’re investing in clinical research, make sure your team or CRO partners can hold up to scrutiny. Because in a market full of noise, real impact wins.

    How does your team define “impact” in clinical research? I’d love to hear your perspective.

    Sources: Elasan et al. “The difference between clinical significance and statistical significance: an important distinction for clinical research”, Turk J Med Sci. 2024 Nov. https://lnkd.in/g3gSxssN
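
A quick numerical check of the opening scenario, assuming a two-arm trial with 5,000 participants per arm and a standard deviation of 8 kg for weight change; the SD is an assumption for illustration, not a figure from the cited source.

```python
# How easily does a 0.5 kg difference reach p < 0.05 with ~10,000 participants?
# Two-arm trial (5,000 per arm); SD of 8 kg for weight change is an assumed value.
import numpy as np
from scipy import stats

effect_kg = 0.5      # mean difference between arms
sd_kg = 8.0          # assumed standard deviation of weight change
n_per_arm = 5_000

se = sd_kg * np.sqrt(2 / n_per_arm)                        # standard error of the difference
z_effect = effect_kg / se
power = stats.norm.sf(stats.norm.ppf(0.975) - z_effect)    # two-sided test, alpha = 0.05

print(f"Standardized effect (Cohen's d) ≈ {effect_kg / sd_kg:.3f}")
print(f"Chance of reaching p < 0.05 for a 0.5 kg difference: {power:.0%}")
# A d of ~0.06 is clinically trivial, yet the trial is large enough to label it
# "significant" most of the time: statistical significance without clinical significance.
```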
