The Evidentiary Backbone of Science of Reading 2.0 in the AI Era: Building Efficacious Balanced Assessment Systems on the Foundations of Learning, Measurement, and Improvement Sciences
Moving the needle on literacy requires more than mandates; it demands an evidence-based approach to implementation, which is the focus of our Wonkathon piece, "Science of Reading 2.0: Assessment in the Service of Learning as the Backbone of Science-Powered Reading Improvement." In this article, Edmund W. Gordon and Eric Tucker detail how an assessment system in the service of learning can drive SOR success. Crucially, in today’s post, we profile the scientific publications and esteemed researchers whose work—drawn from the Handbook for Assessment in the Service of Learning series—provides the high-quality research and rigorous framework that undergirds every recommendation in our submission.
Science of Reading 2.0: Assessment in the Service of Learning as the Backbone of Science-Powered Reading Improvement
By Edmund W. Gordon and Eric Tucker
Mississippi’s reading revolution offers powerful lessons: the state paired phonics mandates with on-the-ground capacity-building—training, coaching, and assessments—to catch struggling readers early. The results demonstrate that all children can succeed when given appropriate and sufficient opportunities. The broad adoption of Science of Reading (SOR) laws arrives at a critical moment, as barely a third of our fourth graders score proficient on the NAEP. Yet Mississippi's success proves that passing a law is just the beginning.
While many wonks highlight essential work in teacher training and curriculum, these advances require an often-overlooked backbone: an assessment system that ensures every child's learning is visible. Optimizing SOR 1.0 approaches is necessary but insufficient; teachers won’t get far flying blind without timely insights. SOR laws will succeed best if states, districts, and schools prioritize assessment in the service of learning.
For SOR to truly fulfill its promise, the focus must extend beyond K-3 to include both pre-kindergarten and the older readers omitted by earlier efforts. Implementation requires alignment across educators, training programs, and parents.
A balanced assessment system—where each quiz, screening, and exam is used to improve instruction, not just audit status—will help ensure SOR reforms translate into real reading gains. This approach builds upon existing commitments to instructional excellence and high expectations for every learner, proposing we move towards what we characterize as SOR 2.0. As members of the editorial team of the recently published Handbook for Assessment in the Service of Learning series, we base the arguments herein on rigorous, hyperlinked research.
Building Reader Positioning Systems: From Single-Score Verdicts to Balanced Assessment
Build Upon:
Ambitious Grade-Level Reading Expectations and Approaches Grounded in the Science of Learning
Move Towards:
Balanced Assessment Systems With Capacity to Inform and Enhance Teaching and Learning
While current SOR approaches have promoted high standards, transparency for families, and data disaggregation, their assessment systems have often fallen short of their potential to support teaching and drive better decision-making. Technological breakthroughs, analytic methods, and artificial intelligence (AI) are rapidly transforming how we can measure and enhance reading.
To best support educators, states and school systems must establish balanced assessment systems designed to maximize learning for every child. The redesign must center ambitious instruction, using data to inform schools and teachers. Assessment in the service of learning holds educators and communities accountable to cultivate ability. With colleagues, we have proposed research-backed principles to guide the design and use of learning-focused assessments. For these systems to advance SOR-grounded approaches, they must:
- Measure what matters most, not just what's easy to measure.
- Leverage emerging technologies to provide rich, useful insights.
- Prioritize solutions optimized for scientific soundness, SOR fidelity, scalability, usefulness, and usability.
The sciences of learning and development can and should guide this innovation.
Learning to Improve: Practical Measures to Inform Ambitious Teaching of Reading
Build Upon:
Quality Screening for Grades K-3 and Educator Training
Move Towards:
Instructionally Useful Assessment and Practical Measurement for Improvement to Enhance Reading Across PreK-8
The Mississippi lesson is clear: lasting gains require ongoing monitoring and support.
Traditional assessments are administered infrequently, focus on distal outcomes, and return results too late, remaining distant from the specific needs of practitioners. In contrast, practical measurement for improvement uses evidence to inform judgment about the "why," "how," and "for whom" of daily improvement.
How might educators translate the snowstorm of data into actionable insights? Through instructionally useful assessment, which facilitates educator engagement in high-leverage practices such as quality formative assessment. This includes dynamically integrating SOR curriculum, instruction, and assessment. For example, a quick phonemic awareness quiz for a small group on a Friday might help a 1st-grade teacher immediately reteach a blending skill on Monday, long before it becomes a months-long lag.
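To make that loop concrete, here is a minimal sketch (in Python) of how item-level results from such a Friday quiz might become Monday's small-group plan. The data shape, sub-skill tags, and the 0.8 mastery cut point are illustrative assumptions, not features of any particular screener or curriculum.

```python
# Illustrative sketch: turning Friday's phonemic awareness quiz results
# into a Monday reteach plan. Records, sub-skill names, and the mastery
# threshold below are hypothetical.
from collections import defaultdict

MASTERY_THRESHOLD = 0.8  # assumed cut point for a "secure" sub-skill

# Each record: (student, sub_skill, items_correct, items_attempted)
quiz_results = [
    ("Ana",  "blending",   2, 5),
    ("Ben",  "blending",   5, 5),
    ("Cara", "segmenting", 3, 4),
    ("Ana",  "segmenting", 4, 4),
    ("Ben",  "segmenting", 1, 4),
]

# Group students by the sub-skills they have not yet secured.
reteach_groups = defaultdict(list)
for student, skill, correct, attempted in quiz_results:
    if correct / attempted < MASTERY_THRESHOLD:
        reteach_groups[skill].append(student)

for skill, students in reteach_groups.items():
    print(f"Monday small group for {skill}: {', '.join(students)}")
# -> Monday small group for blending: Ana
# -> Monday small group for segmenting: Cara, Ben
```

The point is not the code but the turnaround: evidence gathered Friday becomes a concrete instructional decision by Monday.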
We recommend Hattie’s evidence-backed mindframes that emphasize insights into reader progress, strategies, emotions, and self-regulation rather than just status. This means prioritizing the development of 'assessment-capable learners'—students who can interpret and act on feedback to take charge of their own improvement. Odemwingie shares a practical example where reading assessments foster learner agency and self-efficacy while supporting classroom learning culture. A culture of formative assessment makes it impossible to ignore a child's needs until it's too late; it serves as a continuous GPS, recalculating the route to reading success for each child.
Embracing Reader Variation: Customized Learning Powered by SAFE AI
Build Upon:
Personalization Leveraging Universal Design for Learning (UDL) and Multi-Tiered Systems of Support (MTSS)
Move Towards:
Embracing the Safe, Accountable, Fair, and Effective (SAFE) Use of AI to Achieve Person-Specific Insights and Customization
AI and person-specific measurement can reshape how we understand and nurture reading processes. AI has the potential to generate insights that:
- Customize learning, adapting to reader strengths and weaknesses, optimizing pace, accessibility, content, and scaffolding.
- Provide real-time feedback, supporting readers’ processes like motivation, attention, engagement, and metacognition.
- Leverage predictive analytics to match interventions to learner needs (see the sketch after this list).
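As one concrete illustration of the predictive-analytics bullet, the sketch below implements a standard Bayesian Knowledge Tracing (BKT) update, a model family widely used in adaptive learning software to estimate skill mastery from item-by-item responses. The slip, guess, and learning-rate values are illustrative assumptions, not calibrated parameters from any product named here.

```python
# Minimal Bayesian Knowledge Tracing (BKT) sketch: update the estimated
# probability that a reader has mastered a skill after each item.
# Parameter values are illustrative, not calibrated.

def bkt_update(p_known: float, correct: bool,
               slip: float = 0.1, guess: float = 0.2,
               learn: float = 0.15) -> float:
    """Posterior mastery probability after observing one response."""
    if correct:
        posterior = p_known * (1 - slip) / (
            p_known * (1 - slip) + (1 - p_known) * guess)
    else:
        posterior = p_known * slip / (
            p_known * slip + (1 - p_known) * (1 - slip))
    # Allow for learning between practice opportunities.
    return posterior + (1 - posterior) * learn

p = 0.3  # assumed prior mastery of, say, vowel-team decoding
for outcome in [True, False, True, True]:
    p = bkt_update(p, outcome)
print(f"Estimated mastery after four items: {p:.2f}")
```

An estimate like this can trigger a matched intervention, such as routing a reader whose estimated mastery stays low into a targeted decoding group, rather than waiting for an end-of-unit test.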
The integration of AI into assessment, while beneficial, demands a rigorous evaluation of its safety and efficacy, necessitating clear guidelines and guardrails. Duolingo’s responsible AI standards are instructive.
The Personalized Mastery Learning Ecosystem (including My Reading Academy®) has showcased significant learning gains across multiple ESSA-aligned studies. PBS KIDS uses dynamic leveling systems to adapt to individual reader motivations and needs. Khan Academy analyzes performance data—like accuracy and scaffold usage—to provide continuous, skill-level insights. Its AI-powered assistant, Khanmigo, is transforming the modalities through which insights are generated and understood. Huff, of Curriculum Associates, proposes design approaches that foster a deeper shared understanding of the reader’s cognitive processes and maintain the integrity of the reading constructs being measured.
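To suggest how signals like accuracy and scaffold usage might combine into a skill-level insight, here is a deliberately simple, hypothetical sketch; the thresholds and labels are our illustrative assumptions and do not reflect the actual logic of Khan Academy or any other product named above.

```python
# Hypothetical sketch: combine item accuracy and hint (scaffold) usage
# into a coarse skill-level signal. Thresholds are illustrative only.

def skill_signal(accuracy: float, hint_rate: float) -> str:
    """Classify a skill as secure, developing, or needing support."""
    if accuracy >= 0.85 and hint_rate <= 0.2:
        return "secure"
    if accuracy >= 0.6:
        return "developing"
    return "needs support"

print(skill_signal(accuracy=0.9, hint_rate=0.1))  # secure
print(skill_signal(accuracy=0.7, hint_rate=0.5))  # developing
print(skill_signal(accuracy=0.4, hint_rate=0.8))  # needs support
```

Production systems use far richer models, but even this toy version shows why scaffold usage matters: high accuracy achieved only with heavy hinting is a different instructional signal than independent accuracy.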
Together, these examples reflect a rising generation of assessment: highly personalized, multi-modal, adaptive, and focused on continuously optimizing the learning journey for every student.
SOR 2.0: A Call for Assessment Systems that Cultivate Reading Ability
States and districts have a window to marry proven, science-driven reading standards with efficacious implementation. The SOR wave set strong expectations; now, we must actively deploy modern, balanced assessment systems that personalize and inform reading instruction at scale. This can transform assessment from a bureaucratic speed bump into a building block for learning.
We urge three critical steps, which complement the recommendations outlined above:
- Tie balanced assessment into SOR implementation: Ensure that SOR rollout plans include robust universal PreK–3 screeners, appropriate middle-years screeners, and revamped summative exams that integrate SOR domains.
- Invest in innovation pilots: Fund pilot programs for AI-powered literacy interventions and tutors guided by embedded assessments, and evaluate which tools best boost reading gains.
- Incentivize innovation with accountability: Create policies that encourage school systems to try new assessment models while holding them accountable for transparent, real-time evidence of learning.
The payoff is clear: confident third-grade readers, engaged middle-schoolers, and graduates prepared for the future of work. As our first author has long championed, assessment must be re-purposed to facilitate the cultivation of ability and actively support the learning process, not just audit outcomes after the fact. By embracing this vision, honoring the strengths of the learning sciences and the potential of AI, together we can help to secure literacy and flourishing for more children.
Unpacking References and Excerpts
To best support educators, states and school systems must establish balanced assessment systems designed to maximize learning for every child.
Marion, S. F., & Evans, C. M. (2025). Conceptualizing and evaluating instructionally useful assessments. In E. M. Tucker, E. Armour-Thomas, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume I: Foundations for assessment in the service of learning. University of Massachusetts Amherst Libraries.
Scott Marion and Carla Evans write:
“In the recent National Academy of Education publication, Reimagining Balanced Assessment Systems (Marion et al., 2024), the authors focus on rebalancing assessment systems to privilege rich classroom learning environments. The first part of the updated definition makes this point: Balanced assessment systems and practices, as conceived by this volume’s authors, are intentionally designed to provide feedback to students and information for teachers to support ambitious and equitable instructional and learning opportunities. This type of assessment system facilitates educator engagement in high-leverage professional practices such as quality formative assessment to support ambitious and equitable teaching (Marion et al., 2024, p. 2).”
Assessment in the service of learning holds educators and communities accountable to cultivate ability.
Gordon, E. W. (2025). Toward assessment in the service of learning. In E. M. Tucker, E. Armour-Thomas, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume I: Foundations for assessment in the service of learning. University of Massachusetts Amherst Libraries.
Edmund Gordon writes: “The lesson I took from Haeussermann was simple yet profound: that assessment should be used not only to identify what is, but to imagine and cultivate what might become. In every learner’s struggle, there is the seed of possibility, and our charge as educators is to create the conditions under which that possibility can take root and flourish.”
With colleagues, we have proposed research-backed principles to guide the design and use of learning-focused assessments.
Baker, E. L., Everson, H. T., Tucker, E. M., & Gordon, E. W. (2025). Principles for assessment in the service of learning. In E. M. Tucker, E. Armour-Thomas, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume I: Foundations for assessment in the service of learning. University of Massachusetts Amherst Libraries.
Baker et al. write, “This chapter offers a set of seven principles to guide the design and use of learning-focused assessments, that is, educational tests and assessments intended to support student learning.”
Measure what matters most, not just what's easy to measure.
Pellegrino, J. W. (2025). Arguments in support of innovating assessments. In E. M. Tucker, E. Armour-Thomas, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume I: Foundations for assessment in the service of learning. University of Massachusetts Amherst Libraries.
“The first argument is that assessment should measure what matters, not just what is easy to measure. This means expanding the range of educational outcomes we assess to include the complex cognitive, socio-cognitive, and socio-emotional constructs that are essential for success in the worlds of today and tomorrow.”
Prioritize solutions optimized for scientific soundness, SOR fidelity, scalability, usefulness, and usability.
Hanno, E. C., Horner, E. M., Portilla, X. A., & Hsueh, J. (2025). Centering the voices of assessment users in the advancement of early learning measures. In E. M. Tucker, E. L. Baker, H. T. Everson, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume III: Examples of assessment in the service of learning. University of Massachusetts Amherst Libraries.
“The fourth goal documents properties of the data outputs from early learning assessment tools reflecting their usefulness. Collecting assessment data is only as meaningful as the actions the data can inform. For educators and families, assessment data can inform decisions about how to best support individual children to be successful. For early education systems, these data can inform decisions about the most effective approaches to improve early learning experiences in ways that ultimately foster positive child outcomes. Parameters in this goal underscore the importance of making data outputs timely, understandable, and actionable for a variety of purposes. It particularly emphasizes the need to make data accessible for families who typically receive limited information on their children’s skills. It also highlights the potential for child assessments to serve as a conduit for collaborative communication between educators and families about children’s development.”
“User-testing has also provided insights into how to make assessments more usable for early educators. Given that pre-K classrooms are dynamic environments with lots going on, educators have requested the option to pause assessments midway to allow children the opportunity to start back where they left off rather than having to repeat completed items. This could accommodate common short interruptions like bathroom breaks. Future testing through the Measures Initiative will explore whether these types of short pauses affect student performance on assessments. Teachers also requested that initial training materials include more concrete guidance on how to set up and use the assessments in their classrooms. This includes how to store, charge, and turn on technology; how to connect devices to the internet; and how to use the tool during different instructional formats (e.g., small groups, centers). These early insights illuminate the importance of considering user perspectives in assessment design to align features and supports with what will work in real world settings and give children the best opportunity to demonstrate what they know and can do.”
The sciences of learning and development can and should guide this innovation.
Pea, R., Lee, C. D., Nasir, N. S., & McKinney de Royston, M. (2025). The cultural foundations of learning: Design considerations for measurement and assessment. In E. M. Tucker, E. Armour-Thomas, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume I: Foundations for assessment in the service of learning. University of Massachusetts Amherst Libraries.
“We begin the chapter with an exploration of insights from the most recent science on what learning is and how it happens. We draw from the Science of Learning and Development (SOLD), to explicate key ideas about the nature of learning and the kinds of learning that assessment should serve (Darling-Hammond, Flook, Cook-Harvey, Barron, & Osher, 2020). Then we turn to a discussion of assessment, underscoring that our current system of assessment in the U.S. primarily focuses on sorting, rather than learning (Goldman & Lee, 2024), and in doing so, such assessments too often reify racial and class-based disparities. We then examine how we should be thinking about assessment practices, exploring what might be optimal if we were seeking to assess deep learning.”
In contrast, practical measurement for improvement uses evidence to inform judgment about the "why," "how," and "for whom" of daily improvement.
LeMahieu, P., & Cobb, P. (2025). Practical measurement for improvement: Foundations, design, rigor. In E. M. Tucker, E. Armour-Thomas, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume I: Foundations for assessment in the service of learning. University of Massachusetts Amherst Libraries.
“Educational systems today face persistent challenges that demand not only innovation but disciplined learning about what works, for whom, and under what conditions. Improvement science has emerged in response to this challenge, offering a structured, iterative, and evidence-based approach to addressing complex problems of practice. At the heart of this approach lies “practical measurement,” a form of assessment that is embedded within the flow of professional practice and is designed to support real-time learning and continuous improvement. This essay foregrounds three critical aspects of practical measurement in education: (1) the theoretical foundations of improvement science and its implications for measurement; (2) the design and implementation of practical measures; and (3) the technical quality and validity concerns that must be addressed to ensure responsible and equitable use.”
Through instructionally useful assessment, which facilitates educator engagement in high-leverage practices such as quality formative assessment.
Marion, S. F., & Evans, C. M. (2025). Conceptualizing and evaluating instructionally useful assessments. In E. M. Tucker, E. Armour-Thomas, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume I: Foundations for assessment in the service of learning. University of Massachusetts Amherst Libraries.
“We consolidated the ideas of many who came before us and defined an instructionally useful assessment as one that “…provides substantive insights about student learning strengths and needs relative to specific learning targets that can positively influence the interactions among the teacher, student, and the content” (Evans & Marion, 2024, p. 19). We further explored how instructionally useful assessments can support teachers by revealing insights through the assessment processes themselves, reporting results that shed light on student learning, or simply as a function of participating in the assessment (e.g., Agarwal et al., 2008).”
This includes dynamically integrating SOR curriculum, instruction, and assessment.
Armour-Thomas, E. (2025). Dynamic pedagogy: A perspective for integrating curriculum, instruction, and assessment in the service of learning at the classroom level. In E. M. Tucker, E. Armour-Thomas, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume I: Foundations for assessment in the service of learning. University of Massachusetts Amherst Libraries.
“The chapter contends that realizing this vision requires embedding assessment within a framework of Dynamic Pedagogy—an integrated model in which assessment, curriculum, and instruction are inseparable and mutually reinforcing, all working together to promote learning with understanding. The chapter begins by defining and conceptualizing Dynamic Pedagogy, positioning learning with understanding as the central focus of the interdependent relationships among assessment, curriculum, and instruction.”
We recommend Hattie’s evidence-backed mindframes that emphasize insights into reader progress, strategies, emotions, and self-regulation rather than just status.
Hattie, J., Sireci, S. G., & Baker, E. L. (2025). Mind frames for improving educational assessment. In E. M. Tucker, E. Armour-Thomas, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume I: Foundations for assessment in the service of learning. University of Massachusetts Amherst Libraries.
“This chapter calls for a shift toward assessment in the service of learning, emphasizing insights into student progress, learning strategies, emotions, engagement, and self-regulation rather than just achievement. To support this, educators must develop assessment-capable learners who can interpret and act on assessment results. The authors introduce 10 mind frames to enhance assessment, promoting diagnostic and predictive uses, clear success criteria, instructional alignment, and a classroom culture that embraces errors as learning opportunities. They also explore how technology and AI can make assessments more adaptive and personalized. By embedding assessment within teaching and learning, these mind frames transform it from a compliance tool into a driver of student growth and educational improvement.”
Odemwingie shares a practical example where reading assessments foster learner agency and self-efficacy while supporting classroom learning culture.
Odemwingie, M., & Cockrell, K. (2025). From evaluation to impact: Transforming assessment into a tool for learning. In E. M. Tucker, E. L. Baker, H. T. Everson, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume III: Examples of assessment in the service of learning. University of Massachusetts Amherst Libraries.
Brookhart, S. M. (2025). Developing educational assessments to serve learners. In S. G. Sireci, E. M. Tucker, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume II: Reconceptualizing assessment to improve learning. University of Massachusetts Amherst Libraries.
AI and person-specific measurement can reshape how we understand and nurture reading processes.
Cantor, P., & Felsen, K. (2025). It’s time for a paradigm shift in educational measurement. In E. M. Tucker, E. Armour-Thomas, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume I: Foundations for assessment in the service of learning. University of Massachusetts Amherst Libraries.
“Educational measurement should be done in such a way that we learn more about the fit between an individual student and the learning context. This is doable today. It is fit that amplifies purpose and confidence. It is fit that primes performance. It is fit that produces cures. It is fit that unlocks human potential. The Rosetta Stone for measurement is about how close we can get to measuring the individual, understanding and measuring the context, and measuring the fit between the two (Cantor et al., 2021). If this existed today, all learners, and all of us, would understand what we need to reach the top of our developmental range, and what, in our community, school, work, team, troop, would enable us to perform at our best.”
Provide real-time feedback, supporting readers’ processes like motivation, attention, engagement, and metacognition.
Bennett, R. E., Baker, E. L., & Gordon, E. W. (2025). Personalizing assessment for the advancement of equity and learning. In S. G. Sireci, E. M. Tucker, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume II: Reconceptualizing assessment to improve learning. University of Massachusetts Amherst Libraries.
Duolingo’s responsible AI standards are instructive.
Burstein, J., LaFlair, G. T., Yancey, K., von Davier, A. A., & Dotan, R. (2025). Responsible artificial intelligence for test equity and quality: The Duolingo English Test as a case study. In E. M. Tucker, E. Armour-Thomas, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume I: Foundations for assessment in the service of learning. University of Massachusetts Amherst Libraries.
The Personalized Mastery Learning Ecosystem (including My Reading Academy®) has showcased significant learning gains across multiple ESSA-aligned studies.
Betts, A., Gunderia, S., Hughes, D., Owen, V. E., & Bang, H. J. (2025). Beyond measurement: Assessment as a catalyst for personalizing learning and improving outcomes. In E. M. Tucker, E. L. Baker, H. T. Everson, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume III: Examples of assessment in the service of learning. University of Massachusetts Amherst Libraries.
PBS KIDS uses dynamic leveling systems to adapt to individual reader motivations and needs.
Roberts, J. D., Younger, J. W., Corrado, K., Felline, C., & Lovato, S. (2025). Practical examples of assessment in the service of learning at PBS KIDS. In E. M. Tucker, E. L. Baker, H. T. Everson, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume III: Examples of assessment in the service of learning. University of Massachusetts Amherst Libraries.
Khan Academy analyzes performance data—like accuracy and scaffold usage—to provide continuous, skill-level insights.
DiCerbo, K. (2025). Formative assessment in a digital learning platform. In E. M. Tucker, E. L. Baker, H. T. Everson, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume III: Examples of assessment in the service of learning. University of Massachusetts Amherst Libraries.
Its AI-powered assistant, Khanmigo, is transforming the modalities through which insights are generated and understood. Huff, of Curriculum Associates, proposes design approaches that foster a deeper shared understanding of the reader’s cognitive processes and maintain the integrity of the reading constructs being measured.
Huff, K. (2025). Designing and developing educational assessments for contemporary needs. In E. M. Tucker, E. Armour-Thomas, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume I: Foundations for assessment in the service of learning. University of Massachusetts Amherst Libraries.
Ensure that SOR rollout plans include robust universal PreK–3 screeners, appropriate middle-years screeners, and revamped summative exams that integrate SOR domains.
Sutherland, R., Schreuder, M., & Townley-Flores, C. (2025). Learning to read doesn’t end in third grade: Supporting older readers’ literacy development with a validated foundational skills assessment. In E. M. Tucker, E. L. Baker, H. T. Everson, & E. W. Gordon (Eds.), Handbook for assessment in the service of learning, Volume III: Examples of assessment in the service of learning. University of Massachusetts Amherst Libraries.