Authentic Assessment

  • Kim H. Koh, Werklund School of Education, University of Calgary
  • https://doi.org/10.1093/acrefore/9780190264093.013.22
  • Published online: 27 February 2017

Authentic tasks replicate real-world challenges and standards of performance that experts or professionals typically face in the field. The term “authentic assessment” was first coined by Grant Wiggins in K‒12 educational contexts. Authentic assessment is an effective measure of intellectual achievement or ability because it requires students to demonstrate their deep understanding, higher-order thinking, and complex problem solving through the performance of exemplary tasks. Hence, authentic assessment can serve as a powerful tool for assessing students’ 21st-century competencies in the context of global educational reforms. The review begins with a detailed explanation of the concept of authentic assessment. There is a substantial body of literature focusing on definitions of authentic assessment, but only those that are original and relevant to educational contexts are included. Some of the criteria for authentic assessment defined by different authors overlap, but their definitions are consistent with one another. A comparison of authentic assessment and conventional assessment reveals that the two serve different purposes, as evidenced by the nature of the assessment and the item response format. Examples of both types of assessment are included. Three major themes are examined within authentic assessment research in educational contexts: authentic assessment in educational or school reforms, teacher professional learning and development in authentic assessment, and authentic assessment as tools or methods used in a variety of subjects or disciplines in K‒12 schooling and in higher education institutions. Among these three themes, most studies have focused on the role of authentic assessment in educational or school reforms. Future research should focus on building teachers’ capacity in authentic assessment and assessment for learning through a critical inquiry approach in school-based professional learning communities or in teacher education programs. To enable the power of authentic assessment to unfold in the classrooms of the 21st century, it is essential that teachers are not only assessment literate but also competent in designing and using authentic assessments to support student learning and mastery of the 21st-century competencies.

  • authentic assessment
  • authentic tasks
  • criteria for authenticity
  • 21st-century competencies

Introduction

The term “authentic assessment” was first coined in 1989 by Grant Wiggins in K‒12 educational contexts. According to Wiggins (1989, p. 703), authentic assessment is “a true test” of intellectual achievement or ability because it requires students to demonstrate their deep understanding, higher-order thinking, and complex problem solving through the performance of exemplary tasks. Authentic tasks replicate real-world challenges and “standards of performance” that experts or professionals (e.g., mathematicians, scientists, writers, doctors, teachers, or designers) typically face in the field (Wiggins, 1989, p. 703). For instance, authentic tasks in mathematics need to elicit the kind of thinking and reasoning used by mathematicians when they solve problems.

In the assessment literature, some authors have argued that the term “authentic” was first introduced by Archbald and Newmann (1988) in the context of learning and assessment (Cumming & Maxwell, 1999; Palm, 2008). However, the term “authentic” in Archbald and Newmann (1988) was associated with achievement rather than assessment. A few years later, Newmann and Archbald (1992) provided a detailed explanation of authentic achievement. Cumming and Maxwell (1999) have aptly pointed out that authentic assessment and authentic achievement are interrelated, as it is important to identify the desired student learning outcomes and realign the methods of assessment to them. Authentic assessment should be rooted in authentic achievement to ensure a close alignment between assessment tasks and desired learning outcomes. This alignment is of paramount importance in the worldwide climate of curriculum and assessment reform, which places greater emphasis on the development of students’ 21st-century competencies—including critical and creative thinking, complex problem solving, effective communication, collaboration, self-directed and lifelong learning, responsible citizenship, and information and technology literacy, to name a few.

In addition to K‒12 education, “authentic assessment” was further defined by Gulikers, Bastiaens, and Kirschner (2004) in the context of professional and vocational training that incorporates competence-based curricula and assessments. To better prepare students for the workplace, assessment tasks used in professional and vocational education need to resemble the tasks students will encounter in their future professional practice. Authentic assessments in competence-based education should create opportunities for students to integrate learning and working in practice, resulting in students’ mastery of the professional skills needed in their future workplace.

Authentic assessment has played a pivotal role in driving curricular and instructional changes in the context of global educational reforms. Since the 1990s, teacher education and professional development programs in many education systems around the globe have focused on developing assessment literacy for teachers and teacher candidates, which encompasses teacher competence in the design, adaptation, and use of authentic assessment tasks or performance assessment tasks to engage students in in-depth learning of subject matter and to promote their mastery of the 21st-century competencies (e.g., Darling-Hammond & Snyder, 2000; Koh, 2011a, 2011b, 2014; Shepard et al., 2005; Webb, 2009). Although many of the 21st-century competencies are not new, they have become increasingly in demand in colleges and workplaces that have shifted from lower-level cognitive and routine manual tasks to higher-level analytic and interactive tasks (e.g., collaborative problem solving) (Darling-Hammond & Adamson, 2010). The amount of new information is increasing at an exponential rate due to the advancement of digital technology. Hence, rote learning and regurgitation of facts or procedures are no longer suitable in contemporary educational contexts. Rather, students are expected to be able to find, organize, interpret, analyze, evaluate, synthesize, and apply new information or knowledge to solve non-routine problems.

Students’ mastery of the essential 21st-century competencies will enable them to succeed in college, to thrive in a fast-changing global economy, and to live meaningfully in a complex, technologically connected world. According to Darling-Hammond and Adamson (2010), the role of performance assessment is critical in helping both teachers and students to achieve the 21st-century standards of assessment and learning. Many authors have used “performance assessment” and “authentic assessment” interchangeably (e.g., Arter, 1999; Darling-Hammond & Adamson, 2010), whereas others have distinguished between the two (Meyer, 1992; Palm, 2008; Wiggins, 1989). A thorough review of the literature suggests that there is a need to differentiate performance assessment from authentic assessment.

All authentic assessments are performance assessments because they require students to construct extended responses, to perform, or to produce a product. Both process and product matter in authentic assessment, and hence formative assessment—such as open questioning, descriptive feedback, and self- and peer assessment—can be easily incorporated into authentic assessments. As such, authentic assessments also capture students’ dispositions, such as positive habits of mind, a growth mindset, persistence in solving complex problems, resilience and grit, and self-directed learning. The use of scoring criteria and human judgment are two of the essential components of authentic assessments (Wiggins, 1989).

Although all performance assessments include constructed responses or performances on open-ended tasks, not all performance assessments are authentic. As Arter (1999) pointed out, the two essential components of a performance assessment are tasks and criteria. This suggests that the line between performance assessment and authentic assessment is thin. Hence, the authenticity of a performance assessment or performance-based task is best determined by Gulikers et al.’s (2004) five dimensions of authenticity; Koh and Luke’s (2009) criteria for authentic intellectual quality; Newmann, Marks, and Gamoran’s (1996) “intellectual quality” criteria; and Wiggins’s (1989) four key features of authentic assessment. The dimensional framework proposed by Gulikers et al. is appropriate for use with assessments in professional and vocational training contexts, including higher education institutions, while Wiggins (1989), Newmann et al. (1996), and Koh and Luke (2009) are appropriate for use with assessments in K‒12 school contexts. The criteria for authentic intellectual quality by Koh and Luke (2009) have also been linked to the Singapore Classroom Coding Scheme, which was developed by Luke, Cazden, Lin, and Freebody (2005) to conduct classroom observations of teachers’ instructional practices. Some of the criteria for authentic intellectual quality were adapted from Newmann et al.’s (1996) authentic intellectual work, Lingard, Ladwig, Mills, Bahr, Chant, and Warry’s (2001) productive pedagogy and assessment, and the New South Wales model of quality teaching (Ladwig, 2009). Lingard et al. (2001) used the term “rich tasks” instead of authentic tasks in the Queensland School Reform Longitudinal Study. According to the authors, rich tasks are open-ended tasks that enable students to connect their learning to real-world issues and problems.

In sum, this section has presented a detailed explanation of the concept of authentic assessment. The remaining sections of this article include a comparison of authentic assessment and conventional assessment, criteria for authenticity in authentic assessment, authentic assessment research in educational contexts (including research problems/questions and methods), and future research in authentic assessment.

Authentic Assessment Versus Conventional Assessment

Authentic assessment serves as an alternative to conventional assessment. Conventional assessment is limited to standardized paper-and-pencil tests, which emphasize objective measurement. Standardized tests employ closed-ended item formats such as true‒false, matching, or multiple choice. The use of these item formats is believed to increase the efficiency of test administration, the objectivity of scoring, the reliability of test scores, and cost-effectiveness, as machine scoring and large-scale administration of test items are possible. However, it is widely recognized that traditional standardized testing restricts the assessment of higher-order thinking skills and other essential 21st-century competencies due to the nature of the item format. From an objective measurement or psychometric perspective, rigorous and higher-level learning outcomes (e.g., critical thinking, complex problem solving, collaboration, and extended communication) are too subjective to be tested. An overemphasis on objective measurement and closed-ended item formats has led to the testing of discrete bits of facts and procedures. As such, the curriculum is fragmented and dumbed down, as many of the desired learning outcomes are measured as atomized bits of knowledge and skills.

Standardized paper-and-pencil tests are administered in uniform ways to ascertain student achievement for summative purposes (i.e., grading and reporting at the end of a unit or a semester, certification at the completion of a course). At the classroom level, standardized tests are typically used for summative assessment at the end of instruction; assessment is thus seen as detached from instruction. Large-scale administration of standardized paper-and-pencil tests predominates in state/provincial, national, and international assessments, which are often used for cross-jurisdictional comparisons of student achievement. Examples of state/provincial and national assessments are the Foundation Skills Assessment (FSA) in British Columbia, Canada; the Provincial Achievement Tests (PAT) in Alberta, Canada; and the National Assessment of Educational Progress (NAEP) in the United States. International assessments include the Trends in International Mathematics and Science Study (TIMSS); the Progress in International Reading Literacy Study (PIRLS); and the Programme for International Student Assessment (PISA). The closed-ended item response format in standardized tests tends to encourage students to fill in bubbles or provide short answers drawn from rote memorization of discrete facts and procedures. Students are either rewarded or punished depending on whether they get the one right answer according to the answer keys or marking schemes. Such a testing format is aligned with behaviorist learning theory, which promotes the use of rewards to reinforce positive behaviors and of sanctions to extinguish negative behaviors.

State/provincial, national, and international assessments are high stakes because student achievement data derived from them are used to make important decisions or policies, which may lead to unintended consequences for students, teachers, or school administrators. Oftentimes, teacher job performance is evaluated based on student performance on high-stakes assessments. In many high-performing education systems, teachers are held accountable by policy makers, parents, and school administrators for students’ performance. Such a high accountability demand has led to teachers’ tendency to teach to the content and format of state/provincial, national, or international assessments. For example, Koh and Luke’s (2009) large-scale empirical study of the quality of teachers’ assessment tasks in Singapore, one of the high-performing education systems in the world, has shown that worksheets and summative tests were two of the most commonly used assessment methods in the teaching of core subject areas such as English, mathematics, and science at both the elementary and secondary levels. Teachers’ instructional practices were driven by preparing students for high-stakes examinations. As a result, the intended curriculum was reduced to drill-and-practice of decontextualized factual and procedural knowledge.

Authentic assessments are characterized by open-ended tasks that require students to construct extended responses, to perform an act, or to produce a product in a real-world context—or a context that mimics the real world. Examples of authentic assessments include projects, portfolios, writing an article for a newsletter or newspaper, performing a dance or drama, designing a digital artifact, creating a poster for a science fair, debates, and oral presentations. According to Wiggins (1989), authentic tasks must “involve students in the actual challenges, standards, and habits needed for success in the academic disciplines or in the workplace” (p. 706). In other words, authentic tasks need to be designed to replicate the authentic intellectual challenges and standards facing experts or professionals in the field. Such assessment tasks are deemed able to engage and motivate learners when learners perceive the relevance of the tasks to the real world or find that completing the tasks is meaningful for their learning.

The purpose of authentic assessment is to provide students with ample opportunity to engage in authentic tasks so as to develop, use, and extend their knowledge, higher-order thinking, and other 21st-century competencies. Authentic tasks are often performance-based and include complex, ill-structured problems that are well aligned with the rigorous and higher-order learning objectives in a reformed vision of curriculum (Shepard, 2000). Most professional challenges in the current and future workplace require individuals to strike a balance between individual and group achievement (Wiggins, 1989). The nature of authentic tasks enables students to learn how to achieve such a balance by engaging in independent learning of possible solutions and by collaborating with peers in a socially supportive learning environment over an extended period of time. As such, authentic tasks also support problem-based learning, inquiry-based learning, and other learner-centered pedagogical approaches. Productive discourse or extended communication in a social context is important in the process of arriving at solutions to problems. Hence students are able to “experience what it is like to do tasks in workplace and other real-life contexts” (Wiggins, 1998, p. 24). John Dewey, a prominent philosopher of education, underscored the importance of experience in education by arguing that learners cannot know something without directly experiencing it. Dewey inspired the use of the project method in his laboratory school at the University of Chicago from 1896 to 1904. The project method enabled children to reflect on and critically examine their prior beliefs or preexisting knowledge in the light of new experiences. Children were expected to learn content knowledge and procedural skills in a context that was relevant to their real-world lives. The context usually entailed a complex, real-life problem or authentic project, with many levels of embedded problems and solutions. The project method was further defined as a “hearty purposeful act” by Kilpatrick (1918, p. 320) in his essay “The Project Method,” which became known worldwide.

Authentic tasks assess not only students’ authentic performance or work but also their dispositions, such as persistence in solving messy and complex problems, positive habits of mind, a growth mindset, resilience and grit, and self-directed learning. Because the use of scoring rubrics is a key component of authentic assessment, descriptive feedback and self- and peer assessment can be grounded in explicit criteria and standards in the form of holistic or analytic rubrics. It is important that students receive timely, formative feedback from the teacher and/or peers so that they are able to use the feedback to improve the quality of their performance or work. Such a formative assessment or assessment for learning practice has long been advocated in key assessment literature that urges teachers to use classroom assessment to support student learning or to promote a learner-centered classroom culture (e.g., Black & Wiliam, 1998; Shepard, 2000). From a social-constructivist learning approach (Shepard, 2000), the opportunities for productive discourse or dialogue in the process of collaborating with peers and of giving and receiving peer feedback while completing authentic tasks underscore the importance of co-construction of knowledge and meaning-making through socially supported interactions.

Since the 1990s, social-constructivist learning theory has played a key role in the curriculum and assessment reform movement. In Shepard’s (2000) reconceptualization of classroom assessment practice for the 21st century, social-constructivist learning theory was situated within an emergent constructivist paradigm, characterized by the shared principles of a reformed vision of curriculum, cognitive and constructivist/social-constructivist learning theories, and classroom assessment. The shared principles emphasize that all students can learn, and thus all students must be given an equal opportunity to engage with intellectually challenging subject matter and assessment tasks that are aimed at developing their higher-order thinking, problem solving, and dispositions. The principles of classroom assessment in Shepard’s (2000) emergent constructivist paradigm are similar to those that characterize authentic assessment.

Criteria for Authenticity in Authentic Assessment

There is a substantial curriculum and assessment literature focusing on the features or characteristics of authentic assessment. The use of “features” and “characteristics” seems to suggest that an assessment or a task either is or is not authentic. I prefer the term “criteria,” which can be used to determine and describe the degree of authenticity of an assessment or a task. This section includes a review of the relevant literature on the criteria for authentic assessment.

According to Wiggins (1989, 1998), assessment is central to learning and must be linked to real-world demands. Some of the criteria for authentic assessment presented across these works overlap; they can be summarized into eight criteria:

First, authentic assessment “is realistic” (Wiggins, 1998, p. 22). This means that the authentic task or tasks must replicate how a student’s knowledge, skills, and/or dispositions are assessed in a real-world context. In other words, the authentic task or tasks should replicate or simulate the real-world contexts in which adults are assessed in the workplace, in social life, and in personal life. This enables students to experience what it is like to work or perform in real-life contexts, which are often messy, ambiguous, and unpredictable. Such a “learning by doing” experience is in line with Dewey’s experiential education.

Second, the authentic task or tasks require students to make good judgments and be creative and innovative in solving complex and non-routine problems or performing a task in new situations. This enables the assessment of skills that transfer to new tasks or contexts. In addition, students need to be competent and confident in using a repertoire of knowledge, skills, and dispositions to tackle and complete authentic tasks that are intellectually challenging. Hence, authentic tasks serve as an effective tool for assessing students’ demonstrations of critical thinking, complex problem solving, and creativity and innovation—some of the essential 21st-century competencies.

Third, an authentic assessment or task enables students to engage deeply in the subject or discipline through critical thinking and inquiry. Instead of rote learning and reproduction of facts and procedures, students need to be able to think, act, and communicate like experts in the subject or discipline. This is akin to Shulman’s (2005) signature pedagogies.

Fourth, in authentic assessment, students are given opportunities to rehearse, practice, look for useful resources, and receive timely, quality feedback so as to improve the quality of their performance or product. Students also need to present their work publicly and be given the opportunity to defend it. This suggests that assessment for learning or formative assessment practice can be easily incorporated into authentic assessment.

Fifth, authentic tasks look for multiple sources of evidence of student performance over time, together with the reasons or explanations behind the success or failure of a performance. Both the reliability and validity of judgments about complex performance depend upon evidence gained over many performances across multiple occasions. To ensure fairness and equity, the teacher must be provided with informative data about students’ strengths and weaknesses at the end of each assessment. This will ensure that the teacher’s feedback is aimed at helping all students make progress toward the standards.

Sixth, a multifaceted scoring system is used, and scoring criteria must be transparent. Sharing scoring criteria explicitly with students enables them to understand and internalize the criteria for success. A minimal sketch of such a scoring system follows.
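To make the idea of a multifaceted, transparent scoring system concrete, the following Python sketch (my illustration, not an instrument from the literature; the criterion names and four-point scale are hypothetical) represents an analytic rubric as a data structure in which each criterion is scored and reported separately rather than collapsed into a single mark:

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str        # e.g., "depth of knowledge"
    descriptor: str  # shared with students so they can internalize it
    max_level: int   # top level of the performance scale

def score_performance(rubric: list[Criterion], levels: dict[str, int]) -> dict[str, int]:
    """Return a per-criterion profile plus a total, keeping the scoring multifaceted."""
    profile: dict[str, int] = {}
    for criterion in rubric:
        level = levels.get(criterion.name, 0)
        if not 0 <= level <= criterion.max_level:
            raise ValueError(f"level out of range for {criterion.name!r}")
        profile[criterion.name] = level
    profile["total"] = sum(profile.values())
    return profile

# Hypothetical example: a three-criterion analytic rubric for a science-fair poster.
rubric = [
    Criterion("depth of knowledge", "explains concepts, not just facts", 4),
    Criterion("connection to real world", "links findings to a real issue", 4),
    Criterion("elaborated communication", "sustained, well-organized prose", 4),
]
print(score_performance(rubric, {
    "depth of knowledge": 3,
    "connection to real world": 4,
    "elaborated communication": 2,
}))
```

Reporting the profile per criterion, rather than only the total, is what allows descriptive feedback to point at specific strengths and weaknesses.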

Seventh, student self-assessment must play a pivotal role in authentic assessment.

Finally, the reliability or defensibility of teachers’ professional judgment or scoring of student performance or work is achieved through social moderation, in which teachers of the same subjects gather to set criteria and standards for scoring and to compare their scores (Klenowski & Wyatt-Smith, 2010).
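As a rough illustration of the comparison step in social moderation, the sketch below (an assumption of mine rather than a documented moderation protocol; the teacher names, scores, and tolerance parameter are hypothetical) flags work samples on which teachers’ scores diverge and therefore need discussion before a consensus judgment is reached:

```python
def moderation_report(scores_by_teacher: dict[str, list[int]], tolerance: int = 0):
    """Flag work samples whose scores differ across teachers by more than `tolerance`."""
    teachers = list(scores_by_teacher)
    n_samples = len(scores_by_teacher[teachers[0]])  # assumes every teacher scored every sample
    flagged = []
    for i in range(n_samples):
        sample_scores = [scores_by_teacher[t][i] for t in teachers]
        if max(sample_scores) - min(sample_scores) > tolerance:
            flagged.append((i, dict(zip(teachers, sample_scores))))
    return flagged

# Hypothetical scores from three teachers on five essays (4-point scale).
scores = {
    "teacher_a": [3, 2, 4, 1, 3],
    "teacher_b": [3, 3, 4, 1, 2],
    "teacher_c": [2, 2, 4, 1, 3],
}
for index, detail in moderation_report(scores):
    print(f"essay {index} needs discussion: {detail}")
```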

Newmann and Archbald (1992) used the term “authentic achievement” rather than “authentic assessment.” They identified three criteria or standards for authentic achievement, namely, construction of knowledge, disciplined inquiry, and value beyond school. In their later work, Newmann et al. (1996) and Newmann, Bryk, and Nagaoka (2001) used the term “criteria for authentic intellectual work” instead of “standards for authentic achievement.” Definitions of the three criteria for authentic achievement are as follows:

Construction of Knowledge

This criterion clearly indicates that students need to engage in the construction or production of knowledge instead of the reproduction of knowledge. Construction of knowledge is expressed in written and oral discourse. Examples of construction of knowledge are writing an article for a newsletter, performing a piece of music, creating a poster for a science fair, completing a group project, and designing a digital portfolio. All of these authentic assessments require students to engage in higher-order thinking, problem solving, communication, and collaboration. At the same time, students also need to present and defend their work in public.

Disciplined Inquiry

This criterion suggests that students need to be actively involved in critical inquiry within academic subjects or professional disciplines. Disciplined inquiry consists of three main components: prior knowledge base, in-depth understanding, and elaborated communication (Newmann et al., 2001, p. 15). Students’ authentic performance is built on their prior knowledge in a subject or discipline. To engage in critical inquiry, students need to be able to tap into their prior knowledge base—previously learned content knowledge that includes facts, terminology, vocabulary, concepts, theories, algorithms, procedures, and conventions. In-depth understanding refers to the ability to probe deeper into a problem and to organize, interpret, analyze, evaluate, and synthesize different types of knowledge or information that can be used to solve the problem. In-depth understanding helps students to engage actively in intellectual discourse or extended communication to explain their solutions to the problem. All experts or professionals in a subject or discipline are expected to use sophisticated forms of written and oral communication (i.e., elaborated communication) to carry out their work and to express their solutions to problems.

Value Beyond School

This criterion underscores the importance of having a value dimension in assessment tasks. To be intrinsically motivating for students, authentic tasks must have aesthetic, utilitarian, or personal value in the eyes of the learner.

Newmann et al. (1996) have pointed out that all three of these criteria are necessary for assessing the authenticity of student performance across grade levels and subject areas. They aptly stated that “construction of knowledge through disciplined inquiry to produce discourse, products, or performance that have value beyond success in school can serve as a standard of intellectual quality for assessing the authenticity of student performance” (Newmann et al., 1996, p. 287). However, they also cautioned that not all instructional activities and assessment tasks will meet all three criteria at all times.

Building upon the three criteria of authentic achievement, Newmann et al. (1996) further developed seven criteria for assessing the intellectual quality of assessment tasks. The criteria are organization of information, consideration of alternatives, disciplinary content, disciplinary process, elaborated written communication, problem connected to the world, and audience beyond the school. Organization of information and consideration of alternatives reflect the importance of assessing students’ higher-order or critical thinking in solving real-world problems. Disciplinary content emphasizes students’ ability to engage in critical inquiry into the ideas, theories, and perspectives central to an academic subject or professional discipline, while disciplinary process refers to the ability to use the sound methods of inquiry, research, and communication that are central to the subject or discipline. Elaborated written communication suggests that authentic tasks must involve students in using extended communication or sustained writing to express deep understanding and problem solving. The last two criteria, problem connected to the world and audience beyond the school, indicate that assessment tasks need to expose students to the real-world issues or problems that they encounter in their daily lives or are likely to encounter in their future colleges, workplaces, and lives.

Gulikers et al. (2004) have proposed five criteria for defining authentic assessment in the context of professional and vocational training. Similar to Wiggins (1989) and Newmann and Archbald (1992), they contend that authenticity of assessment is a multifaceted concept. In determining the authenticity of an assessment, there is a need to take into account students’ perceptions of authenticity. In other words, students’ perceptions of the meaningfulness or relevance of the assessment are central to the determination of authenticity. The five criteria, or dimensions, of authenticity are task, physical context, social context, assessment form, and criteria (Gulikers et al., 2004). They are summarized below:

Task

Using Messick’s (1994) question of authentic to what, Gulikers et al. (2004) argued that the degree of authenticity of an assessment or a task is measured against a criterion situation. According to them, “a criterion situation reflects a real-life situation that students can be confronted with in their work placement or future professional life, which serves as a basis for designing an authentic assessment” (Gulikers et al., 2004, p. 75). Therefore, an authentic assessment task should resemble the complexity of the knowledge, skills, and dispositions required in the criterion situation, and students should see the relevance or meaning of their performance on the authentic task to their future professions. The degree of authenticity of an assessment task can further be determined by whether the task requires multiple solutions and whether it is ill-structured and involves multiple disciplines.

Physical Context

For this criterion, three components are identified by Gulikers et al. (2004) to determine the degree of authenticity of an assessment: similarity to the professional workspace (fidelity), availability of professional resources (methods/tools/materials, relevant or irrelevant information), and time given to complete the assessment task. Sufficient time for the completion of a task is important so that students’ thinking and acting will not be restricted by time constraints. Many professional activities in real life involve planning and executing tasks over an extended period of time.

Social Context

The social processes of an authentic assessment must resemble those of the professional context. If the professional context or real-life situation requires collaboration with peers in solving problems, then the assessment should also involve students in collaboration and problem solving. However, if a professional context or real-life situation typically requires individual work, then the assessment should not enforce collaboration. In other words, fidelity of the social processes in authentic assessment to those in a real-life situation is essential.

Assessment Form

The authenticity of the assessment form is determined by the degree to which students are observed demonstrating competences while performing a task or creating a product. The observation enables an inference about students’ competences in future professional contexts. The authenticity of the form of assessment also depends on the use of multiple tasks and indicators of learning. This is similar to Wiggins’s (1989) multifaceted scoring system, which emphasizes the use of multiple sources of evidence of student performance. Many measurement and assessment experts also advocate the use of multiple methods or tasks and multiple indicators of learning to ensure the accuracy, fairness, reliability, and validity of professional judgments about student performance (Messick, 1994; Shavelson, Baxter, & Gao, 1993; Wiggins, 1989). Hence, students’ professional competence should neither be assessed by a single task nor judged on the basis of a single performance.
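One way to see why multiple tasks support more dependable judgments is the following back-of-the-envelope sketch (my illustration, not drawn from the measurement studies cited above; the scores are hypothetical): averaging over independent task scores shrinks the uncertainty of the estimate roughly with the square root of the number of tasks.

```python
import statistics

def competence_estimate(task_scores: list[float]) -> tuple[float, float]:
    """Return the mean score across tasks and the standard error of that mean."""
    mean = statistics.mean(task_scores)
    if len(task_scores) < 2:
        return mean, float("inf")  # a single performance gives no handle on the error
    sem = statistics.stdev(task_scores) / len(task_scores) ** 0.5
    return mean, sem

print(competence_estimate([3.0]))                 # one task: error unknowable
print(competence_estimate([3.0, 2.0, 4.0, 3.0]))  # four tasks: mean 3.0, SEM ~0.41
```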

Criteria

Scoring criteria used in authentic assessment should be based on criteria used in professional practice or a real-life situation. In addition, scoring criteria should concern the development of relevant professional competence, which means that assessing students’ learning progression is an important practice in the context of authentic assessment. Similar to Wiggins (1989), Gulikers et al. (2004) have argued that scoring criteria must be transparent and shared explicitly with students to facilitate their learning. Hence, criterion-referenced rubrics should be used to judge students’ performance or work in authentic assessment.

Research in Authentic Assessment

Since the 1990s, research in authentic assessment has focused on three themes: authentic assessment in educational or school reforms, teacher professional learning and development in authentic assessment, and authentic assessment as tools or methods used in a variety of subjects or disciplines in K‒12 schooling and in higher education institutions. Among these three themes, most studies have focused on the role of authentic assessment in educational or school reforms. Due to space limitations, only key studies concentrating on authentic assessment in educational or school reforms are reviewed here.

Authentic Assessment in Educational or School Reforms

Since the late 1990s, authentic assessment has become a key lever for educational or school reforms that aim to develop students’ 21st-century competencies and prepare them for a global knowledge-based economy in a technologically connected world. In the curriculum frameworks of many education systems, there is a shift from low-level learning outcomes (e.g., factual knowledge and procedural skills) to higher-order learning outcomes (i.e., higher-order thinking, problem solving, and other essential 21st-century competencies). Likewise, teachers have been urged to move toward the use of social-constructivist, learner-centered pedagogy, authentic assessment, and formative assessment. Such changes have resulted in a substantial body of research focusing on teachers’ assessment practices and building teachers’ capacity in classroom assessment.

In the United States, Newmann and his associates (Newmann et al., 1996; Newmann et al., 2001) conducted empirical studies to examine the impact of authentic pedagogy on student performance in public schools. The focus of Newmann et al.’s (1996) study was to determine the relationship between authentic pedagogy and student performance in schools that used authentic pedagogy as a school reform initiative. Authentic pedagogy comprised authentic instruction and authentic assessment based on the criteria for authentic intellectual work. The study involved teachers who taught mathematics and social studies in three different grades ranging from elementary school to high school. Data included classroom observations of the teachers’ daily lessons and analyses of the assessment tasks and students’ written responses to the tasks that were embedded within the lessons. The data were analyzed using the criteria for authentic intellectual work. Student responses to the assessment tasks were used as evidence of student performance.

Most studies on educational or assessment reforms have used standardized test scores as an indicator of improved student learning, even when an educational innovation involves a new form of assessment. Student responses to tasks or student work samples are embedded within teachers’ instructional practices and hence serve as a better indicator of student performance. Newmann et al. (1996) found that authentic pedagogy was strongly associated with students’ authentic academic performance at all grade levels in both mathematics and social studies. Students who were exposed to assessment tasks with high intellectual demands demonstrated higher authentic performance than students who did not have the same exposure. In addition, the effects of authentic pedagogy were found to be equitably distributed among students of diverse social backgrounds, suggesting that all students can benefit from equal access to standards of intellectual quality. The findings suggest that student performance depends on the quality of teachers’ assessment tasks and that authentic assessment can play a pivotal role in raising the quality of students’ learning and performance irrespective of their gender, ethnic group, or socioeconomic status. Authentic assessment can thus serve as a powerful mechanism for ensuring equitable learning opportunities and outcomes for all students.

In a second study, Newmann et al. (2001) examined the effects of authentic assignments or assessments on students’ authentic intellectual work in day-to-day classrooms and on students’ achievement in high-stakes standardized tests. Samples of classroom assignments were collected from 19 elementary schools in Chicago. The study involved approximately 5,000 students and their teachers in grades 3, 6, and 8. These grades were purposefully selected because of the relevance of using test scores from both the statewide and national testing programs. This allowed the researchers to “link teacher assignments both to student performance on state tests of reading, writing, and mathematics and to results from the national norm-referenced tests of reading and mathematics” (Newmann et al., 2001, p. 16). In addition to test scores, teacher assignments in writing and mathematics were analyzed for their intellectual demands. A group of teachers from the Chicago public schools was trained to judge the quality of teacher assignments using scoring rubrics that consisted of the criteria for authentic intellectual work. Newmann et al. (2001) found that when teachers organized instruction around authentic assignments, students not only produced more authentic, intellectually complex work but also achieved higher scores on both statewide and national tests in reading and mathematics. Similar results were noted in some very disadvantaged classrooms. Newmann et al. (2001) also pointed out that the intellectual demands of teacher assignments or assessment tasks played a far more important role than any particular teaching strategy or pedagogical method in influencing student engagement in learning. Hence, professional development for teachers should focus on building their capacity to design and use curriculum materials and classroom assessments that pose high authentic intellectual challenge.

Newmann et al.’s (1996) work, originating in the United States, was adapted and expanded in the Queensland School Reform Longitudinal Study (Lingard et al., 2001). The criteria for authentic intellectual work provided the basis for the Queensland model of productive pedagogies, assessment, and performance (Lingard et al., 2001). In Lingard et al.’s (2001) criteria for productive assessment, the three Newmann criteria of authentic intellectual work were extended to include knowledge criticism, technical metalanguage, inclusive knowledge, and explicitness of expectations as new indicators. Similar to Newmann et al.’s (1996) authentic pedagogy, productive pedagogies were intellectually demanding, connected to the real world, supportive of student learning, and attentive to the valuing of diversity. Lingard et al. (2001) found that the levels of intellectual or cognitive demand of teachers’ assessment tasks were positively associated with the quality of students’ performance, as evidenced in students’ written work. This important finding led to the New Basics curriculum trial in grades 1‒9 in Queensland schools. The New Basics curriculum was aligned with productive pedagogies and rich tasks (i.e., authentic tasks). The trial yielded positive outcomes, and the use of rich tasks and teacher-moderated judgment of students’ work in response to rich tasks became exemplary assessment practices in many Queensland schools. Such exemplary assessment practices have been applauded by policy makers, school administrators, educators, and researchers around the globe, and they led to the Core 1 Pedagogy and Assessment project in Singapore (Luke, Freebody, Lau, & Gopinathan, 2005).

Both the Newmann et al. (1996) and Lingard et al. (2001) studies served as the basis for Koh and Luke’s (2009) study of Singaporean teachers’ assessment practices. As one of the world’s high-performing education systems, Singapore has launched a variety of educational reforms since the beginning of the 21st century. Like their counterparts in other developed countries, Singaporean teachers have been urged to implement new forms of assessment (i.e., authentic assessment and formative assessment) to capture the higher-order learning outcomes in the intended curriculum. The Koh and Luke study examined Singaporean teachers’ assessment practices as well as the quality of teachers’ assessment tasks and the quality of students’ work in grades 5 and 9 in seven subject areas: English, social studies, mathematics, sciences, Mandarin Chinese, Malay, and Tamil. It was the first large-scale empirical study of teachers’ assessment practices in Singapore, with data drawn from a representative sample of Singaporean classrooms. Following the framework of Newmann et al. (1996) and the work of Anderson and Krathwohl (2001), Marzano (1992), and Nitko (2004), Koh and Luke (2009) devised nine criteria for assessing the quality of teachers’ assessment tasks and six criteria for assessing the quality of students’ work in response to the assessment tasks.

The nine criteria for assessment tasks were depth of knowledge, knowledge criticism, knowledge manipulation, sustained writing, task clarity and organization, connections to the real world beyond the classroom, supportive task framing, student control, and explicit performance standards or marking criteria. The six criteria for assessing the quality of students’ work included depth of knowledge, knowledge criticism, knowledge manipulation, sustained writing, quality of students’ writing or answers, and connections to the real world beyond the classroom (Koh, 2011a).

Brief descriptions of the criteria are as follows:

Depth of Knowledge

According to the revised Bloom’s taxonomy of intended student learning outcomes, there are three types of knowledge, namely, factual knowledge, procedural knowledge, and advanced concepts or conceptual knowledge (Anderson & Krathwohl, 2001). Factual knowledge is knowledge of discrete and decontextualized content elements (i.e., bits of information), while procedural knowledge entails knowledge of using discipline-specific skills, rules, algorithms, techniques, tools, and methods. Conceptual knowledge involves knowledge of complex, organized, and structured knowledge forms (e.g., how a particular subject matter is organized and structured, how the different parts or bits of information are interconnected and interrelated in a more systematic manner, and how these parts function together). All three types of knowledge are essential for student learning.

Knowledge Criticism

Based on models of critical literacy and critical pedagogy, knowledge criticism is a predisposition to the generation of alternative perspectives, critical arguments, and new solutions or knowledge (Luke, 2004). Knowledge criticism enables students to judge the value, credibility, and soundness of different sources of information or knowledge through comparison and critique rather than to accept and present all information or knowledge as given.

Knowledge Manipulation

Knowledge manipulation calls for the application of higher-order thinking and reasoning skills in the reconstruction of texts, intellectual artifacts, and knowledge. It involves organization, interpretation, analysis, synthesis, and/or evaluation of different sources of knowledge or information (Anderson & Krathwohl, 2001). Authentic assessments or tasks should provide students with more opportunities to make their own hypotheses and generalizations in order to solve problems, arrive at conclusions, or discover new meanings, rather than only to reproduce information expounded by the teacher or textbooks or to reproduce fragments of knowledge and preordained procedures.

Sustained Writing

This criterion gauges the degree to which the assessment task requires and generates the production of extended chunks of prose. Authentic assessments or tasks must ask students to elaborate on their nuanced understandings, explanations, arguments, or conclusions through the generation of sustained written prose.

Task clarity and organization, student control, and explicit performance standards or marking criteria are conceptualized based on Marzano’s (1992) learning-centered instruction. The assumption is that explicit procedures and criteria for the assessment task give students clear goals and an explicit language for what counts as quality or value in the assessment. The incorporation of these criteria into classroom assessment provides students with ample opportunity to engage in formative assessment or assessment for learning, which contributes to their self-directed learning, independent learning, and critical thinking.

Task Clarity and Organization

The assessment task is framed logically and has instructions that are easy to understand, so that students do not misinterpret the task or lack necessary information. The written instructions, guidelines, worksheets, and other textual advance organizers must be clear and well organized.

Connections to the Real World Beyond the Classroom

This criterion assesses the degree to which the assessment task and affiliated artifacts are connected to an activity, function, or task in a real-world situation.

Supportive Task Framing

Teachers’ scaffolding of an assignment or assessment task—that is, providing some structure and guidance—can assist students in accomplishing a complex task (Nitko, 2004). There are three types of scaffolding: content, procedural, and strategic. For intellectually demanding tasks, teachers should place more emphasis on strategic scaffolding.

Student Control

Teachers provide students with the opportunity to determine the parameters of a task, such as the topics or questions to answer, alternative procedures, the tools and resources to use (e.g., textbook, Internet, or newspaper), the length of the writing or response, or the performance or marking criteria.

Explicit Performance Standards/Marking Criteria

The assessment task is accompanied by the teacher’s clear expectations for students’ performance, and the marking criteria are made explicitly clear to the students. Reference to only technical or procedural requirements (e.g., the number of examples, the length of an essay or response) is not taken as evidence of explicit performance standards or marking criteria. This criterion underscores the importance of sharing scoring criteria with students explicitly, which is also a key criterion espoused by Wiggins (1989) and Gulikers et al. (2004). To ensure fairness and equity, students need to know in advance the specific and differentiated criteria for what may count as “value,” quality, or success at completion of the task.

Given that the theoretical underpinnings of the criteria for assessing the quality of students’ work are similar to those for teachers’ assessment tasks, they are not repeated here. Readers who are interested in the criteria and indicators used to judge the quality of teachers’ assessment tasks and students’ work across different subject areas can refer to Koh (2011a). A sketch of how such a coding scheme might be represented follows.
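As a minimal sketch only—the rating scale, the codings, and the aggregation below are my hypothetical illustration, not Koh and Luke’s (2009) actual instrument—the nine task criteria could be represented as a simple coding scheme in which each task is rated on every criterion and mean ratings are compared across subjects or grades:

```python
# The nine task criteria from Koh and Luke (2009), as listed in the text.
TASK_CRITERIA = [
    "depth of knowledge",
    "knowledge criticism",
    "knowledge manipulation",
    "sustained writing",
    "task clarity and organization",
    "connections to the real world beyond the classroom",
    "supportive task framing",
    "student control",
    "explicit performance standards or marking criteria",
]

def mean_ratings(coded_tasks: list[dict[str, int]]) -> dict[str, float]:
    """Average each criterion's rating over a set of coded assessment tasks."""
    return {c: sum(task[c] for task in coded_tasks) / len(coded_tasks)
            for c in TASK_CRITERIA}

# Hypothetical codings of two mathematics tasks on a 1-4 scale.
coded_tasks = [
    {c: 2 for c in TASK_CRITERIA} | {"depth of knowledge": 3},
    {c: 3 for c in TASK_CRITERIA} | {"sustained writing": 1},
]
print(mean_ratings(coded_tasks))
```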

Future Research: The Remaining Questions

In the context of professional and vocational training, Gulikers, Bastiaens, Kirschner, and Kester (2008) have argued that the notion of authenticity is subjective and that students’ perceptions of the authenticity of an assessment or a task can influence the quality of their learning. Their study showed a difference between teachers’ and students’ perceptions of assessment authenticity. As such, it is important to take into account students’ perceptions of the meaningfulness or relevance of an assessment or a task to their real-life situations. This finding also supports another crucial aspect of authentic assessment task design: students must be involved in the process of determining and negotiating the assessment or task parameters (i.e., student control).

There has been a substantial body of research on teacher professional learning and development in classroom assessment or formative assessment. Much of this work has focused on formative assessment or assessment for learning and on models of effective professional development. Koh (2011b) conducted a two-year intervention study with a group of elementary teachers in Singapore to examine the effects of ongoing, sustained professional development in authentic assessment task design on the teachers’ assessment literacy, specifically their capacity to design and implement authentic assessment tasks. To enhance teachers’ understanding and internalization of the criteria for authentic intellectual quality in designing authentic tasks, only five of the key criteria were used in Koh’s (2011b) study: depth of knowledge, knowledge criticism, knowledge manipulation, sustained writing or extended communication, and connections to the real world beyond the classroom. The study demonstrated positive results in improving teachers’ assessment literacy through ongoing, sustained professional development in authentic assessment task design in English, mathematics, and science at the elementary school level. In addition, in-depth interviews with the participating teachers showed that their conceptions of authentic assessment had greatly improved by the end of the two-year professional development. In a second study, Koh, Burke, Luke, Gong, and Tan (in press) found that Chinese language teachers had difficulty incorporating certain knowledge manipulation criteria into their assessment tasks despite a quick grasp of the design principles of authentic assessment.

Webb (2009) has called for professional development in mathematics education to focus on “helping teachers to develop a ‘designers’ eye’ for selecting, adapting, and designing tasks to assess student understanding” (p. 3). Although Webb (2009) did not use the term “authentic assessment” directly, we can infer that authentic assessment is the most effective way of assessing student understanding across different subjects or disciplines.

Given that teachers need to have a “designers’ eye” (Webb, 2009, p. 3), or to be critical and intelligent consumers of high-quality authentic assessment or performance assessment, it is important for professional development and teacher education programs to provide both inservice and preservice teachers with ample opportunity to engage in authentic assessment task design and in the analysis of student work. Future research should focus on building teachers’ capacity in authentic assessment and assessment for learning through a critical inquiry approach in school-based professional learning communities or in teacher education programs. According to Wyatt-Smith and Gunn (2009), the critical inquiry approach refers to teachers’ ability to reflect on and understand assessment processes and practices in actual sociocultural contexts in relation to four important lenses: (1) conceptions of the knowledge domains and competencies to be assessed; (2) conceptions of the alignment between assessment, teaching, and learning, and its enactment in practice; (3) teacher judgment practices in relation to standards, assessment task design, student work samples, and social moderation; and (4) curriculum literacies or discipline-specific language demands. To enable the power of authentic assessment to unfold in the classrooms of the early 21st century, it is essential that teachers be critical designers and reflective practitioners of classroom assessment tasks that support student learning and mastery of the 21st-century competencies.

Teachers’ capacity to design and implement authentic assessment is of paramount importance in the current era of competency-based education. Authentic assessment has been used in International Baccalaureate programs and has also been incorporated into school-based assessments in several education systems that perform highly on PISA: Singapore, Hong Kong, Finland, and Australia. However, it is worth noting that the success of authentic assessment initiatives can be hindered by changes in school leadership or governmental policies. For example, the No Child Left Behind (NCLB) Act of 2001 in the United States and the National Assessment Program—Literacy and Numeracy (NAPLAN, introduced in 2008) in Queensland, Australia, have posed challenges to school-based, teacher-moderated assessment due to an overemphasis on “back to basics” and high-stakes accountability testing of students’ academic achievement.

The launch of the Common Core State Standards in the United States in 2010 brought significant changes in curriculum, assessment, and instruction. The standards define the 21st-century knowledge, skills, and dispositions students should master during their K‒12 education so that they are well prepared to achieve their academic and career aspirations as well as personal well-being in an increasingly complex and competitive world. Ideally, the Common Core standards have created opportunities for the development and implementation of authentic assessments or performance assessments in English language arts and mathematics. However, a heavy focus on the use of student assessment data for accountability purposes has led to pushback from state governments, teachers, and parents. A lack of teacher autonomy in the design and use of assessments to help students achieve the 21st-century educational outcomes has defeated the original purpose of the Common Core. Hence, it is important for policy makers in the United States and other countries to emulate the Finnish education system, in which teachers are given full autonomy to develop and implement classroom assessments that support student learning.

Bibliography

  • Anderson, L. W., & Krathwohl, D. R. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives. New York: Longman.
  • Archbald, D., & Newmann, F. M. (1988). Beyond standardized testing: Assessing authentic academic achievement in the secondary school. Reston, VA: National Association of Secondary School Principals.
  • Arter, J. (1999). Teaching about performance assessment. Educational Measurement: Issues and Practice, 18(2), 30–44.
  • Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7–74.
  • Cumming, J. J., & Maxwell, G. S. (1999). Contextualising authentic assessment. Assessment in Education: Principles, Policy & Practice, 6(2), 177–194.
  • Darling-Hammond, L., & Adamson, F. (2010). Beyond basic skills: The role of performance assessment in achieving 21st century standards of learning. Stanford, CA: Stanford University, Stanford Center for Opportunity Policy in Education.
  • Darling-Hammond, L., & Snyder, J. (2000). Authentic assessment of teaching in context. Teaching and Teacher Education, 16, 523–545.
  • Gulikers, J. T. M., Bastiaens, T. J., & Kirschner, P. A. (2004). A five-dimensional framework for authentic assessment. Educational Technology Research and Development, 52(3), 67–86.
  • Gulikers, J. T. M., Bastiaens, T. J., Kirschner, P. A., & Kester, L. (2008). Authenticity is in the eye of the beholder: Student and teacher perceptions of assessment authenticity. Journal of Vocational Education & Training, 60(4), 401–412.
  • Kilpatrick, W. H. (1918). The project method. Teachers College Record, 19, 319–335.
  • Klenowski, V., & Wyatt-Smith, C. (2010). Standards, teacher judgement and moderation in contexts of national curriculum and assessment reform. Assessment Matters, 1, 84–108.
  • Koh, K. (2011a). Improving teachers’ assessment literacy. Singapore: Pearson Education South Asia.
  • Koh, K. (2011b). Improving teachers’ assessment literacy through professional development. Teaching Education, 22(3), 255–276.
  • Koh, K. (2014). Authentic assessment, teacher judgment and moderation in a context of high accountability. In C. Wyatt-Smith, V. Klenowski, & P. Colbert (Eds.), Designing assessment for quality learning (Vol. 1, pp. 249–264). Dordrecht, The Netherlands: Springer.
  • Koh, K., & Luke, A. (2009). Authentic and conventional assessment in Singapore schools: An empirical study of teacher assignments and student work. Assessment in Education: Principles, Policy & Practice, 16(3), 291–318.
  • Koh, K., Burke, L. E. C. A., Luke, A., Gong, W., & Tan, C. (in press). Developing the assessment literacy of teachers in Chinese language classrooms: A focus on assessment task design. Language Teaching Research.
  • Ladwig, J. (2009). Working backwards towards curriculum: On the curricular implications of quality teaching. Curriculum Journal, 20(3), 271–286.
  • Lingard, B., Ladwig, J., Mills, M., Bahr, M., Chant, D., & Warry, M. (2001). The Queensland School Reform Longitudinal Study. Brisbane: Education Queensland.
  • Luke, A. (2004). Two takes on the critical. In B. Norton & K. Toohey (Eds.), Critical pedagogies and language learning (pp. 1–14). Cambridge, U.K.: Cambridge University Press.
  • Luke, A., Cazden, C., Lin, A., & Freebody, P. (2005). A coding scheme for the analysis of Singapore classrooms. Technical paper. Singapore: Centre for Research in Pedagogy and Practice.
  • Luke, A., Freebody, P., Lau, S., & Gopinathan, S. (2005). Towards research-based innovation and reform: Singapore schooling in transition. Asia Pacific Journal of Education, 25(1), 5–28.
  • Marzano, R. J. (1992). A different kind of classroom: Teaching with dimensions of learning. Alexandria, VA: Association for Supervision and Curriculum Development.
  • Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessment. Educational Researcher, 23(2), 13–23.
  • Meyer, C. (1992). What’s the difference between authentic and performance assessment? Educational Leadership, 49(8), 39–40.
  • Newmann, F. M., & Archbald, D. A. (1992). The nature of authentic academic achievement. In H. Berlak, F. M. Newmann, E. Adams, D. A. Archbald, T. Burgess, J. Raven, & T. A. Romberg (Eds.), Toward a new science of educational testing and assessment (pp. 71–84). Albany: State University of New York Press.
  • Newmann, F. M., Bryk, A. S., & Nagaoka, J. K. (2001). Authentic intellectual work and standardized tests: Conflict or coexistence? Improving Chicago’s schools. Chicago: Consortium on Chicago School Research.
  • Newmann, F. M., Marks, H. M., & Gamoran, A. (1996). Authentic pedagogy and student performance. American Journal of Education, 104(4), 280–312.
  • Nitko, A. J. (2004). Educational assessment of students (4th ed.). Upper Saddle River, NJ: Pearson/Merrill Prentice Hall.
  • Palm, T. (2008). Performance assessment and authentic assessment: A conceptual analysis of the literature. Practical Assessment, Research & Evaluation, 13(4), 1–10.
  • Shavelson, R. J., Baxter, G. P., & Gao, X. (1993). Sampling variability of performance assessments. Journal of Educational Measurement, 30(3), 215–232.
  • Shepard, L. A. (2000). The role of assessment in a learning culture. Educational Researcher, 29(7), 4–14.
  • Shepard, L., Hammerness, K., Darling-Hammond, L., Rust, F., Snowden, J. B., Gordon, E., … Pacheco, A. (2005). In L. Darling-Hammond & J. Bransford (Eds.), Preparing teachers for a changing world (pp. 275–326). San Francisco: John Wiley.
  • Shulman, L. S. (2005). Signature pedagogies in the professions. Daedalus, 134(3), 52–59.
  • Webb, D. C. (2009). Designing professional development for assessment. Educational Designer, 1(2), 1–26.
  • Wiggins, G. (1989). A true test: Toward more authentic and equitable assessment. Phi Delta Kappan, 70(9), 703–713.
  • Wiggins, G. (1998). Educational assessment: Designing assessments to inform and improve student performance. San Francisco: John Wiley.
  • Wyatt-Smith, C., & Gunn, S. (2009). Towards theorising assessment as critical inquiry. In J. J. Cumming & C. Wyatt-Smith (Eds.), Educational assessment in the 21st century: Connecting theory and practice (pp. 83–101). London: Springer.

Related Articles

  • Accountabilities in Schools and School Systems



Assessment and Technology: Mapping Future Directions in the Early Childhood Classroom (Review Article)


  • 1 School of Education and Professional Studies, Griffith University, Southport, QLD, Australia
  • 2 Rightpath Research and Innovation Center, Child & Family Studies, University of South Florida, Tampa, FL, United States
  • 3 School of Applied Psychology, Griffith University, Southport, QLD, Australia

The framework and tools used for classroom assessment can have significant impacts on teacher practices and student achievement. Getting assessment right is an important component of creating positive learning experiences and academic success. Recent government reports (e.g., from the United States and Australia) call for the development of systems that use new technologies to make educational assessment more efficient and useful. The present review discusses factors relevant to assessment in the digital age from the perspectives of assessment for learning (AfL) and assessment of learning (AoL) in the early childhood classroom. Technology offers significant avenues to enhance test administration, scoring, reporting, and interpretation, and to link assessment with the curriculum to individualize learning. We highlight unique challenges around issues of developmental appropriateness, item development, psychometric validation, and teacher implementation in the use of future assessment systems. Success, however, will depend upon close collaboration between educators, students, and policy makers in the design, development, and utilization of technology-based assessments.

Introduction

Assessment plays an important role in the teaching-learning process, and it is a powerful tool for enhancing student achievement and facilitating societal progress (Broadfoot and Black, 2004; Hodges et al., 2014). In the twenty-first century, innovative technologies have the potential to deliver better quality educational assessments that are more useful for teachers and that more readily benefit student learning (Koomen and Zoanetti, 2018). This view is echoed by Gonski (2018), who urges educators to “use new technology not for its own sake, but to adopt ways of working that are more efficient and effective” (p. 99). Beyond commonplace technologically supported survey methodologies, numerous new technologies offer exciting opportunities for educational assessment. These include touch screens with drag-and-drop and multi-touch features, augmented reality (AR), virtual reality (VR), mixed reality (MR), robots, and behavioral monitoring (e.g., voice recognition, eye gaze, face recognition, touchless user interfaces). It is at this nexus that innovative education theory, psychology, computer science, and engineering can combine to optimize classroom assessment practices and provide clear links between assessment, teaching, and learning.

The present review examines technology in classroom assessment from the perspective of students, educators, and administrators. Classroom assessment refers to a practice wherein teachers use assessment data from a variety of tools or products to document and enhance student learning ( Randel and Clark, 2013 ). While commonly used tools include teacher-made tests, the current review focuses on externally produced standardized tests by national, state, and district level assessment developers as well as commercial developers. Assessment can be conceptualized in two ways: as facilitating the learning process and as summarizing the current state of knowledge in students. Technology has the potential to enhance both applications. Moreover, technology offers significant advantages across the different stages of assessment, from test administration to linking data to the curriculum. However, concerns in using technology-based assessment have also been raised around developmental appropriateness, item development, psychometric validation, and teacher training. The present review examines these issues, with a focus on technology-based assessment for education in the early years. The early childhood classroom, for the purposes of this review, includes kindergarten and the preparatory year. In some regions, early childhood may also refer to the 2 years prior to and the year following kindergarten. Following an overview of assessment processes in education, we examine the use of technology in assessment before concluding with future areas in need of development.

Understanding Assessment Processes in Education

In the educational context, assessment is broadly conceptualized as an ongoing process of gathering evidence of learning, interpreting it, and acting on this evidence to improve future learning and performance (Stiggins, 2002; Bennett, 2011). In this respect, assessment is understood as a socio-cultural practice or activity (Broadfoot and Black, 2004; Looney et al., 2018; Silseth and Gilje, 2019). It is embedded in the teaching and learning process, which is mediated by the tools used in assessment. Furthermore, the processes used in assessment are closely linked with the social interaction of learners and teachers, with the construction of knowledge achieved through a novice-expert relationship. Quality, individualized feedback to students is also integral to the process (Sadler, 1989; Heritage, 2007). As such, assessment that incorporates both social and individualized perspectives is likely to help student learning (Hodges et al., 2014). Successful assessment systems of the future will closely embody the needs and perspectives of teachers and their students.

The application of assessment within this broader framework generally falls within three categories, namely diagnostic assessment, summative assessment, and formative assessment. These three types are distinguished by their purposes, timing, to whom they are administered, and in test construction and design. However, there can be instances when the same test is used for more than one application, which may not necessarily be appropriate if the test was not designed for this. Diagnostic assessments are designed to thoroughly assess achievement in a given domain and all relevant subdomains. Diagnostic reading tests, for example, assess children's phonological awareness, graphophonemic knowledge, reading fluency, and reading comprehension. Diagnostic tests are administered to individuals who are struggling to learn or who have been deemed at-risk of academic failure. Results from well-designed diagnostic tests help inform educators and special educators what to teach and how to teach. Because diagnostic tests are usually designed to classify students and to determine access to special services, they are rigorously developed and administered in ways that assure that the test scores and their interpretation have high degrees of reliability, validity, and fairness. As such, they are lengthy and often require some expertise on the part of the assessor.

Summative assessments are designed to quantify how much a student has achieved to date in a given academic domain; their purpose is assessment of learning (AoL). Summative assessments are standardized tests that are usually administered to all students in a given grade, school, school district, state, or country. AoL occurs at a specific point in time when achievement to date is to be quantified, typically at the end of an academic school year, on completion of a course, or immediately following an intervention program. Examples include final exams, school district administered standardized tests, the Graduate Record Examinations (GRE), the National Assessment of Educational Progress (NAEP), and the National Assessment Program Literacy and Numeracy (NAPLAN) [Australian Curriculum, Assessment and Reporting Authority (ACARA), 2013]. Results from summative assessments may be shared with students, parents, teachers, administrators, and evaluators. These consumers use the indices of overall student achievement to make evaluative judgements against predetermined standards. In recent years, AoL has been increasingly used for high stakes accountability purposes (Stiggins, 2002; Heritage, 2007). For example, in much of the United States, AoL data are used to rank order public schools, determine teacher and principal salaries, decide whether to retain or terminate principals and school district administrators, determine the need for third party takeover of public schools, and defund publicly funded early childhood education programs (Darling-Hammond, 2004; Neal, 2011).

Formative assessments are designed to efficiently measure how well students are responding to instruction in a specific subdomain of achievement and to indicate whether instructional modifications are warranted. Their purpose is assessment for learning (AfL). AfL does not aim to quantify overall achievement; instead, its purpose is to generate data useful for guiding instruction. That is, AfL focuses on the integration of assessment activities into the teaching and learning process. In AfL, test results provide immediate feedback to teachers and students about how much of the recently taught material has been learned, and the results are used by teachers to inform lesson planning (Sadler, 1989). Wiliam (2011) notes that educators who use formative assessment must have a strong understanding of what the learner knows, where the learner is going, and how to get there. The feedback afforded to educators and students through AfL serves to guide the learner through individualized teaching approaches that optimize student learning (Wiliam, 2011). It helps students improve as they work to attain higher levels of performance and create new knowledge, and it highlights the important relationship between classroom assessment practice, learning, and the use of assessment evidence to guide instruction.

Early identification, targeted instruction, monitoring of children's learning, and data-driven instructional changes are key components of programs that close achievement gaps. AfL takes many forms and can inform each of these components. For example, for educators to provide targeted instruction, a student's mastery of taught skills and their (sub)domain-specific learning must be regularly assessed to determine progress toward desired outcomes. Skills mastery tests, traditionally called curriculum-based measures, are one form of AfL; these tests assess the extent to which a child has learned specific skills taught in a given curriculum. Skills mastery tests are brief, closely linked to the curriculum, and administered frequently (e.g., weekly spelling tests). Students' performance on skills mastery tests helps educators appropriately pace their progress through a given curriculum. These tests are necessary but not sufficient for guiding instruction because mastery of a particular skill does not necessarily lead to mastery of that academic domain or subdomain (Fuchs, 2004; Shapiro et al., 2004; VanDerHeyden, 2005). For example, a student who can read “-at” word families may still have difficulty reading a passage that incorporates a variety of rhymes and word structures.

A useful AfL approach includes both mastery tests and General Outcome Measures (GOM; Deno, 1985 , 1997 ). GOMs are broader in item content than mastery tests, and they are not usually linked to a specific curriculum. GOMs are usually administered to all students in a classroom, grade, or school district at predefined increments of time. For example, universal benchmarking often occurs three or four times per school year. GOMs are also administered more frequently to those students who are receiving more frequent or more intensive intervention. The potential strengths of GOMs include brevity and ease of administration, alternate forms that allow frequent re-administration, sensitivity to learning, and implications for grouping children and modifying instruction. These assets make GOMs a fitting approach for monitoring students' progress and evaluating their responsiveness to instruction. GOMs help teachers evaluate students' level and rate of achievement, determine needs for instructional change, set appropriate short- and long-term goals, and monitor progress relative to peers or criterion-based benchmarks ( Shapiro et al., 2004 ; VanDerHeyden, 2005 ; Busch and Reschly, 2007 ). Thus, GOMs have come to the forefront of educational assessment with the emergence of response to intervention (RTI) frameworks for service provision and identification of children with learning difficulties.
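
To make the progress-monitoring arithmetic concrete, here is a minimal sketch of how a student's rate of improvement might be estimated from repeated GOM probes, using an ordinary least-squares slope. The weekly scores and the benchmark rate below are invented for illustration and are not drawn from any published norms.

```python
# Minimal sketch: estimating a student's rate of improvement from weekly
# GOM probe scores via an ordinary least-squares slope. Scores and the
# benchmark growth rate are invented for illustration.

def learning_rate(scores):
    """Return the OLS slope: score units gained per probe occasion."""
    n = len(scores)
    xs = range(n)                      # probe occasions: week 0, 1, 2, ...
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

weekly_wcpm = [22, 25, 24, 28, 31, 30, 34]   # words correct per minute
rate = learning_rate(weekly_wcpm)
print(f"Estimated growth: {rate:.2f} words/week")

BENCHMARK_RATE = 1.5                          # hypothetical norm for the grade
if rate < BENCHMARK_RATE:
    print("Rate below benchmark: consider an instructional change.")
```

Comparing the fitted slope against a normative growth rate is exactly the kind of level-and-rate evaluation described above; a real system would also attach a standard error to the slope before recommending an instructional change.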

RTI is a framework for linking AfL to instruction through data-based problem-solving. RTI includes an effective core curriculum; increasingly intense tiers of instruction for underperforming students; integrated assessment including universal screening, benchmarking, mastery tests, and progress monitoring; and the use of assessment results to guide instruction. RTI can be implemented by teachers, and when it is, it improves student outcomes (Fuchs et al., 1984, 1989; Graney and Shinn, 2005; Heritage, 2007; VanDerHeyden et al., 2007) and is satisfying for teachers (Hayward and Hedge, 2005). A meta-analysis reported impressive mean effect sizes of 1.02 for field-based studies and 1.54 for university-based studies evaluating RTI implementations (Burns et al., 2005). Practice guides from the U.S. Department of Education's Institute of Education Sciences (IES) conclude that there is strong evidence for the effectiveness of RTI (Gersten et al., 2008, 2009).

Application of New Technologies for Assessment

The application of technology may provide one avenue for resolving the intricacies of classroom assessment in the twenty-first century. Research on the relationship between assessment and classroom learning helps to refine technology-based supports and theoretical models of assessment, teaching, and learning processes (Black and Wiliam, 1998; Heritage, 2018). To develop the next generation of technology-based assessments, test developers will need to consider the perspectives of policy makers interested in content standards, teachers interested in AoL and AfL, and assessment experts interested in the results collected (National Research Council, 2010, p. 21). The use of technology in classroom assessment promises advanced features not possible with paper-and-pencil tests, such as faster student feedback and computer-generated next steps that allow teachers to make real-time, data-driven decisions to inform their instructional changes. To realize such insightful and sophisticated technology, attention to student-centered and instructionally tractable assessments is highly recommended (Russell, 2010; Wiliam, 2010). A collaborative approach to test development will improve the implementation process for using computer-based assessments in the classroom.

Test developers can also advance knowledge in areas such as early childhood classroom assessment by designing assessments that align with the five dimensions of innovation for computerized tests (Parshall et al., 2000). The field is ripe for exploration of design features for children, such as item formats, response actions, media inclusion, interactivity, and the use of scoring algorithms. Research on computer use in young children is still in its infancy, and empirical research is newly emerging (Clements and Sarama, 2003; Labbo and Reinking, 2003; Chen and Chang, 2006; Schmid et al., 2008). Technology can be used to enhance children's learning experience in the classroom, which is also expected to prepare active and informed citizens for a competitive global economy [Ministerial Council on Education, Employment, Training, and Youth Affairs (MCEETYA), 2008]. The development of innovative computer-based assessments for children will require a rich understanding of developmentally appropriate design features, content expertise, implementation science, measurement, and an understanding of what students and teachers need.

Developmental Appropriateness

The digital age has initiated a generational shift in which children are increasingly likely to have ready access to technology. Approximately two-thirds of USA citizens now own a smartphone (Pew Research Center, 2015), and ongoing research suggests that even some children from low-income, minority communities have near universal access to mobile devices (Ojanen et al., 2015). The American Academy of Pediatrics (AAP) currently recommends that children younger than 18 months avoid screen media and that children ages 2 to 5 limit their screen time to 1 h per day of quality programs (American Academy of Pediatrics, 2016). While the research evidence on children's technology use continues to grow, studies of children's computer interventions have demonstrated promise in areas like language and literacy (Lankshear and Knobel, 2003; Burnett, 2010; Neumann, 2018; Neumann et al., 2019). Arguably the biggest factor relating to developmental appropriateness is the nature of the technology itself.

Research has repeatedly shown that young children can experience difficulty manipulating a computer mouse when performing drag and drop sequences due to their limited motor skills, eye-hand coordination, and the size of their hand relative to the mouse ( Joiner et al., 1998 ; Hourcade et al., 2004 ; Donker and Reitsma, 2007 ). Instead, the use of touch screen tablets in education and assessment for young children is recommended. Touch screen tablets can be used by young children and children with special needs who may lack the fine motor skills to effectively use a standard keyboard or mouse ( Neumann and Neumann, 2018 ). Using multimodal features, touch screen devices offer opportunities to administer tests in ways that can facilitate the assessment process ( Lu et al., 2017 ).

With the widespread use of touch screen devices, feasibility research on the developmental appropriateness of children's tablet use is underway. Early findings suggest that 2-year-olds can perform tap and drag gestures when using touch screen devices, and 3-year-olds can tap, drag, free rotate, as well as drag and drop ( Aziz et al., 2014 ). Touch screen tablets offer different ways for students to interact with the screen and thus allow for test items to conform to many different item types. Children can use their fingers to draw, tap to highlight objects, swipe objects away, tap and drag objects to other places on the screen, pinch to zoom in and out, twist to rotate objects, and scroll up and down a screen. This physical interaction can also create a testing situation that is more engaging for children than traditional paper-and-pencil tests ( Woloshyn et al., 2017 ).

As children develop their fine motor skills and advance to writing, there is also the capability to assess handwriting using a stylus pen. A stylus pen allows children to create shapes and letters and to form lines of different thickness when pressure is applied to a digital surface. Research shows that children can easily manipulate the stylus for drawing and writing and are engaged by the activity (Chang et al., 2005; Matthews and Seow, 2007). Falk et al. (2011) demonstrated the feasibility of measuring children's handwriting by using a Wacom Intuos 3 digital tablet and a custom-built pen. These digital tools measured spatial, temporal, and grip force parameters; in their sample of first and second graders, static grip was associated with lower legibility. Such input methods offer a variety of ways to appropriately assess multitouch gestures and handwriting skills in older students.

Assessing toddlers using touch screen technology is the new frontier. Twomey et al. (2018) found that children as young as 2 years old can complete a cognitive assessment using a touch screen device. A range of touch screen technologies are already being developed and applied in classroom assessment, with touch as the preferred response action. For example, a tablet is used in the Profile of Phonological Awareness (PRO-PA), in which it provides an interface for the teacher to ask questions and enter student responses (Carson, 2017). A tablet is also used in the validated Emergent Literacy Assessment app (ELAa), which plays pre-recorded audio to ask questions and uses a touch screen interface to collect responses from the child (Neumann and Neumann, 2018; Neumann et al., 2019). Future research is needed to enhance the developmentally appropriate features of tablets to improve digital assessment experiences for young children.

Item Development

Technology-based assessments offer more variety in stimulus presentation than is available with paper-based test booklets or flip books. Touch screen tablets, computers, and virtual modalities have multimodal features that give students opportunities to strengthen learning, motivation, collaboration, engagement, and productivity, and they can be used for multiple assessment formats (Woloshyn et al., 2017). The use of technology promises improved measurement of higher-order understanding and performance because of its flexibility in integrating media and exploring new item types. The criticism of current state-level assessments is that they rely heavily on multiple choice items, thereby suggesting a lack of rigor (National Research Council, 2010). There is a vocal disenchantment with multiple choice items due to a reported overreliance on measuring factual knowledge rather than higher-level skills (Pellegrino and Quellmalz, 2010). A proclivity for multiple choice items in assessments has an overarching effect in the classroom as well: research suggests that teachers are more likely to rely on multiple choice items in their classrooms when year-end assessments do too (Abrams et al., 2003). Nevertheless, multiple choice test items are more efficient than open-ended items (Jodoin, 2003), easier and cheaper to develop (Stecher and Klein, 1997), equitable for children of different backgrounds (Bruder, 1993), and can be refined to measure higher-level skills (Parshall et al., 2000). For emergent readers, multiple choice items can be designed as multiple-choice graphics. The use of technology-enhanced test items in early childhood assessments is still largely untapped, and a delicate balance between innovation, cost, and efficiency is needed when designing items.

To increase the rigor of multiple-choice items, test developers and teachers can use multiple choice variants. With multiple response items, children must choose more than one answer option to get the item right. With ordered response items, children must choose the correct sequence for an event. Touch screen technology can enhance and facilitate the administration of these and other item types in multiple formats. For example, students can touch a hot spot on a graphic as their answer choice (O'Neill and Folk, 1996; Parshall et al., 1996, 2008; Scalise and Gifford, 2006; Becker et al., 2011). Students can also highlight texts for assessment purposes (Davey et al., 1997) and be assessed on their drawing and mark-making abilities (Scalise and Gifford, 2006; Kopriva and Bauman, 2008; Boyle and Hutchison, 2009; Dolan et al., 2010). Drag-and-drop features can be used to select and move objects, order objects, connect objects, and sort objects. The limits of traditional item types can be further explored with touch screen technology and enhanced by integrating media-based features (e.g., sounds, animations).
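
As a concrete illustration of how these item variants differ at scoring time, the minimal sketch below encodes the scoring rules just described for single-choice, multiple-response, and ordered-response items. The item keys and responses are invented for illustration.

```python
# Minimal sketch of scoring rules for three item variants discussed above.
# Item content and answer keys are invented for illustration.

def score_multiple_choice(response, key):
    """One selected option must match the single keyed answer."""
    return int(response == key)

def score_multiple_response(responses, key):
    """All keyed options, and only those, must be selected."""
    return int(set(responses) == set(key))

def score_ordered_response(responses, key):
    """Options must be placed in exactly the keyed sequence."""
    return int(list(responses) == list(key))

print(score_multiple_choice("B", "B"))                  # 1: correct
print(score_multiple_response(["A", "C"], ["C", "A"]))  # 1: order-free match
print(score_ordered_response(["egg", "chick", "hen"],
                             ["egg", "chick", "hen"]))  # 1: order matters
```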

Adapting a multiple-choice paper-based test into a computerized format is a natural evolution when transitioning to technology-based assessments, but there is a growing call for greater innovation. The use of media such as graphics, audio, and video are ideal for emergent readers who are not yet fluent readers. Recent and future advances in behavioral monitoring (e.g., eye gaze, face recognition, touchless user interface) offer exciting opportunities for even more diverse ways that students may demonstrate their learning. For example, group administered expressive tests may become a possibility to the extent that voice recognition software advances to accommodate dialectal differences and multilingual influences on articulation and tone. Similarly, gesture recognition and facial expression recognition provide additional non-verbal modalities to help reduce the reliance on verbal skills common to many traditional assessment approaches.

Incorporation of movement via animation or video clips readily supports the assessment of verbs on vocabulary tests; verbs have always been difficult to elicit from static illustrations on traditional paper-based assessments. A study of computer-based storytelling in kindergarteners found that computer administered stories using animation, video, sounds, and music were more effective at supporting language development than computer administered stories using still images (Verhallen et al., 2006). Augmented reality (AR), virtual reality (VR), and mixed reality (MR) presentations offer highly engaging stimulus presentation in the foreground, experimental control of the background, and truly interactive means of responding. The interactive computer tasks of the future will include multiple modes of assessment, and headway is being made in the area of K-12 science assessment. Opportunities to develop interactive computer tasks should be taken when these offer advantages over static assessment modes. For example, items can be developed to depict slow motion, scenarios that are invisible to the naked eye, hazardous situations, and the manipulation of objects (National Assessment Governing Board, 2014). Linking students within the same virtual environment through avatars may also offer the potential to assess skills requiring teamwork, cooperation, and communication.

Psychometric Validation

As new online assessment systems and related educational policies are introduced in many countries around the world (e.g., Australia, USA), it is essential that rigorous test development work, piloting of the technologies with students and educators, and testing of infrastructure are conducted prior to large-scale rollout. For example, when former paper-based assessments are transitioned to digital platforms, developers must attend to the comparability of the two versions in terms of content, psychometrics, construct validity, and scoring. Research suggests that differences in test administration mode can affect test outcomes more than the content itself (Bridgeman et al., 2003; Pommerich, 2004). It is therefore recommended that test developers conduct comparability studies of their tests at both the total-score level and the item level to ensure score equivalency across their paper and computerized tests. A substantial number of test items need to be produced in anticipation that extensive field testing and post-administration analysis will result in the reduction of problematic items.
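
A comparability check of the kind recommended here can be sketched very simply at the total-score level with a paired design. The paper and tablet scores below are invented for illustration; a real study would use far larger samples and would also examine item-level statistics.

```python
# Minimal sketch of a total-score comparability check for paper vs. tablet
# administrations of the same test (paired design). Scores are invented.
import numpy as np

paper  = np.array([14, 18, 11, 22, 16, 19, 13, 20, 17, 15], dtype=float)
tablet = np.array([15, 17, 12, 21, 18, 19, 12, 22, 16, 16], dtype=float)

# Mean difference between modes with a rough 95% interval.
diff = tablet - paper
mean_diff = diff.mean()
se = diff.std(ddof=1) / np.sqrt(len(diff))
print(f"Mean mode difference: {mean_diff:.2f} "
      f"(rough 95% CI {mean_diff - 1.96 * se:.2f} to {mean_diff + 1.96 * se:.2f})")

# Rank-order agreement between modes.
r = np.corrcoef(paper, tablet)[0, 1]
print(f"Cross-mode correlation: r = {r:.2f}")
```

An interval that excludes meaningful mode differences, together with a high cross-mode correlation, is the kind of evidence a score-equivalency claim would rest on.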

Also, quality concerns should be urgently addressed in terms of the reliability and validity of the test scores ( Nickerson, 1989 ; Koomen and Zoanetti, 2018 ) to ensure teacher and public confidence ( Broadfoot and Black, 2004 ). Assessments used in education should be subject to the same rigorous validation processes as any other cognitive assessments used for psychological diagnostic purposes (e.g., intelligence tests). In this respect, there are psychometric properties of educational assessments that are particularly important.

Measurement invariance refers to the property that an assessment functions equivalently across different groups (e.g., gender or cultural groups). Technology-based assessments should demonstrate the same invariance, which can be especially important given that some groups of children will have had more exposure to technological devices than others (e.g., greater access to tablets and other devices among children from higher socioeconomic families). Relatedly, the assessment of the same construct (e.g., phonological awareness) should demonstrate invariance regardless of which technology is used, or even if traditional paper-and-pencil methods are used.
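
One common way to probe this kind of invariance at the item level is a differential item functioning (DIF) check. The sketch below computes the Mantel-Haenszel common odds ratio for a single item, with examinees from a reference and a focal group matched on total score; all counts are invented for illustration, and a full analysis would test every item and apply significance thresholds.

```python
# Minimal sketch of a Mantel-Haenszel DIF check for a single item.
# Examinees are stratified by total score; within each stratum we tabulate
# item passes/failures for a reference and a focal group. Counts invented.

# Each stratum: (ref_correct, ref_wrong, focal_correct, focal_wrong)
strata = [
    (30, 20, 25, 25),   # low total-score band
    (45, 15, 40, 20),   # middle band
    (55,  5, 50, 10),   # high band
]

num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
mh_odds_ratio = num / den
print(f"MH common odds ratio: {mh_odds_ratio:.2f}")
# A value near 1.0 suggests the item behaves similarly for both groups at
# matched ability; values far from 1.0 flag the item for review.
```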

Another psychometric property is test-retest reliability, which is particularly important for applications in AfL. The stability of a measure over repeated measurements is essential if teachers are to infer changes as a result of learning and plan future lessons accordingly. New technologies are also needed that allow in-depth analysis at more granular levels. Whereas error analysis usually takes the test administrator extensive time outside of the testing context, automated error analysis via computerized scoring is almost immediate. Focus groups with teachers have found that they desire this granular level of reporting because it helps them identify learning gaps and plan accordingly (Landry et al., 2017).
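
As a minimal sketch, test-retest stability reduces to the correlation between two administrations of the same measure over a suitable interval; the scores below are invented for illustration.

```python
# Minimal sketch: test-retest reliability as the correlation between two
# administrations of the same measure. Scores are invented for illustration.
import numpy as np

time1 = np.array([12, 15, 9, 20, 17, 14, 11, 18], dtype=float)
time2 = np.array([13, 14, 10, 21, 16, 15, 10, 19], dtype=float)

r = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest r = {r:.2f}")   # higher values indicate more stability
```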

Test Administration

Technology-based assessments have the potential to make the administration process more standardized and efficient and to offer diverse ways in which children can demonstrate their knowledge. Computer administered tests with automated scoring improve ease of use and minimize administration and scoring errors by teachers (Foorman et al., 2008). Some of these advantages can be witnessed even when tests are administered individually and students' answers are entered directly into a computer by a test administrator. Online tablet testing also offers several practical advantages over (online) desktop computer testing. Tablets are compact and mobile, which allows them to be utilized in a range of contexts: children can use them on a desk or when sitting on the floor and can carry them around the classroom, allowing increased flexibility in test settings and greater choice for individual student preferences. However, teachers will need to provide support to young children so that they are properly engaging with the computer (Hitchcock and Noonan, 2000; Ellis and Blashki, 2004), which will improve their use of technology over time (Klein et al., 2000).

Nevertheless, the use of a consistent set of instructions and the ability to enter responses directly into the database lead to increases in efficiency and standardization. Fully computerized applications that automatically present training items, instructions, and test items, and that automatically gather student responses, optimize standardized administration. This important benefit avoids otherwise inevitable variations among test administrators, such as in timing and dialect, which can invalidate test scores in some types of educational assessments, like tests of phonological awareness, listening comprehension, and mathematical problems. Efficiency gains are the most often cited advantage for the use of technology-based assessments in the classroom. While the automatic presentation and collection of student responses is a commonly cited example, there are other ways that technology improves efficiency.

Computer adaptive testing (CAT) is a method of administering tests that adapts to an examinee's ability (Wainer, 1990). CAT interacts with the examinee by selecting items that maximize the precision of the test based on what is known about a student from his or her prior responses. Test administration is individualized: item difficulty is made easier or harder following incorrect or correct responses, respectively. The tailoring of items is performed using item selection algorithms such as multidimensional adaptive testing (Luecht, 1996; Segall, 1996). Adaptive testing reduces the need to administer all the items to all children, thereby saving time. Shortening the test may increase student engagement, thereby also increasing the degree of accuracy (Olson, 2002). CAT selects items from large item pools, and tests of different lengths may be administered based on user input concerning the level of score precision desired, in accord with the purpose of the testing. CAT is, however, criticized for not allowing users to review or change their answers once they have responded (Wise, 1996, 1997; Pommerich and Burden, 2000). A solution is testlet-based CAT, which adaptively administers subsets of items, or “mini-tests,” rather than proceeding item by item (Wainer and Kiely, 1987; Wainer and Lewis, 1990). CAT is well-suited for efficient benchmarking and progress monitoring because subsequent administrations resume where prior administrations terminated. Examples of children's CATs include the STAR Reading assessment (Renaissance Learning, 2015) and the Smarter Balanced Assessment Consortium (2018) assessments, which monitor children's progress on a large scale.
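
The core of the adaptive loop can be sketched compactly. Assuming a two-parameter logistic (2PL) item model, the code below re-estimates ability after each response and selects the unused item with maximum Fisher information at the current estimate; the item parameters and the simulated examinee are invented, and operational CATs add refinements such as exposure control and principled stopping rules.

```python
# Minimal sketch of a CAT loop under a 2PL IRT model: re-estimate ability
# after each response, then pick the unused item with maximum information
# at the current estimate. Item parameters and examinee are invented.
import math, random

random.seed(1)
ITEMS = [{"a": a, "b": b} for a, b in
         [(1.2, -1.5), (0.8, -0.5), (1.5, 0.0), (1.0, 0.7), (1.4, 1.5)]]

def p_correct(theta, item):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-item["a"] * (theta - item["b"])))

def information(theta, item):
    """Fisher information of an item at ability theta."""
    p = p_correct(theta, item)
    return item["a"] ** 2 * p * (1.0 - p)

def estimate_theta(responses):
    """Crude maximum-likelihood ability estimate via grid search."""
    grid = [g / 10.0 for g in range(-40, 41)]
    def loglik(t):
        return sum(math.log(p_correct(t, i)) if u else
                   math.log(1.0 - p_correct(t, i)) for i, u in responses)
    return max(grid, key=loglik)

true_theta, theta, answered = 0.5, 0.0, []
pool = list(ITEMS)
for _ in range(4):
    item = max(pool, key=lambda i: information(theta, i))   # most informative
    pool.remove(item)
    u = random.random() < p_correct(true_theta, item)       # simulate answer
    answered.append((item, u))
    theta = estimate_theta(answered)
    print(f"b={item['b']:+.1f} answered {'right' if u else 'wrong'}; "
          f"theta -> {theta:+.1f}")
```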

An efficient and intuitive assessment system is an enormous time saver for administration, provides the foundation for advanced reporting features, and is key to user satisfaction. For example, a variety of means are now available to electronically import students' names, external identifiers, grades, birthdates, sex, ethnicity, free and reduced lunch status, special education status, and English language learner status. Many state education agencies (SEAs) and local education agencies (LEAs) in the USA consider functionality for bulk upload of student details a prerequisite for purchase of any new technology-based assessment product. The most user-friendly assessment systems also allow importing of teacher, school, and district information, followed by an intuitive means for administrators at each level to specify the roles and relations among students, classes, teachers, special educators, school administrators, district administrators, and SEA administrators. These specifications are used by the technology to assign user access privileges so that each user has security access only to appropriate data. The demographic data and information stored, and optionally edited, in the user management system support the scoring and reporting functionalities, as different graphical views of data at different levels of aggregation can be provided to administrators, teachers, special educators, interventionists, and parents.
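
The role-scoping described here can be sketched as a simple filter over student records. The roles, scopes, and records below are invented for illustration; a production system would enforce such rules in the database and authentication layers rather than in application code.

```python
# Minimal sketch of role-scoped data access in an assessment system.
# Roles, scopes, and records are invented for illustration.

RECORDS = [
    {"student": "S1", "classroom": "K-A", "school": "Oak", "score": 31},
    {"student": "S2", "classroom": "K-B", "school": "Oak", "score": 27},
    {"student": "S3", "classroom": "1-A", "school": "Elm", "score": 35},
]

def visible_records(role, scope):
    """Return only the records a user's role entitles them to see."""
    if role == "district_admin":
        return RECORDS                                       # whole district
    if role == "school_admin":
        return [r for r in RECORDS if r["school"] == scope]
    if role == "teacher":
        return [r for r in RECORDS if r["classroom"] == scope]
    return []                                                # default: nothing

for rec in visible_records("teacher", "K-A"):
    print(rec["student"], rec["score"])    # a K-A teacher sees only S1
```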

Test Scoring

In many cases, one of the major benefits of technology-based assessments is their ability to automate the collation and scoring of assessment data. This digitized or computerized scoring process enhances efficiency and accuracy. This is achieved by no longer requiring humans to perform data entry, calculate raw scores, transfer scores, search and locate the appropriate look-up tables, calculate domain scores, and perform a number of score conversions (e.g., raw score to age score, raw score to grade score, raw score to ability score, raw score to norm referenced standard score). Beyond provision of improved accuracy and efficiency of common scoring processes, technology-based assessments exclusively offer the ability to capitalize on modern psychometric models.

The scoring of most traditional educational assessments assumes that all items on a given test are equally able to index the construct of interest. However, this assumption is rarely supported by statistical analysis. For example, two-parameter logistic (2PL) item response models better explain performance on tests of phonological awareness (Anthony et al., 2002, 2011), oral language (Anthony et al., 2014), and letter knowledge (Anthony, 2018) than do one-parameter logistic (1PL) models. Computerized scoring can weight items by their discriminations, creating estimated ability scores with greater precision. Moreover, only computerized scoring can incorporate the most advanced psychometric models that are becoming more common in educational measurement (e.g., three-parameter logistic models, graded response models). Most exciting on the horizon are psychometric item response models that consider both the accuracy and the latency of student responses in estimating student abilities, which, of course, only computerized scoring could accommodate. Scoring that considers both accuracy and latency may in turn have different instructional implications.
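
To illustrate the weighting idea, the sketch below contrasts a unit-weighted raw score with a score that weights each response by its 2PL discrimination parameter. The parameters and responses are invented; a full ability estimate would come from maximizing the likelihood, as in the CAT sketch above, but the intuition is the same.

```python
# Minimal sketch contrasting a unit-weighted raw score with a score that
# weights each item by its 2PL discrimination (a). Parameters invented.

discriminations = [0.6, 1.8, 1.1, 0.4, 1.5]   # item "a" parameters
responses       = [1,   1,   0,   1,   0]     # 1 = correct, 0 = wrong

raw_score = sum(responses)                                  # every item equal
weighted  = sum(a * u for a, u in zip(discriminations, responses))

print(f"Raw score:      {raw_score}")         # 3 of 5 correct
print(f"Weighted score: {weighted:.2f}")      # credits sharp items more
# High-discrimination items carry more information about ability, so they
# move a model-based estimate more than weakly discriminating items do.
```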

Test Reporting and Interpretation

Making sense of assessment results and using them appropriately are some of the biggest challenges faced by educators and administrators, whose formal education does not typically include advanced coursework in measurement. This is another area in which technology-based assessments offer significant advantages over traditional educational assessments. New technologies can support interpretation of results with tabular reports and graphical plots of an individual's learning rate relative to a variety of pertinent reference groups. Data should be reported to educators in a way that optimizes interpretability by considering the latest research on educators' statistical literacy, such as following established standards (Rankin, 2016). Important referent groups that help contextualize a given student's learning include the average learning rates of peers in a small group, classroom, grade, school, school district, and national norms. Student-level reports may also be shared electronically with parents if these are included in the user management system; otherwise, traditional parent reports may be printed and distributed via mail or discussed during parent/teacher conferences. Reporting on digitized platforms can be enhanced through user-friendly visual graphics, graphs, and tools to track learning over time. These individualized digital reports and records will benefit teachers, parents, students, and schools. For example, if students move schools locally, nationally, or internationally, their digital reports can be easily accessed and travel with them. Such digital enhancements will help inform a student's new teachers and schools of his or her current competencies and learning goals.

To further support educators and administrators, reporting and data visualization can occur at higher levels of aggregation (e.g., small group, classroom, grade, school, and school district). Moreover, for systems based on very large normative samples with links to child- and school-level demographics, demographically adjusted reporting may also be available. This is particularly relevant for schools and school districts that serve high proportions of students from economically disadvantaged backgrounds or with special needs, and those that serve high proportions of dual language learners. There is also the potential for widespread, electronically administered and stored assessments to create large databases that inform education policy reports and practice. However, although 95% of education leaders indicated that big data technology allows greater in-depth knowledge about student learning, many schools are only slowly transitioning to cloud computing and mobile technologies (Harvard Business Review Analytical Services, 2017). Technology can assist with reporting, which is essential for AfL purposes. As noted by Gonski (2018, p. 62), nationally administered standardized tests “provide a useful ‘big picture’ view of student learning trends across Australia and the world, but have limitations at the classroom level: they report achievement rather than growth …. Teachers need to have useable data about each student at their fingertips as the basic prerequisite for improving learner outcomes.” Such barriers limit reporting and interpretation, and a greater focus on using data reporting to support individualized learning is needed.

Links to Curriculum and Individualized Learning

Technology offers much promise for supporting educators in making data-driven instructional modifications. For example, technology may help educators set realistic instructional goals that simultaneously consider a student's current proficiency level, his or her predicted growth rate, the demographic characteristics of the student, the standard error of the predicted growth rate in light of test reliability and the number of data points, and normative growth rates. Students' progress toward individual goals and normative benchmarks can be evaluated at each progress-monitoring wave, and instructional modifications made if necessary. For example, the Texas Kindergarten Entry Assessment (TX-KEA) is used by teachers as a school readiness screener in domains like listening comprehension in English and in Spanish (Anthony et al., 2017). Preschoolers listen to prompts on headphones and answer questions by touching a colorful illustration presented in a multiple choice format. TX-KEA also includes multiple response items that require children to touch several illustrated objects to get the item correct. As the field of children's technology use advances, the developmental appropriateness of technology will need to be considered beyond the early years. This is particularly the case for more sophisticated technologies such as VR, AR, and MR; as these become intuitive, commonplace, and authentic to the assessment process, they will provide more individualized educational data for teachers and students when planning learning experiences.
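
A goal-setting calculation of the kind described might look like the following minimal sketch. The current level, predicted rate, and standard error are invented placeholders for values a real system would derive from norms, demographics, and test reliability.

```python
# Minimal sketch of a goal band built from a student's current level, a
# predicted growth rate, and the standard error of that rate. Numbers are
# invented; a real system would derive them from norms and test reliability.

current_level  = 24.0    # e.g., words correct per minute today
predicted_rate = 1.6     # predicted gain per week for similar students
se_of_rate     = 0.4     # standard error of the predicted rate
weeks_to_goal  = 10

point_goal = current_level + predicted_rate * weeks_to_goal
margin     = 1.96 * se_of_rate * weeks_to_goal   # rough 95% band
print(f"End-of-term goal: {point_goal:.0f} (plausible range "
      f"{point_goal - margin:.0f} to {point_goal + margin:.0f})")
```

Expressing the goal as a band rather than a single number reflects the point made above: the fewer the data points and the less reliable the test, the wider the band a defensible goal should carry.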

Some new technology-based assessments, like the Texas Kindergarten Entry Assessment, guide educators through RTI by providing small-group recommendations based on children's achievement profiles, along with pointers to specific lessons in a curriculum or links to supplemental online instructional materials (Landry et al., 2017). Theoretically, computer-generated instructional recommendations and links to supplemental instructional materials could be based on both error analysis and achievement profiles. Effectively taking the guesswork out of RTI is a significant provision of technology-based assessment, especially for the early childhood education workforce, which is sometimes less formally schooled in linking assessment to instruction.

The recent Gonski (2018) review panel recommended the development of “a new online and on demand student learning assessment tool based on the Australian Curriculum learning progressions” (p. 66). In the context of education in the USA, Stiggins (2010, p. 763) argues that “our national assessment priority should make certain that assessments of both of and for learning are accurate in their depiction of student achievement and are used to benefit students” and recommends an online approach that includes both AfL and AoL to support teacher planning and student learning. In application, this knowledge will allow teachers to collect and share data in ways that support the systematic creation of new learning experiences for students, facilitate transitions, and enable evaluation of the effectiveness of education policies, programs, and teaching practices. Digital links to the curriculum will enable efficient and accurate translation of student outcome evaluations into tangible and effective learning experiences that support students' progression within their Zone of Proximal Development (Heritage, 2007).

Furthermore, technology also offers promise for supporting administrators in making data-driven changes. For example, classroom-level aggregated reports may help school-level administrators make decisions about the allocation of limited professional development resources, limited curricular resources, and school-wide supplemental services. School-level and district-level aggregated reports may similarly help district-level and state-level administrators make decisions about professional development needs, curricular needs, supplemental programming needs, and topics to address with new education policies.

Teachers and Implementation

While evidence for the benefits of AfL approaches continues to mount, research is needed on the optimal design of all forms of assessment (diagnostic, summative, and formative) and on how new technologies can enhance the use of these complementary tools. Evidence-centered design (ECD) is gaining ground as an assessment design and development framework for incorporating authentic and interactive tasks. The framework is an iterative design process that covers design, student performance, data, and test delivery, with the aim of producing cost-effective tasks with clear links to the target construct (Mislevy et al., 2003). New technologies also make the design of accessible products a reality for all children, given an understanding of their abilities and preferences. The adoption of universal design can minimize the need for test accommodations by making products accessible to children regardless of disability (Salend, 2009). Furthermore, as tablets are relatively inexpensive, some schools now require that children purchase a tablet as part of a BYOD (bring your own device) program, in much the same way as calculators once had to be purchased. The lower price increases the potential for widespread use and application of tablets by students and teachers. Ultimately, a student- and teacher-centered approach will help guide research and practice on optimal assessment design.

Teachers are increasingly encouraged to implement new technologies in their classroom assessment practices. This pressure comes from the promise that such technologies can better meet changing stakeholder expectations, fulfill new assessment purposes, engage students, deliver timely and informative results, and offer flexibility and efficiency in administration and scoring (Bennett, 2011; Gonski, 2018; Koomen and Zoanetti, 2018). These high expectations will continue to be unmet until teachers are provided with adequate training in sound classroom assessment practices and the use of technology. Research suggests that higher levels of assessment knowledge lead to increased use of a variety of assessment tools in the classroom (Bailey and Heritage, 2008; Popham, 2009).

Following initial applications, it was acknowledged that technology-based approaches to assessment present challenges at the classroom and school levels (Stiggins, 2002; Heritage, 2007). Since that time, the increased use of technology has resulted in a better understanding of how to meet these challenges. Teachers must be given significant pedagogical guidance to understand new assessments and to ensure school engagement and participation in the use of new assessment processes (Looney et al., 2018; Van der Kleij et al., 2018). For example, teachers must be confident and comfortable applying consistent scoring procedures to collect AfL and AoL data from assessments that are clearly aligned with curriculum and instructional objectives. Communication about assessment must also be understood clearly by students, parents, and caregivers. An important aspect of this communication is delineating the relationship between assessment and learning, and research is needed to refine sociocultural assessment theory in the context of online and mobile technologies (Baird et al., 2017).

Teachers play a key role in administering assessment and using data to inform planning for teaching and learning. As stated in the Australian Professional Standards for Teachers (Standard 5), teachers are required to “Assess student learning, provide feedback to students on their learning, make consistent and comparable judgements, interpret student data, and report on student achievement” [Australian Institute for Teaching and School Leadership (AITSL), 2011, pp. 16–17]. Teachers are also expected to develop, select, and use AfL strategies to assess student learning and to provide timely, effective, and appropriate feedback relative to students' learning goals. This approach is also reflected in position statements on assessment in USA schools, for example, Recommendation 11 of the 2001 report of the National Research Council's Committee on the Foundations of Assessment: “The balance of mandates and resources should be shifted from an emphasis on external forms of assessment to an increased emphasis on classroom formative assessment designed to assist learning” (Stiggins, 2002, p. 763). For assessments to be effective, teachers must have a sophisticated level of knowledge of both curriculum and AfL practices (Van der Kleij et al., 2018). However, many teachers are unprepared in the use of AfL practices (Stiggins, 2002; Lopez and Pasquini, 2017), and assessment is often perceived by teachers as high stakes, rank focussed, and “something that is in competition with teaching, rather than as an integral part of teaching and learning” (Heritage, 2007, p. 140). Also, given the increasing use of new technology in classrooms, more teacher knowledge is needed to understand the complex relationship between AfL and AoL. Without this knowledge, teachers are likely to avoid adopting new assessment practices that might otherwise be of benefit (Stiggins, 2002).

Teachers are the front-line professionals responsible for facilitating teaching and assessment. As such, teachers would benefit from professional development activities that help them gain sophisticated knowledge of both the curriculum and AfL practices (Stiggins, 2002; Heritage, 2007). For example, teachers need support to plan and implement quality assessment tasks, interpret evidence, develop outcomes appropriate to the assessment purpose and type, generate feedback, report, and engage students as active participants in their assessment and learning (Looney et al., 2018). AfL is often not well understood by teachers (Deluca et al., 2012) and is not strong in practice, and many teachers are unprepared to make summative judgements (Lopez and Pasquini, 2017).

Furthermore, teachers can be challenged by conflicts among their belief systems, institutional structures, and pressure from external accountability testing (Black and Wiliam, 1998; Dwyer, 1998). However, it has been found that teachers with stronger confidence were better at AfL in the classroom (Allinder, 1995), suggesting that enhanced self-efficacy with assessment tools and practices can be of benefit. We need to work with teachers to help them maximize the use and benefits of assessment technologies in the classroom. While most teachers possess general knowledge about using new technologies in the classroom, some experience uncertainty about their capability to meaningfully integrate tablets, computers, and mobile devices into the classroom for teaching, assessment, and tracking student progress (Woloshyn et al., 2017). Clear evidence-based pathways are needed for a smooth transition from traditional (paper-and-pencil) to technology-based assessment so that teachers can seamlessly integrate technology (e.g., tablets/iPads) into existing approaches and take advantage of its flexibility and mobility (Lu et al., 2017).

Clearly, barriers still need to be overcome before assessment technologies can be used and implemented seamlessly in the classroom. Teachers need time to build AfL and technology skills, to reflect, to interpret evidence, and to develop formative assessment materials suited to their students' learning needs. Professional development will assist teachers in becoming proficient and confident users of AfL practices who can effectively track student progress (Lu et al., 2017) and use assessment technologies (Dwyer, 1998; Woloshyn et al., 2017). Policy makers must continue to invest in pre-service training in the assessment of student learning, alongside in-service professional development, in a coordinated approach that provides teachers with the expertise and support they need (Stiggins, 2002; Heritage, 2007).

Indeed, the successful use of education technologies in the classroom (e.g., to prepare students for state assessments) has been found to depend on consistent, extensive, and high-quality teacher professional development programs (Penuel et al., 2007; Martin et al., 2010). Long-term (e.g., 2-year) professional development programs that assist teachers in integrating technology for teaching and learning can change practice, support the learning of new technologies, and show how technology can help students achieve learning goals (Lawless and Pellegrino, 2007). It is also important to consider that teacher professional learning requires teachers' emotional involvement if they are to embrace change and new assessment innovations. The creation of collaborative support networks that shift the traditional assessment mindsets of all stakeholders is critical to raising teacher knowledge about AfL, and will in turn foster students' confidence in themselves as learners. Only then will a positive pathway to student growth and achievement be realized (Wiliam, 2011).

Future Considerations for Technology-Based Educational Assessment

Digital technologies have the potential to be a powerful assessment tool for teachers and students (Woloshyn et al., 2017). Ultimately, educators and administrators share the responsibility of integrating new technologies into the classroom effectively. However, further work is required if the potential advantages of technology-based assessments are to be realized, and research is needed to build an evidence base from which to establish best practices. Several issues are particularly salient for the introduction and use of technology-based assessments: the developmental appropriateness of the technology, ensuring that test scores are valid and reliable, and ensuring that teachers are supported in using technology for assessment. From the perspective of educators and administrators preparing for universal testing as part of either AoL or AfL, the assessment process begins with documenting basic student demographic characteristics that are relevant for the later scoring and interpretation of results. This is followed by test administration, scoring, reporting, and interpretation, through which the results are fed back to students and teachers. Teachers can then use these data to monitor learning and to design individualized learning experiences that are linked to the curriculum. Students must be inherently involved in this process, which is viewed as a continuous flow of information between student and teacher in which learning and growth are the central focus (Stiggins, 2002).
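To make the sequence above concrete, the short Python sketch below models the stages of this workflow: demographics are documented first, then a test is administered, scored, and reported back. It is a minimal illustration only; the record fields, function names, items, and scoring rule are hypothetical and stand in for whatever a school's assessment platform actually stores.

```python
from dataclasses import dataclass, field

@dataclass
class StudentRecord:
    """Hypothetical record: demographics are documented first because they
    matter for later scoring and interpretation (e.g., year level, language)."""
    student_id: str
    year_level: int
    language_background: str
    responses: dict = field(default_factory=dict)  # item_id -> answer given
    scores: dict = field(default_factory=dict)     # skill -> raw score

def administer(record: StudentRecord, responses: dict) -> None:
    """Test administration: in practice these responses come from the testing app."""
    record.responses.update(responses)

def score(record: StudentRecord, key: dict) -> None:
    """Automated scoring: one point per item whose answer matches the key."""
    for item_id, answer in record.responses.items():
        skill = key[item_id]["skill"]
        record.scores[skill] = record.scores.get(skill, 0) + int(answer == key[item_id]["answer"])

def report(record: StudentRecord) -> str:
    """Reporting and interpretation: a summary fed back to student and
    teacher, from which individualized next steps can be planned."""
    lines = [f"Student {record.student_id} (Year {record.year_level}):"]
    lines += [f"  {skill}: {raw} correct" for skill, raw in sorted(record.scores.items())]
    return "\n".join(lines)

# One pass through the cycle with invented items and answers.
key = {"q1": {"skill": "phonics", "answer": "b"},
       "q2": {"skill": "phonics", "answer": "m"},
       "q3": {"skill": "counting", "answer": "7"}}
child = StudentRecord("S001", 1, "English")
administer(child, {"q1": "b", "q2": "n", "q3": "7"})
score(child, key)
print(report(child))
```

The point of the sketch is the ordering, not the implementation: each downstream stage (scoring, reporting, interpretation) consumes what the earlier stages recorded, which is why the demographic and response data must be captured systematically from the start.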

Despite increases in education funding, little improvement has been made, and in some countries the achievement of school children has declined (Productivity Commission, 2016; Gonski, 2018). To realize an improvement in education outcomes, the report on the National Education Evidence Base called for immediate refinement and greater efficiency in the collection and use of education and learning data (Productivity Commission, 2016). The report also called for developing the "bottom-up" capability of teachers to use assessment data and for expanded use of technology in assessment. The present review has highlighted ways in which technology can be used meaningfully to enhance assessment processes in classrooms. Technology has the potential to standardize and simplify test administration, to automate test scoring, to create reports that make use of new measures of learning, to customize reports and deliver them to a range of stakeholders, to aid in the interpretation of results across different levels of expertise and perspectives, to link assessment to the curriculum, to inform lesson planning, and to monitor growth in learning over time.

Countries such as Australia and the United States of America are making significant strides in assessing their students at the national level: in the USA through the National Assessment of Educational Progress (NAEP) (2018), and in Australia through the National Assessment Program—Literacy and Numeracy (NAPLAN) (2018). Despite these feats, there are not yet standardized national assessments in any content area for children of preschool or kindergarten age. Although there are already examples of technology being used for classroom assessments of young learners, there remains a critical need to move education further into the digital age. In the USA, there is a race to fund innovative state assessment systems, such as computer adaptive assessments, that can provide an annual summative determination, validate when students are ready to demonstrate mastery, and allow for differentiated student support based on individual learning needs (Every Student Succeeds Act, 2015). The Every Student Succeeds Act (ESSA) shifts the focus to multiple measures rather than a single measure in order to shed further light on the teaching and learning cycles in classroom assessment. The integration of technology in instruction and assessment is still at a nascent stage in both countries, but a swift transformation is taking place.
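The core idea behind the computer adaptive assessments mentioned above is that each item is chosen to be maximally informative about the test-taker's current estimated ability, and the estimate is updated after every response. The sketch below is a minimal illustration of that loop under the Rasch model with a posterior-mean (EAP) ability update over a grid; the item bank, grid, prior, and simulated answers are all hypothetical, and operational systems (see, e.g., Wainer, 1990) add exposure control, content balancing, and stopping rules.

```python
import math

def p_correct(theta: float, b: float) -> float:
    """Rasch model: probability of answering an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta: float, b: float) -> float:
    """Fisher information p*(1-p); under the Rasch model it peaks when b == theta."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def eap_theta(responses, grid, prior):
    """Posterior-mean (EAP) ability estimate over a discrete theta grid.
    responses: (item_difficulty, 0-or-1) pairs already administered."""
    post = []
    for theta, weight in zip(grid, prior):
        like = weight
        for b, x in responses:
            p = p_correct(theta, b)
            like *= p if x else 1.0 - p
        post.append(like)
    total = sum(post)
    return sum(t * w for t, w in zip(grid, post)) / total

def next_item(theta, bank, used):
    """Choose the unused item that is most informative at the current estimate."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, bank[i]))

# A short simulated run: hypothetical 6-item bank, theta grid from -4 to 4,
# and an (unnormalized) standard-normal prior over ability.
bank = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.5]
grid = [g / 10 for g in range(-40, 41)]
prior = [math.exp(-(t ** 2) / 2) for t in grid]
responses, used, theta = [], set(), 0.0
for answer in (1, 1, 0, 1):            # stand-in for a child's actual answers
    item = next_item(theta, bank, used)
    used.add(item)
    responses.append((bank[item], answer))
    theta = eap_theta(responses, grid, prior)
print(f"Ability estimate after 4 items: {theta:.2f}")
```

Because each item is pitched near the current estimate, an adaptive test can reach a given precision with far fewer items than a fixed-form test, which is what makes the approach attractive for the differentiated support ESSA envisions.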

One of those transformations concerns the operational definition of technology, a crosscutting concept that touches all aspects of modern life. When children are asked to define technology, they mention computers and electrically powered devices (Pearson and Young, 2002; Lachapelle et al., 2018). Australia and the USA are therefore working to broaden the general population's definition of technology. In alignment with the National Research Council (2010), the National Assessment Governing Board (2017), and the Australian Curriculum, Assessment and Reporting Authority (ACARA) (2016), technology is defined as all products and processes that address a problem, need, desire, or opportunity. Australia has gone so far as to declare that technology is the foundation for success in all learning areas [Ministerial Council on Education, Employment, Training, and Youth Affairs (MCEETYA), 2008]. Both countries place special emphasis on information and communication technologies (ICT) because of their crucial role in the workforce; ICT encompasses the accessing, managing, integrating, and presenting of information [ICT Literacy Panel, 2001; State Education Technology Directors Association, 2003; MCEETYA, 2008]. As both countries lead the way in national assessment, the area of technology-based classroom assessment in early childhood education is set to reap the benefits.

In this respect, technology-based assessment has particularly strong potential to advance AfL practices in the early childhood years, although it can also enhance AoL and diagnostic assessment practices. At present, most of the focus of technology-based assessment in early childhood classrooms has been from grade 1 onwards. Assessments are increasingly being used in earlier years as well, where the initial motivations have been to screen for school readiness and for benchmarking. Technology-based assessments as part of AfL practices in kindergarten and preschool need further research and development. Observations that children as young as 2–3 years can use a tablet to provide valid assessments of cognitive skills (Twomey et al., 2018) and early literacy skills (Neumann and Neumann, 2018; Neumann et al., 2019) support this work. Moreover, developing ways to support the bottom-up capability of teachers to collect AfL and AoL data using digital technology in the classroom is critical. Rigorous testing of the validity and reliability of the test scores produced by digital assessment tools will allow children's knowledge to be measured efficiently, cost-effectively, and accurately. This approach will increase our knowledge of how technologies are used in assessment, how data can be linked across curriculum areas and across students, and how data can be represented for educational purposes to achieve individual learning goals and student success.
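As one concrete example of the reliability testing called for above, the internal consistency of item-level scores from a tablet assessment can be estimated with Cronbach's alpha, \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma^2_i}{\sigma^2_{\text{total}}}\right)\), where \(k\) is the number of items. The Python sketch below is a minimal illustration; the 0/1 item scores are invented for demonstration and the function is a hypothetical helper, not part of any published tool.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for k items.
    item_scores: one list per item, each holding one score per child
    (all lists the same length)."""
    k = len(item_scores)
    total_scores = [sum(child) for child in zip(*item_scores)]  # per-child sums
    item_var_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / variance(total_scores))

# Invented 0/1 scores for a 4-item letter-naming task and six children.
items = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1, 0],
]
print(f"alpha = {cronbach_alpha(items):.2f}")
```

Internal consistency is only one facet of the needed evidence; validating digital tools against established paper-based measures, as in the tablet studies cited above, addresses convergent validity rather than reliability.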

The challenge of making technology-based educational assessment part of good educational practice can only be met through the joint efforts of a range of stakeholders. It will depend on investment in research to establish a strong evidence base for practice, as well as on the further research and development of new technology and new uses for existing technology. To be successful, this work will need strong university–industry partnerships and the support of government education departments at the local, state, and national levels. The outcome could be digital ecosystems that involve educators, students, and other stakeholders in the design, development, and use of practical technology-based assessments. In turn, this could lead to increased efficiency and improved educational outcomes for students across all age levels. Committed collaboration among educational researchers, the technology industry, governments, and policy developers is needed to ensure that the advantages of technology-based assessment are fully realized.

Author Contributions

The idea for the review was conceived by MN. Research, writing, and revision of the manuscript were jointly contributed by MN, JA, NE, and DN. Submission was handled by DN.

Funding

The research reported here was supported in part by an International Collaboration Award from the College of Behavioral and Community Sciences at the University of South Florida and by the Institute of Education Sciences, U.S. Department of Education, through award number R305A170638. The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Abrams, L., Pedulla, J., and Madaus, G. (2003). Views from the classroom: teachers' opinions of statewide testing programs. Theor. Pract. 42, 18–29. doi: 10.1207/s15430421tip4201_4

Allinder, R. M. (1995). An examination of the relationship between teacher efficacy and curriculum based measurement and student achievement. Remed. Spcl. Educ. 27, 141–152. doi: 10.1177/074193259501600408

American Academy of Pediatrics (2016). American Academy of Pediatrics Announces New Recommendations for Children's Media Use . Retrieved from: https://www.aap.org/en-us/about-the-aap/aap-press-room/Pages/American-Academy-of-Pediatrics-Announces-New-Recommendations-for-Childrens-Media-Use.aspx

Anthony, J. L. (2018). “Dimensionality of English letter knowledge across names, sounds, case, and response modalities,” in Paper Presented at the Annual Meeting of the Society for the Scientific Study of Reading (Brighton).

Anthony, J. L., Davis, C., Williams, J. M., and Anthony, T. I. (2014). Preschoolers' oral language abilities: a multilevel examination of dimensionality. Learn. Individ. Diff. 35, 56–61. doi: 10.1016/j.lindif.2014.07.004

Anthony, J. L., Lonigan, C. J., Burgess, S. R., Driscoll, K., Phillips, B. M., and Cantor, B. G. (2002). Structure of preschool phonological sensitivity: overlapping sensitivity to rhyme, words, syllables, and phonemes. J. Exp. Child Psychol. 82, 65–92. doi: 10.1006/jecp.2002.2677

Anthony, J. L., Williams, J. M., Durán, L., Gillam, S., Liang, L., Aghara, R., et al. (2011). Spanish phonological awareness: dimensionality and sequence of development during the preschool and kindergarten years. J. Educ. Psychol. 103, 857–876. doi: 10.1037/a0025024

Anthony, J. L., Williams, J. M., Erazo, N. A., Montroy, J. J., and Cen, W. (2017). Texas Kindergarten Entry Assessment (TX-KEA): Listening Comprehension Subtest. Houston, TX: University of Texas Health Science Center at Houston and Texas Education Agency.

Australian Curriculum, Assessment and Reporting Authority (ACARA) (2013). National Assessment Program—Literacy and Numeracy. Retrieved from: https://www.nap.edu.au/

Australian Curriculum, Assessment and Reporting Authority (ACARA) (2016). The Australian Curriculum: Digital Technologies. Sydney, NSW: ACARA.

Australian Institute for Teaching and School Leadership (AITSL) (2011). Australian Professional Standards for Teachers. Retrieved from: https://www.qct.edu.au/pdf/QCT_AustProfStandards.pdf

Aziz, N. A. A., Sin, N. S. M., Batmaz, F., Stone, R., and Chung, P. W. H. (2014). Selection of touch gestures for children's applications: repeated experiment to increase reliability. Int. J. Adv. Comput. Sci. Appl. 5, 97–102. doi: 10.14569/IJACSA.2014.050415

Bailey, A. L., and Heritage, M. (2008). Formative Assessment for Literacy, Grades K-6: Building Reading and Academic Language Skills Across the Curriculum . Thousand Oaks, CA: Corwin Press.

Baird, J., Andrich, D., Hopfenbeck, T. N., and Stobart, G. (2017). Assessment and learning: fields apart? Assess. Educ. 24, 317–350. doi: 10.1080/0969594X.2017.1319337

Becker, D., Bay-Borelli, M., Brinkerhoff, L., Crain, K., Davis, L., Fuhrken, C., et al. (2011). Top Ten: Transitioning English Language Arts Assessments. Iowa City, IA: Pearson.

Bennett, R. E. (2011). Formative assessment: a critical review. Assess. Educ. 18, 5–25. doi: 10.1080/0969594X.2010.513678

Black, P., and Wiliam, D. (1998). Assessment and classroom learning. Assess. Educ. 5, 7–74. doi: 10.1080/0969595980050102

Boyle, A., and Hutchison, D. (2009). Sophisticated tasks in e-assessment: what are they and what are their benefits? Assess. Eval. Higher Educ. 34, 305–319. doi: 10.1080/02602930801956034

Bridgeman, B., Lennon, M. L., and Jackenthal, A. (2003). Effects of screen size, screen resolution, and display rate on computer-based test performance. Appl. Meas. Educ. 16, 191–205. doi: 10.1207/S15324818AME1603_2

Broadfoot, P., and Black, P. (2004). Redefining assessment? The first ten years of Assessment in Education. Assess. Educ. 11, 7–26. doi: 10.1080/0969594042000208976

Bruder, I. (1993). Alternative assessment: putting technology to the test. Electron. Learn. 12, 22–23.

Burnett, C. (2010). Technology and literacy in early childhood educational settings: a review of research. J. Early Childhood Literacy 10, 247–270. doi: 10.1177/1468798410372154

Burns, M., Appleton, J., and Stehouwer, J. (2005). Meta-analytic review of responsiveness-to-intervention research: examining field-based and research-implemented models. J. Psychoeduc. Assess. 23, 381–394. doi: 10.1177/073428290502300406

Busch, T. W., and Reschly, A. L. (2007). Progress monitoring in reading: using curriculum based measurement in a response-to-intervention model. Assess. Effect. Interv. 32, 223–230. doi: 10.1177/15345084070320040401

Carson, K. (2017). Reliability and predictive validity of preschool web-based phonological awareness assessment for identifying school-aged reading difficulty. Commun. Disord. Q. 39, 259–269. doi: 10.1177/1525740116686166

Chang, Y. M., Mullen, L., and Stuve, M. (2005). Are PDAs pedagogically feasible for young children? Examining the age-appropriateness of handhelds in a kindergarten classroom. Technol. Horizons Educ. J . 32:40. Available online at: https://www.learntechlib.org/p/77200

Chen, J., and Chang, C. (2006). Using computers in early childhood classrooms: teachers' attitudes, skills and practices. J. Early Childhood Res. 4, 169–188. doi: 10.1177/1476718X06063535

Clements, D. H., and Sarama, J. (2003). Young children and technology: what does the research say? Young Child. 58, 34–40. Available online at: https://www.jstor.org/stable/42729004

Darling-Hammond, L. (2004). Standards, accountability, and school reform. Teach. Coll. Rec. 106, 1047–1085. doi: 10.1111/j.1467-9620.2004.00372.x

Davey, T., Godwin, J., and Mittelholtz, D. (1997). Developing and scoring an innovative computerized writing assessment. J. Educ. Meas. 34, 21–41. doi: 10.1111/j.1745-3984.1997.tb00505.x

Deluca, C., Luu, K., Sun, Y., and Klinger, D. A. (2012). Assessment for learning in the classroom: barriers to implementation and possibilities for teacher professional learning. Assess. Matters 4, 5–29. Available online at: https://www.nzcer.org.nz/nzcerpress/assessment-matters/articles/assessment-learning-classroom-barriers-implementation-and

Deno, S. L. (1985). Curriculum-based measurement: the emerging alternative. Except. Child. 52, 219–232. doi: 10.1177/001440298505200303

Deno, S. L. (1997). "Whether thou goest…Perspectives on progress monitoring," in Issues in Educating Students With Disabilities, eds J. W. Lloyd, E. J. Kameenui, and D. Chard (Mahwah, NJ: Lawrence Erlbaum Associates), 77–99.

Dolan, R. P., Burling, K. S., Rose, D., Beck, R., Murray, E., Strangman, N., et al. (2010). Universal Design for Computer-Based Testing (UD-CBT) Guidelines. Iowa City, IA: Pearson.

Donker, A., and Reitsma, P. (2007). Drag-and-drop errors in young children's use of the mouse. Interact. Comput. 19, 257–266. doi: 10.1016/j.intcom.2006.05.008

Dwyer, C. A. (1998). Assessment and classroom learning theory and practice. Assess. Educ. 5, 131–137. doi: 10.1080/0969595980050109

Ellis, K., and Blashki, K. (2004). Toddler techies: a study of young children's interaction with computers. Inform. Technol. Childhood Educ. Annu. 1, 77–96.

Every Student Succeeds Act (2015). Every Student Succeeds Act of 2015 § 6301, 20, USC § 1177. Retrieved from: https://nche.ed.gov/every-student-succeeds-act/

Falk, T. H., Tam, C., Schellnus, H., and Chau, T. (2011). On the development of a computer-based handwriting assessment tool to objectively quantify handwriting proficiency in children. Comput. Methods Prog. Biomed. 104, 102–111. doi: 10.1016/j.cmpb.2010.12.010

Foorman, B. R., York, M., Santi, K. L., and Francis, D. (2008). Contextual effects on predicting risk for reading difficulties in first and second grade. Read. Writing 21, 371–394. doi: 10.1007/s11145-007-9079-5

Fuchs, L. (2004). The past, present, and future of curriculum-based measurement research. Schl. Psychol. Rev. 33, 188–192. Available online at: https://psycnet.apa.org/record/2004-16860-002

Fuchs, L., Deno, S. L., and Mirkin, P. K. (1984). The effects of frequent curriculum-based measurement and evaluation of pedagogy, student achievement, and student awareness of learning. Am. Educ. Res. J. 21, 449–460. doi: 10.3102/00028312021002449

Fuchs, L. S., Fuchs, D., and Hamlett, C. L. (1989). Effects of alternative goal structures within curriculum-based measurement. Except. Child. 55, 429–438. doi: 10.1177/001440298905500506

Gersten, R., Beckmann, S., Clarke, B., Foegen, A., Marsh, L., Star, J. R., et al. (2009). Assisting Students Struggling with Mathematics: Response to Intervention (RtI) for Elementary and Middle Schools. IES Practice Guide. NCEE 2009–4060. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Retrieved from: https://ies.ed.gov/ncee/wwc/Docs/PracticeGuide/rti_math_pg_042109.pdf

Gersten, R., Compton, D., Connor, C. M., Dimino, J., Santoro, L., Linan-Thompson, S., et al. (2008). Assisting Students Struggling with Reading: Response to Intervention (RtI) and Multi-Tier Intervention in the Primary Grades. IES Practice Guide. NCEE 2009–4045 . Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Retrieved from: https://ies.ed.gov/ncee/wwc/docs/practiceguide/rti_reading_pg_021809.pdf

Gonski, D. (2018). Through Growth to Achievement: Report of the Review to Achieve Educational Excellence in Australian Schools . Commonwealth of Australia. Retrieved from: https://docs.education.gov.au/system/files/doc/other/662684_tgta_accessible_final_0.pdf

Graney, S. B., and Shinn, M. R. (2005). Effects of reading curriculum-based measurement (R-CBM) teacher feedback in general education classroom. Schl. Psychol. Rev. 34, 184–201. Available online at: https://psycnet.apa.org/record/2005-06945-003

Harvard Business Review Analytical Services (2017). Education in Australia: Transforming for the 2020 economy . Retrieved from: https://blogs.msdn.microsoft.com/education/2017/10/30/education-in-australia-transforming-for-the-2020-economy/

Hayward, L., and Hedge, N. (2005). Travelling towards change in assessment: policy, practice and research in education. Assess. Educ. 12, 55–76. doi: 10.1080/0969594042000333913

Heritage, M. (2007). Formative assessment: what do teachers need to know and do? Phi Delta Kappan 89, 140–145. doi: 10.1177/003172170708900210

Heritage, M. (2018). Assessment for learning as support for student self-regulation. Aust. Educ. Res. 45, 51–63. doi: 10.1007/s13384-018-0261-3

Hitchcock, C. H., and Noonan, M. J. (2000). Computer-assisted instruction of early academic skills. Top. Early Childh. Spcl. Educ. 20, 145–158. doi: 10.1177/027112140002000303

Hodges, D., Eames, C., and Coll, R. K. (2014). Theoretical perspectives on assessment in cooperative education placements. Asia Pac. J. Cooperat. Educ. 15, 189–207. Available online at: https://eric.ed.gov/?id=EJ1113725

Hourcade, J. P., Bederson, B. B., Druin, A., and Guimbretière, F. (2004). Differences in pointing task performance between preschool children and adults using mice. ACM Trans. Comput. Hum. Interact. 11, 357–386. doi: 10.1145/1035575.1035577

ICT Literacy Panel (2001). Digital Transformation: A Framework for ICT Literacy. Princeton, NJ: Educational Testing Service.

Jodoin, M. G. (2003). Measurement efficiency of innovative item formats in computer-based testing. J. Educ. Meas. 40, 1–15. doi: 10.1111/j.1745-3984.2003.tb01093.x

Joiner, R., Messer, D., Light, P., and Littleton, K. (1998). It is best to point for young children: a comparison of children's pointing and dragging. Comput. Hum. Behav. 14, 513–529. doi: 10.1016/S0747-5632(98)00021-1

Klein, P. S., Nir-Gal, O., and Darom, E. (2000). The use of computers in kindergarten, with or without adult mediation; effects on children's cognitive performance and behavior. Comput. Hum. Behav. 16, 591–608. doi: 10.1016/S0747-5632(00)00027-3

Koomen, M., and Zoanetti, N. (2018). Strategic planning tools for large-scale technology-based assessments. Assess. Educ. 25, 200–223. doi: 10.1080/0969594X.2016.1173013

Kopriva, R., and Bauman, J. (2008). “Testing for the future: addressing the needs of low English proficient students through using dynamic formats and expanded item types,” Paper presented at the American Educational Research Association (New York, NY).

Labbo, D. L., and Reinking, D. (2003). “Computers and early literacy education,” in Handbook of Early Childhood Literacy , eds N. Hall, J. Larson, and J. Marsh (London: SAGE), 338–354. doi: 10.4135/9781848608207.n28

Lachapelle, C. P., Cunningham, C. M., and Oh, Y. (2018). What is technology? Development and evaluation of a simple instrument for measuring children's conceptions of technology. Int. J. Sci. Educ. 41, 188–209. doi: 10.1080/09500693.2018.1545101

Landry, S. H., Anthony, J. L., Assel, M. A., Carlo, M., Johnson, U., Montroy, J., et al. (2017). Texas Kindergarten Entry Assessment Technical Manual . Houston, TX: University of Texas Health Science Center.

Lankshear, C., and Knobel, M. (2003). New technologies in early childhood literacy research: a review of research. J. Early Childh. Literacy 3, 59–82. doi: 10.1177/14687984030031003

Lawless, K. A., and Pellegrino, J. W. (2007). Professional development in integrating technology into teaching and learning: knowns, unknowns, and ways to pursue better questions and answers. Rev. Educ. Res. 77, 575–614. doi: 10.3102/0034654307309921

Looney, A., Cumming, J., Van der Kleij, F. M., and Harris, K. (2018). Reconceptualising the role of teachers as assessors: teacher assessment identity. Assess. Educ. 25, 1–26. doi: 10.1080/0969594X.2016.1268090

Lopez, L. M., and Pasquini, R. (2017). Professional controversies between teachers about their summative assessment practices: a tool for building assessment capacity. Assess. Educ. 24, 228–249. doi: 10.1080/0969594X.2017.1293001

Lu, Y., Ottenbreit-Leftwich, A. T., Ding, A., and Glazewski, K. (2017). Experienced iPad-using early childhood teachers: practices in the one-to-one iPad classroom. Comput. Schl. 34, 1–15. doi: 10.1080/07380569.2017.1287543

Luecht, R. M. (1996). Multidimensional computerized adaptive testing in a certification or licensure context. Appl. Psychol. Meas. 20, 389–404. doi: 10.1177/014662169602000406

Martin, W., Strother, S., Beglau, M., Bates, L., Reitzes, T., and McMillan Culp, K. (2010). Connecting instructional technology professional development to teacher and student outcomes. J. Res. Technol. Educ. 43, 53–74. doi: 10.1080/15391523.2010.10782561

Matthews, J., and Seow, P. (2007). Electronic paint: understanding children's representation through their interactions with digital paint. Int. J. Art Design Educ. 26, 251–263. doi: 10.1111/j.1476-8070.2007.00536.x

Ministerial Council on Education, Employment, Training, and Youth Affairs (MCEETYA) (2008). Melbourne Declaration on Educational Goals for Young Australians. Melbourne, VIC: MCEETYA.

Mislevy, R. J., Steinberg, L. S., and Almond, R. G. (2003). Focus article: on the structure of educational assessments. Meas. Interdiscipl. Res. Perspect. 1, 3–6. doi: 10.1207/S15366359MEA0101_02

National Assessment Governing Board (2014). Science framework for the 2015 National Assessment of Educational Progress. Washington, DC: National Assessment Governing Board, U.S. Department of Education.

National Assessment Governing Board (2017). Reading framework for the 2017 National Assessment of Educational Progress. Washington, DC: National Assessment Governing Board, U.S. Department of Education.

National Assessment of Educational Progress (NAEP) (2018). Frequently Asked Questions . Retrieved from: https://nces.ed.gov/nationsreportcard/about/faqs.aspx

National Assessment Program—Literacy and Numeracy (NAPLAN) (2018). When Will NAPLAN Online Start? Retrieved from: https://www.nap.edu.au/online-assessment/FAQs

National Research Council (2010). State Assessment Systems: Exploring Best Practices and Innovations: Summary of Two Workshops . Washington, DC: The National Academies Press.

Neal, D. (2011). “The design of performance pay in education,” in Handbook of the Economics of Education , eds E. A. Hanushek, S. Machin, and L. Woessmann (San Diego, CA: North-Holland), 495–550. doi: 10.3386/w16710

Neumann, M. M. (2018). Using tablets and apps to enhance emergent literacy skills in young children. Early Childh. Res. Q. 42, 239–246. doi: 10.1016/j.ecresq.2017.10.006

Neumann, M. M., and Neumann, D. L. (2018). Validation of a touch screen tablet assessment of literacy skills and a comparison with a traditional paper-based assessment. Int. J. Res. Method Educ. 42, 385–398. doi: 10.1080/1743727X.2018.1498078

Neumann, M. M., Worrall, S., and Neumann, D. L. (2019). Validation of an expressive and receptive tablet assessment of early literacy. J. Res. Technol. Educ. 51, 1–16. doi: 10.1080/15391523.2019.1637800

Nickerson, R. A. (1989). New directions in educational assessment. Educ. Res. 18, 3–7. doi: 10.3102/0013189X018009003

Ojanen, E., Ronimus, M., Ahonen, T., Chansa-Kabali, T., February, P., Jere-Folotiya, J., et al. (2015). GraphoGame–A catalyst for multi-level promotion of literacy in diverse contexts. Front. Psychol. 6:671. doi: 10.3389/fpsyg.2015.00671

Olson, A. (2002). Technology solutions for testing. Schl. Admin. 59, 20–23. Available online at: https://eric.ed.gov/?id=EJ642970

O'Neill, K., and Folk, V. (1996). “Innovative CBT item formats in a teacher licensing program,” in Annual Meeting of the National Council on Measurement in Education (New York, NY).

Parshall, C. G., Becker, K. A., and Pearson, V. U. E. (2008). “Beyond the technology: developing innovative items,” in Bi-Annual Meeting of the International Test Commission (Manchester).

Parshall, C. G., Davey, T., and Pashley, P. J. (2000). “Innovative item type for computerized testing,” in Computerized Adaptive Testing: Theory and Practice , eds W. J. van der Linden and C. A. W. Glas (Norwell, MA: Kluwer), 129–148. doi: 10.1007/0-306-47531-6_7

Parshall, C. G., Stewart, R., and Ritter, J. (1996). "Innovations: sound, graphics, and alternative response modes," Paper Presented at the Annual Meeting of the National Council on Measurement in Education (New York, NY).

Pearson, G., and Young, A. T. (2002). Technically Speaking: Why all Americans Need to Know More About Technology. Washington, DC: National Academy of Engineering.

Pellegrino, J. W., and Quellmalz, E. S. (2010). Perspectives on the integration of technology and assessment. J. Res. Technol. Educ. 43, 119–134. doi: 10.1080/15391523.2010.10782565

Penuel, W. R., Fishman, B. J., Yamaguchi, R., and Gallagher, L. P. (2007). What makes professional development effective? Strategies that foster curriculum implementation. Am. Educ. Res. J. 44, 921–958. doi: 10.3102/0002831207308221

Pew Research Center (2015). The Smartphone Difference . Retrieved from: http://www.pewinternet.org/2015/04/01/us-smartphone-use-in-2015/

Pommerich, M. (2004). Developing computerized versions of paper-and-pencil tests: Mode effects for passage-based tests. J. Technol. Learn. Assess. 2, 1–43. Available online at: https://ejournals.bc.edu/index.php/jtla/article/view/1666

Pommerich, M., and Burden, T. (2000). “From simulation to application: examinees react to computerized testing,” Paper Presented at the Annual Meeting of the National Council on Measurement in Education (New Orleans, LA).

Popham, W. J. (2009). Assessment literacy for teachers: Faddish or fundamental? Theor. Pract. 48, 4–11. doi: 10.1080/00405840802577536

Productivity Commission (2016). National Education Evidence Base, Inquiry Report No. 80. Canberra, ACT . Retrieved from: https://www.pc.gov.au/inquiries/completed/education-evidence/report/education-evidence-overview.pdf

Randel, B., and Clark, T. (2013). “Measuring classroom assessment practices,” in SAGE Handbook of Research on Classroom Assessment , ed J. H. McMillan (Thousand Oaks, CA: SAGE), 145–163. doi: 10.4135/9781452218649.n9

Rankin, J. G. (2016). Standards for Reporting Data to Educators: What Educational Leaders Should Know and Demand . New York, NY: Routledge.

Renaissance Learning (2015). STAR Reading: Technical Manual. Wisconsin Rapids, WI: Renaissance Learning.

Russell, M. (2010). "Technology-aided formative assessment of learning," in Handbook of Formative Assessment, eds H. L. Andrade and G. J. Cizek (New York, NY: Routledge), 125–138.

Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instr. Sci. 18, 119–144. doi: 10.1007/BF00117714

Salend, S. J. (2009). Classroom Testing and Assessment for all Students: Beyond Standardization . Thousand Oaks, CA: Corwin Press. doi: 10.4135/9781483350554

Scalise, K., and Gifford, B. (2006). Computer-based assessment in e-learning: a framework for constructing “intermediate constraint” questions and tasks for technology platforms. J. Technol. Learn. Assess. 4, 1–43. Available online at: https://ejournals.bc.edu/index.php/jtla/article/view/1653

Schmid, R. F., Miodrag, N., and Francesco, N. D. (2008). A human-computer partnership: the tutor/child/computer triangle promoting the acquisition of early literacy skills. J. Res. Technol. Educ. 41, 63–84. doi: 10.1080/15391523.2008.10782523

Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika 61, 331–354. doi: 10.1007/BF02294343

Shapiro, E., Angello, L., and Eckert, T. (2004). Has curriculum-based assessment become a staple of school psychology practice? An update and extension of knowledge, use, and attitudes from 1990 to 2000. Schl. Psychol. Rev. 33, 249–257. Available online at: https://psycnet.apa.org/record/2004-16860-007

Silseth, K., and Gilje, O. (2019). Multimodal composition and assessment: a sociocultural perspective. Assess. Educ. 26, 1–17. doi: 10.1080/0969594X.2017.1297292

Smarter Balanced Assessment Consortium (2018). Smarter Balanced Assessment Consortium: 2016-2017 Technical Report . Oakland, CA: Regents of the University of California.

State Education Technology Directors Association (2003). SETDA National Leadership Institute Toolkit. Burnie, MD: SETDA.

Stecher, B. M., and Klein, S. P. (1997). The cost of science performance assessments in large-scale testing programs. Educ. Eval. Policy Anal. 19, 1–14. doi: 10.3102/01623737019001001

Stiggins, R. (2010). "Essential formative assessment competencies for teachers and school leaders," in Handbook of Formative Assessment, eds H. L. Andrade and G. J. Cizek (New York, NY: Routledge), 233–250.

Stiggins, R. J. (2002). Assessment crisis: the absence of assessment for learning. Phi Delta Kappan 83, 758–765. doi: 10.1177/003172170208301010

Twomey, D. M., Wrigley, C., Ahearne, C., Murphy, R., De Haan, M., Marlow, N., et al. (2018). Feasibility of using touch screen technology for early cognitive assessment in children. Arch. Dis. Childh. 103, 853–858. doi: 10.1136/archdischild-2017-314010

Van der Kleij, F. M., Cumming, J. J., and Looney, A. (2018). Policy expectations and support for teacher formative assessment in Australian education reform. Assess. Educ. 26, 620–637. doi: 10.1080/0969594X.2017.1374924

VanDerHeyden, A. M. (2005). Intervention-driven assessment practices in early childhood/early intervention: measuring what is possible rather than what is present. J. Early Interv. 28, 28–33. doi: 10.1177/105381510502800104

VanDerHeyden, A. M., Witt, J. G., and Gilbertson, D. (2007). A multi-year evaluation of the effects of a response to intervention (RTI) model on identification of children for special education. J. Schl. Psychol. 45, 225–256. doi: 10.1016/j.jsp.2006.11.004

Verhallen, M., Bus, A., and De Jong, M. (2006). The promise of multimedia stories for kindergarten children at risk. J. Educ. Psychol. 98, 410–419. doi: 10.1037/0022-0663.98.2.410

Wainer, H. (1990). Computerized Adaptive Testing: A Primer . Hillsdale, NJ: Lawrence Erlbaum.

Wainer, H., and Kiely, G. L. (1987). Item clusters and computerized adaptive testing: a case for testlets. J. Educ. Meas. 24, 185–201. doi: 10.1111/j.1745-3984.1987.tb00274.x

Wainer, H., and Lewis, C. (1990). Toward a psychometrics for testlets. J. Educ. Meas. 27, 1–14. doi: 10.1111/j.1745-3984.1990.tb00730.x

Wiliam, D. (2010). "An integrative summary of the research literature and implications for a new theory of formative assessment," in Handbook of Formative Assessment, eds H. L. Andrade and G. J. Cizek (New York, NY: Routledge), 18–40.

Wiliam, D. (2011). What is assessment for learning? Stud. Educ. Eval. 37, 3–14. doi: 10.1016/j.stueduc.2011.03.001

Wise, S. L. (1996). “A critical analysis of the arguments for and against item review in computerized adaptive testing,” Paper Presented at the Annual Meeting of the National Council on Measurement in Education (New York, NY).

Wise, S. L. (1997). “Examinee issues in CAT,” Paper Presented at the Annual Meeting of the National Council on Measurement in Education (Chicago, IL).

Woloshyn, V. W., Bajovic, M., and Worden, M. M. (2017). Promoting student-centred learning using iPads in a grade 1 classroom: using the digital didactic framework to deconstruct instruction. Comput. Schl. 34, 152–167. doi: 10.1080/07380569.2017.1346456

Keywords: assessment, technology, teacher, student, education policy, early childhood education

Citation: Neumann MM, Anthony JL, Erazo NA and Neumann DL (2019) Assessment and Technology: Mapping Future Directions in the Early Childhood Classroom. Front. Educ. 4:116. doi: 10.3389/feduc.2019.00116

Received: 27 April 2019; Accepted: 04 October 2019; Published: 18 October 2019.

Copyright © 2019 Neumann, Anthony, Erazo and Neumann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Michelle M. Neumann, m.neumann@griffith.edu.au ; Jason L. Anthony, jasonanthony@usf.edu

This article is part of the Research Topic "Advances in Classroom Assessment Theory and Practice."

Evaluation of an Educational Scholarship Fellowship Program for Health Professions Educators

  • Original Research
  • Published: 02 May 2024

  • Amber J. Heck   ORCID: orcid.org/0000-0002-0758-2950 1 ,
  • Sateesh Arja   ORCID: orcid.org/0000-0001-8692-3348 2 ,
  • Laura D. Bauler   ORCID: orcid.org/0000-0001-6508-1938 3 ,
  • Khalil Eldeeb   ORCID: orcid.org/0000-0001-6071-6412 4 ,
  • Kathryn N. Huggett   ORCID: orcid.org/0000-0002-3061-3006 5 ,
  • Alana D. Newell   ORCID: orcid.org/0000-0001-9251-6159 6 ,
  • Kelly M. Quesnelle   ORCID: orcid.org/0000-0002-2408-1904 7 ,
  • Amina Sadik   ORCID: orcid.org/0000-0003-3238-4590 8 ,
  • Norma Saks   ORCID: orcid.org/0000-0003-3228-205X 9 ,
  • Paula J. W. Smith   ORCID: orcid.org/0000-0003-3878-427X 10 &
  • Jonathan J. Wisco   ORCID: orcid.org/0000-0003-3689-5937 11  

Introduction

Historically, the requirement to produce scholarship for advancement has challenged health professions educators heavily engaged in teaching. As biomedical scientists or healthcare practitioners, few are trained in educational scholarship, and related faculty development varies in scope and quality across institutions. Currently, there is a need for faculty development and mentoring programs to support the development of these skills.

The International Association of Medical Science Educators (IAMSE) established the Medical Educator Fellowship (MEF) Program to foster health professions educational scholarship. MEF addresses the following: curriculum design, teaching methods and strategies, assessment, educational scholarship, and leadership. Participants receive mentorship and faculty development, and complete an educational scholarship project. Using a logic model, we conducted a retrospective program evaluation with data from Program records, database searches, graduate surveys, and focus groups.

Over 14 years, MEF graduated 61 participants with diverse terminal degrees from five continents and six academic program areas. Graduate survey responses indicated enhanced post-Program skills in all focus areas, that the majority would recommend MEF to a colleague, and that mentorship, networking, and professional development were strengths. Focus group outcomes indicated professional growth, increased confidence, and increased sense of community.

MEF addresses health professions educators’ need for faculty development and mentorship in educational scholarship. Evaluation outcomes suggest that MEF effectively enhanced perceived skills across focus areas. Similar programs are essential to support faculty who dedicate significant time to teaching. Organizations like IAMSE can demonstrate the value of educational scholarship and positively impact health professions educator careers by supporting such programs.

Data Availability

All data generated or analyzed during this study are available from the corresponding author upon request.

Author information

Authors and Affiliations

University of North Texas Health Science Center, 3500 Camp Bowie Blvd, Fort Worth, TX, 76107, USA

Amber J. Heck

Avalon University School of Medicine, Willemstad, Curaçao

Sateesh Arja

Western Michigan University Homer Stryker MD School of Medicine, Kalamazoo, MI, USA

Laura D. Bauler

Campbell University Jerry M. Wallace School of Osteopathic Medicine, Lillington, NC, USA

Khalil Eldeeb

Larner College of Medicine at the University of Vermont, Burlington, VT, USA

Kathryn N. Huggett

Baylor College of Medicine, Houston, TX, USA

Alana D. Newell

University of South Carolina School of Medicine Greenville, Greenville, SC, USA

Kelly M. Quesnelle

Touro University Nevada, College of Osteopathic Medicine, Henderson, NV, USA

Amina Sadik

Rutgers Robert Wood Johnson Medical School, New Brunswick, NJ, USA

Edinburgh Medical School at the University of Edinburgh, Edinburgh, Scotland

Paula J. W. Smith

Boston University Aram V. Chobanian & Edward Avedisian School of Medicine, Boston, MA, USA

Jonathan J. Wisco

Corresponding author

Correspondence to Amber J. Heck.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Heck, A.J., Arja, S., Bauler, L.D. et al. Evaluation of an Educational Scholarship Fellowship Program for Health Professions Educators. Med. Sci. Educ. (2024). https://doi.org/10.1007/s40670-024-02036-6

Accepted: 01 April 2024

Published: 02 May 2024

DOI: https://doi.org/10.1007/s40670-024-02036-6

  • Faculty development
  • Program evaluation
  • Educational scholarship
  • Fellowship program
  • Professional development
  • Open access
  • Published: 30 April 2024

A substance use disorder training curriculum for internal medicine residents using resident-empaneled patients

  • Mim Ari 1 &
  • Julie L. Oyler 1  

BMC Medical Education volume 24, Article number: 478 (2024)

Internal Medicine (IM) residents frequently encounter, but feel unprepared to diagnose and treat, patients with substance use disorders (SUD). This is compounded by negative regard for patients with SUD. Optimal education strategies are needed to empower IM residents to care for patients with SUD. The objective of this study was to evaluate a brief SUD curriculum for IM residents, using resident-empaneled patients as an engaging educational strategy.

Following a needs assessment, a 2-part SUD curriculum was developed for IM residents at the University of Chicago during the 2018–2019 academic year as part of the ambulatory curriculum. During sessions on Opioid Use Disorder (OUD) and Alcohol Use Disorder (AUD), a facilitator covered concepts about screening, diagnosis, and treatment. In session, residents completed structured worksheets applying these concepts to one of their primary care patients. A post-session assessment included questions on knowledge, preparedness, and attitudes.

Resident needs assessment ( n  = 44/105, 42% response rate) showed 86% characterized instruction received during residency in SUD as none or too little, and residents did not feel prepared to treat SUD. Following the AUD session, all residents ( n  = 22) felt prepared to diagnose and treat AUD. After the OUD session, all residents ( n  = 19) felt prepared to diagnose, and 79% ( n  = 15) felt prepared to treat OUD. Residents planned to screen for SUD more or differently, initiate harm reduction strategies and increase consideration of pharmacotherapy.

Conclusions

A brief curricular intervention for AUD and OUD using resident-empaneled patients can empower residents to integrate SUD diagnosis and management into practice.

Introduction

Given the high prevalence of substance use disorders (SUD), internal medicine (IM) residents frequently encounter patients with SUD [ 1 ]. Preparing a physician workforce to address SUD is crucial to improving care, a concept reinforced by the new 8-hour training requirement on the management of opioid and other SUD for all Drug Enforcement Agency (DEA)-registered practitioners [ 2 ]. While psychiatry has historically been the "home" for addiction medicine, improving the knowledge and confidence to treat SUD within IM is necessary. Furthermore, exposure to SUD training during IM residency may help foster interest in advanced training in this field, including Addiction Medicine fellowship. Accordingly, in 2022, the Accreditation Council for Graduate Medical Education (ACGME) added a requirement for IM residencies to include clinical and educational experiences in addiction medicine (IV.C.2, IV.C.3), signaling this topic's relevance for all internists [ 3 , 4 ]. However, while IM residents frequently encounter patients with SUD, most feel unprepared to diagnose and treat SUD [ 5 ]. Compounding this lack of SUD knowledge and preparedness is stigma towards patients with SUD, with multiple studies showing lower "regard" for patients with addiction [ 6 , 7 , 8 ].

Wide variability exists in SUD content coverage across IM residencies, with limited assessment of what is taught, by what method, and how well. For example, 72% of IM residencies reported opioid use disorder (OUD) didactics, but only 15% reported "very effective" teaching on this topic [ 9 ]. Optimal curricular strategies are needed to empower residents to integrate SUD diagnosis and management into their practices [ 10 ]. Previously published SUD curricula have demonstrated increases in confidence, preparedness to diagnose and treat SUD, and sense of responsibility to manage SUD [ 11 , 12 , 13 ], but most have required significant curricular time, ranging from 6 to 16 sessions.

The objective of this study was to evaluate a brief SUD curriculum for IM residents that uses resident-empaneled patients as an innovative and adaptable educational strategy. The goal of the overall curriculum is to empower residents with the knowledge and skills to care for patients with SUD, ultimately leading to more compassionate and evidence-based care.

Settings and participants

In 2018, all IM residents ( n  = 105, PGY1/2/3) at the University of Chicago were invited to complete an SUD curriculum needs assessment (Online Appendix 1). Based on the responses, a 2-part SUD curriculum was developed for second- and third-year IM residents and delivered for the first time in the 2018–2019 academic year as part of the IM resident ambulatory curriculum. Participation in the ambulatory curriculum is expected of all second- and third-year IM residents ( n  = 60), but completing the post-session assessments was voluntary. The ambulatory sessions are variably attended due to other clinical requirements and vacation schedules; some residents attended both sessions, some attended only one, and some attended neither.

Program description

Using Kern's six-step approach to curriculum development and Vygotsky's conceptual framework of situated learning and guided participation, two 1-hour sessions were developed on Opioid Use Disorder (OUD) and Alcohol Use Disorder (AUD) [ 14 , 15 ]. Session objectives are listed in Table 1; the sessions were similar in format and style. Residents were asked to choose a patient from their outpatient panel with the corresponding SUD; a standard patient case was provided if a resident did not choose one of their empaneled patients. During each session, the facilitator, an IM faculty member with experience in addiction medicine (author: MA), presented material covering key concepts about the screening, diagnosis, and treatment of either OUD or AUD. This was interwoven with individual work through the sections of a structured worksheet related to the patient case (Online Appendices 2 , 3 ). On their individual electronic devices, residents could access the electronic health record during the session to review the patient's history, past clinic visits, medications, and urine toxicology results. The worksheet served to reinforce and apply the presented concepts, culminating in an action plan for the patient, which the resident was encouraged to apply to the patient's care at the next visit. Each session was delivered three times, as residents are assigned to one of three firms, and each firm had both an OUD and an AUD session. The OUD and AUD sessions were delivered approximately two months apart.

Evaluation methods

Needs assessment questions covered demographic information, prior training in SUD, current exposure to patients with SUD, knowledge, confidence, and preparedness to diagnose and treat SUD, and attitudes towards patients with SUD. Many of the questions on knowledge, confidence, and preparedness were replicated from Wakeman et al. (2013) [ 5 ]. The Medical Condition Regard Scale (MCRS) was used to assess attitudes towards patients with SUD [ 16 ].
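For readers implementing similar evaluations, the sketch below illustrates how a total score for an MCRS-style instrument might be tabulated. It is a minimal, hypothetical illustration, not the authors’ scoring code: it assumes the commonly described 11-item, 6-point Likert format, and the reverse-scored item indices shown are placeholders rather than the published key.

```python
# Hedged sketch of Likert scoring for an MCRS-style instrument (not the
# published key): assumes 11 items rated 1 (strongly disagree) to 6
# (strongly agree), with negatively worded items reverse-scored so that a
# higher total always means higher regard. Which items are reversed is a
# placeholder assumption here.

REVERSED = {3, 7, 9}  # hypothetical indices of negatively worded items

def mcrs_total(ratings):
    """Sum 11 ratings (1-6), flipping reverse-scored items.
    Possible totals range from 11 to 66."""
    if len(ratings) != 11 or not all(1 <= r <= 6 for r in ratings):
        raise ValueError("expected 11 ratings, each between 1 and 6")
    return sum(7 - r if i in REVERSED else r
               for i, r in enumerate(ratings, start=1))

print(mcrs_total([4, 5, 2, 6, 4, 5, 3, 4, 2, 5, 4]))  # -> 51
```

Summed this way, totals range from 11 to 66, with higher totals indicating greater regard for patients with the condition.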

A post-session assessment was designed for each session. It included questions duplicated from the needs assessment on knowledge and on preparedness to diagnose and treat patients with OUD or AUD, plus a single question from the MCRS (“Regarding patients with OUD/AUD, there is little I can do to help patients like this”), which best aligned with the overall curriculum goal to “empower residents with the knowledge and skills to care for patients with substance use disorder.” Residents were also asked, via an open-ended question, to identify one “take-away” from each session (Online Appendix 4 , 5 ).

Descriptive statistics were used to analyze the needs assessment and post-session evaluations.
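As a concrete illustration of this analysis, the snippet below computes an item-level percentage and a mean/SD while restricting each denominator to respondents who answered that item, which is how optional questions yield the varying denominators reported in the results. It is a sketch with hypothetical field names and values, not the study’s actual analysis code.

```python
# Minimal sketch (not the authors' analysis code) of the descriptive
# statistics used here: item-level percentages and means/SDs computed
# over answered items only. All field names and values are hypothetical.
from statistics import mean, stdev

# One dict per respondent; None marks a skipped (optional) item.
responses = [
    {"prepared_to_diagnose": "agree",    "pct_clinic_with_sud": 15},
    {"prepared_to_diagnose": "disagree", "pct_clinic_with_sud": None},
    {"prepared_to_diagnose": None,       "pct_clinic_with_sud": 10},
]

def item_percent(records, item, endorsing):
    """Percent of non-missing answers falling in `endorsing`;
    the denominator shrinks when respondents skip the item."""
    answered = [r[item] for r in records if r[item] is not None]
    n_pos = sum(1 for a in answered if a in endorsing)
    return 100 * n_pos / len(answered), len(answered)

pct, n = item_percent(responses, "prepared_to_diagnose",
                      {"agree", "strongly agree"})
print(f"Prepared to diagnose: {pct:.0f}% (n = {n})")  # 50% (n = 2)

estimates = [r["pct_clinic_with_sud"] for r in responses
             if r["pct_clinic_with_sud"] is not None]
print(f"Estimated prevalence: mean {mean(estimates):.0f}% "
      f"(SD {stdev(estimates):.1f}%)")
```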

This study was deemed exempt by the University of Chicago IRB.

Results

The resident needs assessment ( n  = 44/105; 42% response rate) showed that 86% ( n  = 37/43) of respondents characterized the instruction in SUD received during residency as “none” or “too little.” While residents frequently encountered patients with SUD, only 56% ( n  = 23/41) felt prepared to diagnose and 20% ( n  = 8/41) felt prepared to treat SUD. Residents estimated that, on average, 25% (SD 14%) of patients on inpatient services and 14% (SD 10%) of clinic patients met criteria for SUD. Nearly half (49%; n  = 20/41, agree/strongly agree) found patients with SUD particularly difficult to work with, and only 17% ( n  = 7/41, agree/strongly agree) found working with these patients satisfying. A third (34%; n  = 14/41) felt “there is little I can do to help” patients with SUD (not sure but probably agree/agree/strongly agree). The needs assessment questions were not required, and a few surveys were incomplete, which accounts for the variability in denominators. Questions duplicated in both the needs assessment and either the OUD or AUD post-session assessment are presented with results in Table  2 . The full needs assessment MCRS scores are also available (Online Appendix 6 ).

All residents who attended a session completed the post-session assessment. After the OUD session ( n  = 20; 100% response rate; a few assessment items left incomplete), all responding residents ( n  = 19/19) felt prepared to diagnose OUD, and 79% ( n  = 15/19) felt prepared to treat it. Only 16% ( n  = 3/19) felt “there is little I can do to help” patients with OUD. Knowledge questions on OUD diagnostic criteria and on buprenorphine’s mechanism of action were answered correctly by 67% ( n  = 12/18) and 63% ( n  = 12/19) of residents, respectively. The most frequently reported take-away points after the OUD session included plans to screen more for OUD ( n  = 10), initiate harm reduction strategies ( n  = 5), and discuss or learn to prescribe medications for OUD ( n  = 3).

Following the AUD session ( n  = 22; 100% response rate), all residents felt prepared to diagnose ( n  = 22/22) and treat ( n  = 22/22) AUD, and none ( n  = 0/22) felt “there is little I can do to help” patients with AUD. All residents ( n  = 22/22) answered the knowledge question about AUD pharmacotherapy correctly. The most frequently reported take-away points were considering medications for AUD ( n  = 18) and screening more, or differently, for AUD ( n  = 8).

For the OUD session, 52% ( n  = 10/19) of residents used one of their own empaneled patients for the worksheet case; for the AUD session, 68% ( n  = 13/19) did so.

Discussion

After this 2-part SUD curriculum, residents reported high levels of confidence and preparedness to diagnose and manage OUD and AUD. Residents generated take-away lessons around SUD screening, harm reduction practices, and pharmacotherapy to apply in clinical practice.

Several studies have shown that SUD curricula must effectively address knowledge gaps and empower residents to integrate SUD recognition and management into their practices [ 5 , 11 , 12 , 13 , 17 ]. Strategies that specifically address stigma are key components, given IM residents’ decreased regard for patients with SUD; some studies have shown modest improvements in attitudes toward these patients following curricular interventions [ 8 , 18 ]. This study adds to the literature by demonstrating the effect of an SUD curriculum of shorter duration that engages residents by using their own empaneled patients as case studies. The decrease in the number of residents who agreed that “there is little I can do to help” patients with SUD highlights the curriculum’s impact on attitudes and on residents’ belief in their ability to act.

Interactive worksheets and resident-empaneled patients are promising strategies for delivering SUD didactics; more than 50% of residents were able to identify an empaneled patient for the session. Future work could compare the curricular impact of using empaneled patients versus standardized cases. Additionally, tracking whether the action plans created for empaneled patients were implemented in clinic would give more insight into the curriculum’s impact. This educational strategy could also be adapted across disciplines (e.g., OBGYN, pediatrics, surgery, emergency medicine, neurology) and learner types (medical students, fellows, faculty), and could count toward the 8-hour training requirement for DEA registration legislated by the Medication Access and Training Expansion (MATE) Act [ 2 ].

Residents reported greater comfort diagnosing and managing AUD than OUD, which may reflect higher baseline comfort with AUD given the higher prevalence of alcohol use. Take-aways from the AUD session focused more on pharmacotherapy, while those from the OUD session focused more on screening. This divide may indicate that more intensive work is needed to build resident confidence in comprehensively caring for patients with OUD, starting with screening and moving toward management.

This study is limited by its single-site setting and by the fact that fewer than half of the second- and third-year classes participated in the sessions. Post-session surveys were not matched to the needs assessment, so direct improvements in knowledge, confidence, and preparedness could not be measured. The specific contributions of the worksheet and of using resident-empaneled patients (both learner receptivity and added learning value) were not separately assessed but could be evaluated in future work. The ability to identify empaneled patients, as well as the required faculty time and expertise, may limit generalizability.

Conclusion

Developing effective SUD content for IM trainees is paramount to improving care for patients with SUD and is increasingly required by governing bodies (e.g., ACGME, DEA). While this curriculum could be replicated exactly, a site-specific needs assessment can guide educators toward more specific didactic focus areas (e.g., diagnosis, treatment, attitudes, self-efficacy) while maintaining the use of empaneled patients as a curricular strategy. Adaptable curricula should be created and studied for residency programs with varying levels of curricular time, existing collective knowledge, clinical experiences, and faculty expertise [ 9 ].

Despite rising attention to SUD, wide variability remains in SUD content coverage across IM residencies, with no clearly established best way to deliver this content. This innovation adds a concise, adaptable, and replicable curriculum that empowers residents to integrate SUD diagnosis and management into clinical practice.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

References

1. Center for Behavioral Health Statistics and Quality. Key substance use and mental health indicators in the United States: results from the 2015 National Survey on Drug Use and Health. 2016. https://www.samhsa.gov/data/sites/default/files/NSDUH-FFR1-2015/NSDUH-FFR1-2015/NSDUH-FFR1-2015.pdf .

2. MATE Act of 2021, H.R. 2067, 117th Congress (2021–2022). https://www.congress.gov/bill/117th-congress/house-bill/2067/text .

3. Weimer MB, Tetrault JM, Fiellin DA. Patients with opioid use disorder deserve trained providers. Ann Intern Med. 2019;171(12):931–2.


4. ACGME Program Requirements for Graduate Medical Education in Internal Medicine. ACGME-approved major revision February 7, 2021; effective July 1, 2022. https://www.acgme.org/Portals/0/PFAssets/ProgramRequirements/140_Interna… Accessed June 2, 2023.

5. Wakeman SE, Baggett MV, Pham-Kanter G, Campbell EG. Internal medicine residents’ training in substance use disorders: a survey of the quality of instruction and residents’ self-perceived preparedness to diagnose and treat addiction. Subst Abus. 2013;34(4):363–70.

6. Gilchrist G, Moskalewicz J, Slezakova S, Okruhlica L, Torrens M, Vajd R, et al. Staff regard towards working with substance users: a European multi-centre study. Addiction. 2011;106(6):1114–25.

7. van Boekel LC, Brouwers EPM, van Weeghel J, Garretsen HFL. Healthcare professionals’ regard towards working with patients with substance use disorders: comparison of primary care, general psychiatry and specialist addiction services. Drug Alcohol Depend. 2014;134:92–8.

8. Meltzer EC, Suppes A, Burns S, Shuman A, Orfanos A, Sturiano CV, et al. Stigmatization of substance use disorders among internal medicine residents. Subst Abus. 2013;34(4):356–62.

9. Windish DM, Catalanotti JS, Zaas A, Kisielewski M, Moriarty JP. Training in safe opioid prescribing and treatment of opioid use disorder in internal medicine residencies: a national survey of program directors. J Gen Intern Med. 2022;37(11):2650–60.

10. Isaacson JH, Fleming M, Kraus M, Kahn R, Mundt M. A national survey of training in substance use disorders in residency programs. J Stud Alcohol. 2000;61(6):912–5.

11. Stein MR, Arnsten JH, Parish SJ, Kunins HV. Evaluation of a substance use disorder curriculum for internal medicine residents. Subst Abus. 2011;32(4):220–4.

12. Raheemullah A, Andruska N, Saeed M, Kumar P. Improving residency education on chronic pain and opioid use disorder: evaluation of CDC guideline-based education. Subst Use Misuse. 2020;55(4):684–90.

13. Wakeman SE, Pham-Kanter G, Baggett MV, Campbell EG. Medicine resident preparedness to diagnose and treat substance use disorders: impact of an enhanced curriculum. Subst Abus. 2015;36(4):427–33.

14. Vygotsky LS, Cole M. Mind in society: the development of higher psychological processes. Cambridge, MA: Harvard University Press; 1978.

15. Thomas PA, Kern DE, Hughes MT, Tackett SA, Chen BY. Curriculum development for medical education: a six-step approach. Baltimore: Johns Hopkins University Press; 2022.

16. Christison GW, Haviland MG, Riggs ML. The medical condition regard scale: measuring reactions to diagnoses. Acad Med. 2002;77(3):257–62.

17. Matassa D, Perrella B, Feurdean MA. Novel team-based learning approach for an internal medicine residency: medication-assisted treatments for substance use disorders. MedEdPORTAL. 2021;17:11085.

18. Muzyk A, Mullan P, Andolsek KM, Derouin A, Smothers ZPW, Sanders C, et al. An interprofessional substance use disorder course to improve students’ educational outcomes and patients’ treatment decisions. Acad Med. 2019;94(11):1792–9.


Acknowledgements

This curriculum was developed as part of the Medical Education Research, Innovation, Teaching and Scholarship (MERITS) faculty development program at the University of Chicago. These data were presented as a poster at the AMERSA (Association for Multidisciplinary Education and Research in Substance use and Addiction) national conference in 2019. The work was also accepted for presentation at the 2020 AAIM and SGIM national conferences but was not presented due to COVID-19.

Funding

Not applicable.

Author information

Authors and affiliations

Department of Medicine, University of Chicago, 5841 S. Maryland Avenue, Chicago, IL, 60637, USA

Mim Ari & Julie L. Oyler


Contributions

MA was responsible for the conception and design of the curricular innovation and for data acquisition and analysis, and drafted the original manuscript. JO contributed to the conception and design of the curricular innovation and substantively revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mim Ari.

Ethics declarations

Ethical approval

This study was deemed exempt by the University of Chicago Institutional Review Board (IRB21-0152). This determination was made under the Federal Regulations (45 CFR 46) exemption category 46.104(d)(1), which covers “Research, conducted in established or commonly accepted educational settings, that specifically involves normal educational practices that are not likely to adversely impact students’ opportunity to learn required educational content or the assessment of educators who provide instruction. This includes most research on regular and special education instructional strategies, and research on the effectiveness of or the comparison among instructional techniques, curricula, or classroom management methods.” All experimental protocols were approved as above. Consent was not obtained from subjects given this exemption.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Ari M, Oyler JL. A substance use disorder training curriculum for internal medicine residents using resident-empaneled patients. BMC Med Educ. 2024;24:478. https://doi.org/10.1186/s12909-024-05472-5


Received: 15 January 2024

Accepted: 25 April 2024

Published: 30 April 2024

DOI: https://doi.org/10.1186/s12909-024-05472-5

Keywords

  • Substance use disorder
  • Opioid use disorder
  • Alcohol use disorder
  • Curriculum development
  • Curriculum evaluation
