history of grading system in education

What Is the History of Grading?

Christine Lee

Assessment and grading are undoubtedly more challenging online vs. in-person, but the pedagogical principles remain the same. In fact, the setbacks we face today are an opportunity for innovation. In the case of online grading, we have the opportunity to take a hard look at the value of grades and prioritize equity. In sum, there has always been room for improvement to reach the ideals of pedagogical best practices and excellence.

So what can grading with equity look like in online learning environments?


Formative feedback is maximized when it occurs within the writing process as opposed to solely at the point of assessment. When inserted into the writing process, students can reflect and then apply these concepts in revision and subsequent assignments, enacting a feedback loop, separate from the pressure of final assessment.


We talk a lot about assessment, feedback, and grading as core to student learning and teaching efficacy. And why not? It’s a critical intersection that ought to be discussed and examined for continuous improvement.

It’s also important to examine how we got here and to learn from the origins of modern grading as we innovate. It’s equally important to understand the reasons behind current grading systems.

Assessment isn’t new; it has been around for centuries because education has always been rooted in the knowledge exchange between students and instructors. The goal of assessment is to improve student learning by systematically examining student learning patterns to inform future teaching and learning.

Grading, a subset of assessment, focuses on measuring individual student learning. Letter grades, evaluative and numerical grading, and grading on a curve are all relatively new in the realm of education because the concept of grading is a recent practice. Letter grades were not in widespread use until the 1940s. And in fact, even in 1971, only 67% of primary and secondary schools in the United States used letter grades.

So how did we get here? And why?

Letter and numerical grades were absent from the beginnings of student evaluations. The ancient Greeks used assessments as formative and not evaluative learning tools. Harvard required exit exams in 1646 to attain a degree. And in 1785, Yale president Ezra Stiles implemented the first grading scale in the United States based on four descriptions: Optimi, Second Optimi, Inferiores, and Pejores. Other universities like William and Mary followed similar approaches in 1817 ( Durm, 1993 ). These grading systems developed in parallel with those in the UK; researchers surmise that “it appears that educators like Stiles were mimicking a classification scheme best exemplified by the Cambridge Mathematical Tripos examination,” which evaluated student learning ( Schneider & Hutt, 2014 ), making these systems a global phenomenon.

However, even though these schools had a marking system, many hid these marks from students so as to discourage a competitive environment that would distract students from learning ( Schinske & Tanner, 2014 ).

Pedagogical figures such as Horace Mann worried about the message of competition within grading sent to students and its effect on student learning and intellectual development. Mann wrote in his ninth annual report, “if superior rank at recitation be the object, then, as soon as that superiority is obtained, the spring of desire and of effort for that occasion relaxes,” adding that students might prioritize exam outcomes “as to incur moral hazards and delinquencies” ( Mann, 1846 pp. 504-505 ). The debate around the merits of grading has been around for as long as grading has existed.

Even so, grading moved from its holistic origins to a more standardized, objective, and scale-based approach in the early 1900s as U.S. education expanded and tripled in size due largely to compulsory K-12 education. Consequently, a need for a unified system prioritized standardized and efficient communication between academic institutions. Grades could no longer be specific to an individual school or university. Grades could no longer be specific to an individual student but needed to have meaning to third parties. And grades became widespread.

By the 1940s, the A-F grading system emerged as “the dominant grading scheme, along with two other systems that would eventually be fused together with it: the 4.0 scale and the 100 percent system” ( Schneider & Hutt, 2014 ). Numerical and letter grades were here to stay--and what followed was grading on the curve and setting grades relative to the grades of cohorts, especially popular in large introductory STEM courses in higher education. In fact, grading on the curve aimed to minimize the subjective nature of grading ( Guskey, 1994 ). What we know now is that grading on a curve, too, increases competition between students and may unfairly reward those engaged in academic misconduct.

Our current A-F or numerical grading system is founded on streamlining communication between academic institutions and not so much on improving student learning. While grades can motivate high-achieving students (how many educators have received re-grade requests?), there remains a need for improvement. When students are focused on grades rather than on the actual learning experience, they are at risk of resorting to short-cut solutions. Furthermore, providing grades without feedback can be detrimental to student motivation, according to a recent article by Tim Klein in EdSurge. Grades alone do not advance student learning. Ellis Page’s research states that “grades can have a beneficial effect on student learning, but only when accompanied by specific or individualized comments from the teacher” ( Guskey, 1994 ).

Grading is a way to communicate information with great efficiency--but the information is, by nature, incomplete. According to Schneider and Hutt’s article Making the Grade: A History of the A-F Marking Scheme:

“In order for grades to be useful as tools for systemic communication––allowing for national movement, seamless coordination, and seemingly standard communication to parents and outsiders––they had to be simple and easy to digest. Yet that set of characteristics often conflicts with learning because the outcomes of learning are inherently complicated and messy. Consequently, while grades sometimes promote learning, they often promote an entirely separate set of behaviours” ( Schneider & Hutt, 2014 ).

The innovation will occur in the space where educators are best positioned to operate and where the two forces have to be reconciled: in the classroom, whether virtual or in person. The grading system is essential for coordination and communication to third parties--but it must also focus on student learning.

So, how can educators uphold learning within the current grading construct?

Feedback loops are critical to knowledge exchange, both for high-achieving students and for those who are struggling. Without feedback loops, the assessment intersection becomes simply evaluative and the chance for student growth diminishes. Tools like Draft Coach, Feedback Studio, and Gradescope enable feedback loops throughout the student and instructor workflows. For instance, Draft Coach takes a formative approach as students work on writing assignments. Feedback Studio enables teachers and students to exchange feedback on writing assignments. And Gradescope activates feedback loops in assessment and grading so that students absorb the knowledge needed to move forward in learning.

Taking a moment to reflect shows us that grading has evolved and can continue to evolve. Much of education now faces the challenges and roadblocks of online grading and online assessment. But ultimately, these challenges can lead to positive change and further evolution of grading and assessment with integrity while supporting student learning and teaching efficacy.

History of Grading Systems

Nicole Lassahn, 26 Mar 2022.

Letter grades were first used in the United States in the last part of the 19th century. Both colleges and high schools began replacing other forms of assessment with letter and percentage grades in the early 20th century. While grading systems appear to be fairly standardized in the U.S., debates about grade inflation and the utility of grades for fostering student learning continue.


1 Before Grades

Universities have always evaluated students, but the modern grading system did not always exist. In fact, in the 18th century, there was no standardized means of evaluating students, and certainly no means by which student performance at one institution could be easily compared with student performance somewhere else.

2 Yale University started with 4 ranks.

One of the first instances of an attempt to evaluate students systematically appeared in the diary of Ezra Stiles, who was president of Yale University in the 18th century. In 1785, he divided students who were present for an examination into four ranks or grades:

  • optimi,
  • second optimi,
  • inferiores,
  • and pejores

These are Latin terms indicating relative quality, ranging from best to worst.

3 The First Grades

It was also at Yale University that a system resembling our current grading system was first used. In the first quarter of the 19th century, Yale kept student information in what it called a Book of Averages; this book also sometimes discussed rules and procedures for setting down exam results.

The book mentioned the practice of recording an average of each student's marks--a procedure still used in figuring course grades--and also mentioned marking on a 4-point scale.

While there is no mention this early of the letter grades we know today, the 4-point scale is probably the precursor of today's grade point average.

Numerical scales also were used elsewhere, but they varied by institution:

  • College of William & Mary used a 4-point scale, with 1 as the best and 4 as the worst.
  • Harvard College used both a 20-point and a 100-point scale.
  • Yale apparently experimented briefly with a 9-point scale before returning to the 4-point scale.

4 Letter Grades

In the last half of the 19th century, colleges continued to experiment with various scales for evaluating students and also for grouping and classifying them. Some systems functioned by evaluating students individually.

For example, the University of Michigan's marking system in 1895 provided students with one of five marks on exams:

  • incomplete,
  • not passed,
  • conditional

Other systems were attempts to rank or order the entire student body, or all students in a class, by placing them into categories, divisions or percentages, such as Harvard's 1877 system that placed students in one of six "divisions" using a grading scale of 100. Division I was students earning 90 to 100 on the evaluation scale.

These systems might not have averaged student performance to create comparative ranks, what we call grading on a curve.

5 1897 at Mount Holyoke College is the first use of modern letter grades.

It was in 1897 at Mount Holyoke College that letter grades tied to a numerical or percentage scale were first used.

The college assigned letter grades to students by percentage:

  • 95 to 100 an A,
  • 85 to 94 a B,
  • 76 to 84 a C,
  • 75 a D--the lowest passing grade--
  • and anything below 75 an E, which indicated a failing grade.

Our modern F grade was not used, but this system was the beginning of the relatively standard grades we see today.
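The Mount Holyoke scale listed above can be written as a simple lookup. The following is a minimal sketch for illustration only: the function name `mount_holyoke_1897_grade` is hypothetical, and it assumes whole-number percentages between 0 and 100, since the scale as described does not say how fractional marks were handled.

```python
def mount_holyoke_1897_grade(percent: int) -> str:
    """Map a whole-number percentage to the 1897 letter grade described above.

    Assumption: marks are integers from 0 to 100; the source does not
    describe fractional marks, so they are not supported here.
    """
    if not isinstance(percent, int) or not 0 <= percent <= 100:
        raise ValueError("percent must be a whole number between 0 and 100")
    if percent >= 95:
        return "A"  # 95 to 100
    if percent >= 85:
        return "B"  # 85 to 94
    if percent >= 76:
        return "C"  # 76 to 84
    if percent == 75:
        return "D"  # lowest passing grade
    return "E"      # anything below 75 was a failing grade


if __name__ == "__main__":
    for mark in (100, 90, 80, 75, 60):
        print(mark, mount_holyoke_1897_grade(mark))
```

Written this way, the unusual shape of the scale stands out: the D band covers a single mark, 75, and failure is recorded as E rather than F.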

6 Early K-12 Grades

It was in the first part of the 20th century that American elementary and high school education also began using standardized grading systems. This period coincided with a substantial increase in the number of students; compulsory-attendance laws had been passed during this period, and the number of public high schools increased from 500 to 10,000 between 1870 and 1910.

These changes made the use of written, descriptive reports less feasible, and high schools increasingly began using both percentage and letter grades to evaluate students.

In 1912, Daniel Starch and Edward Charles Elliott, two researchers from Wisconsin, examined the reliability of percentage grades and found that there was immense variation from teacher to teacher in both the criteria used to assign grades and the grades themselves.

This variation, and the desire for more standard grades, led to an overall move away from point scales with a large range to the smaller types of grade scales we know today.

7 Grading System Controversies

While grade scales in the U.S. are fairly standard, debates and questions about grading continue today. There are similar questions about variability, because grading can be a subjective process, as well as more philosophical questions about the relationship of grades to learning.

Finally, even the grade scale itself is not exactly the same at all schools. One of the largest concerns about variability is grade inflation, which here refers to the phenomenon in which average grades at private schools are higher than at public schools.

While some claim that this discrepancy is caused by private schools' greater selectivity in admissions, implying the student body is smarter at private schools, data collected by Stuart Rojstaczer show that even when schools have the same degree of selectivity, private schools have higher grade point averages than public schools.

Faculty members such as Harvey C. Mansfield have publicly complained about the pressure to raise grades beyond what is deserved. One reason for grade inflation is probably pressure from students who are concerned about their grades and their future career prospects.

Educators worry that grades can make students more focused on credentials and less on actual learning.

It is also the case that grades can take the place of more substantive and individualized assessments; there are many ways of diagnosing whether students are learning, and grades are not always the best method.

There is also some debate about whether the practice of grading on a curve is useful in fostering or assessing student learning.

Finally, while it is true that a standardized grading scale can be necessary in a world in which students move from school to school and state to state, our grading scales are not as standardized as we think.

In addition to variations in grade inflation, meaning the same student might receive different grades at different institutions, schools also vary in their use of the plus and minus system, and some use a point system rather than letter grades.

  • 1 "An A Is Not An A Is Not An A: A History of Grading"; The Educational Forum; Spring 1993
  • 2 Grading Systems--School, Higher Education

About the Author

Nicole Lassahn earned her Ph.D. in comparative literature in 2001 from the University of Chicago. Since then, she has worked as assistant director of the University of Chicago Academic and Professional Writing Program, and as assistant dean of the Graduate School at Loyola University Chicago.


A Brief History of Grades and Gradeless Learning

The incredibly named I.E. Finkelstein once said,

When we consider the practically universal use in all educational institutions of a system of marks, whether numbers or letters, to indicate scholastic attainment of the pupils or students in these institutions, and when we remember how very great stress is laid by teachers and pupils alike upon these marks as real measures or indicators of attainment, we can but be astonished at the blind faith that has been felt in the reliability of the marking system. School administrators have been using with confidence an absolutely uncalibrated instrument... What faults appear in the marking system that we are now using, and how can these be avoided or minimized?

And that was in 1913. Maybe it’s time we reconsider grading.

The American grading system had its roots in the mid 1800s, when Yale and Harvard experimented with different points, percentage, and other metric systems. By 1897, the relatively small women’s college, Mount Holyoke, combined these ideas into a letter grade scale, the forerunner of the modern system in which a 4.0 is an A, a 3.0 is a B, and so on.

Of course, there is an entire history of arguments and reforms leading up to this point. In 1846, Horace Mann, an early adopter of standardization and proponent of public education, expressed concerns that students would be too focused on class rank and might “incur moral hazards and delinquencies” as they chased extrinsic motivation. His solution was to show a student’s progress over time through monthly report cards, so as to show growth and development. In general, early reformers saw report cards as a way to inspire intrinsic motivation while still tracking a student’s progress.

As the school system rapidly expanded in the late 19th century, becoming compulsory in almost every state, it became obvious to teachers and administrators that as class sizes grew larger, they needed an efficient means to communicate a child’s knowledge. A shift began from detailed feedback -- what could be seen as essentially a “growth mindset” model -- to that of rote assessment. This was especially the case after the creation of the College Entrance Examination Board in 1899 (which is the College Board of today). The push was to standardize grades so that colleges would not only value class rank, but have the same universal scoring: an A was the same at one school as it was at another. This led to more standardization through curriculum, scheduling, and general school culture.

There was a lot of confusion and uncertainty. Tuition fees began to be covered for highly ranked and straight-A students, but if students are taking different classes... how can colleges measure student class rank efficiently? A solution was attempted through Harvard’s Book of Comparative Merit, detailing the value of every course that might be offered, but it was ultimately rejected as it really made no sense (class descriptions, concepts, and difficulty still varied drastically, no matter the course title). This concept is still attempted today through the 5.0 scale for AP classes and promotion of STEM courses, with relative disregard for the arts.


Soon after, in the early 1900s, the time of Finkelstein (the critic of grading quoted earlier), a divide emerged between those doubling down on the standardization of grading and those rejecting the practice outright.

In 1911, an observer at the University of Missouri wrote, “the grade has in more than one sense a cash value, and if there is no uniformity of grading in an institution, this means directly that values are stolen from some and undeservedly presented to others.”

Whereas the economist Thorstein Veblen stated in 1918 that the “system of academic grading and credit... resistlessly bends more and more of current instruction to its mechanical tests and progressively sterilizes all personal initiative and ambition that comes within its sweep.”

And a teacher in 1935 described those who rejected report cards as a “challenge from a group of young crusaders who have chosen to be known as the Intrinsic Clan against the entire family whose surname is Extrinsic.”

Many of these reformist mindsets were influenced by John Dewey, who published a variety of works on progressive and experiential education in the early 1900s.

As is obvious today, the graders won. Not only did teachers see grades as a valuable motivator, but as the school system kept rapidly expanding and more universities were built, grades were the most practical means of communication. V.L. Beggs, a critic of report cards, especially for elementary school, remarked in 1936 that too many people “conclude that the school’s most important contribution to the child’s education is recorded on the card.”

At the same time, the use of the IQ test in the military (which corresponded with the growth of the eugenics movement and, at around the same time, led to the establishment of the SAT) prompted proponents of standardization to find even more ways to attempt objective grading. They sought to measure intelligence based on mathematical principles and racist scientific observations through eugenics. Again, educators pushed back against the “objectiveness” of grading -- but as time went on, many accepted grading as the norm and instead argued over the “best effective grading method.” A small group of teachers in 1936 attempted a “narrative letter”: they sat down together and wrote a letter to parents on each student’s progress, formatted with a standardized form. Interestingly, they later remarked that these letters would ‘become as meaningless and as stereotyped as the subject marks they replaced’, going so far as to say that these letters would become ‘false standards of value among our pupils’ (Geyer 1938: 531).

Through the 1960s, most teachers and school districts felt beholden to the grading system and college admissions process. But by then, more and more critics had emerged, with arguments ranging from the old case for restoring intrinsic motivation to concerns about student well-being, as young people were increasingly anxious and discouraged about doing poorly.

At this point, it’s come full circle. We’re faced with the challenge of replacing the grading system, having swung between feedback and “objective grades” decade after decade. However, what about the historical problems that were faced when teachers attempted to not use grades? How will we compare what one student knows to another? After all, colleges are set up to analyze marks. And then, what about teachers? How can they possibly grade and leave substantive feedback for all their students when class sizes are so large?

The debate of the last 150 years is essentially the same as today: do we accept grading as a measurement of student learning and find ways to make it increasingly objective, or do we find a way to communicate knowledge without ever assigning a grade? It’s interesting to note how recent this system is. Not only was it developed in a relatively short time frame, but almost the entirety of assessment and college admissions is defined by practices of the Second Industrial Revolution, when one-room school buildings were the norm and many children would hike 2 miles to school. Its newness is part of the reason why this discussion is so important. The more recently a change was institutionalized, the easier it is to deconstruct, question, and make anew.

Crafting a New System

It is my view that grading is a practice that hurts children: it makes them demotivated if they’re doing poorly and motivates toward an extrinsic goal other than learning. It seems as if we succeed in spite of how grades are given, rather than achieving because of high grades.

It’s a situation that happened many times in my classroom prior to adopting gradeless learning. I hand a student extensive project feedback with myriad notes, but they see a “B”, put it away in their backpack, and that’s the end of it. If a student gets a “D” and they worked hard... they just give up. And if they’re used to low grades, they make an aside about the uselessness of school and never push to succeed. Then there’s the student who always gets A’s and goes into a panic if their grade is anything but. None of these situations have anything at all to do with learning: no one is asking what they did well or how to get better, no one really cares about the content. They just care about the mark: extrinsic motivation for a reward. Our goal should be to restore intrinsic motivation, where we learn just to learn, which is what humans naturally do and what inspires a lifelong curiosity and love of learning.

In excess, extrinsic motivators cause harm. Many children love to read, but the second a new reading program provides them with pizza for reading 5 books and taking a quiz, they become more focused on eating at Pizza Hut than on what they read. Students have a variety of interests as they enter school, but over time lose focus on asking about and exploring anything outside of what’s on the test -- after all, there’s no grade associated with it.

There’s so much research that supports this notion that it’s simply nauseating. Although this research has its faults (much of it is measured through equally subjective test scores), the data on grading highlights how ineffective it really is.

Here are some studies from the last few decades:

  • In 1987, with a follow-up in 1988, Ruth Butler found that students had by far the highest motivation to learn when they were given only feedback on their assessments, followed by feedback with a grade, with the lowest motivation coming from a grade alone.
  • In 1991, Hall Beck found that students who have a “grading orientation”, which means they care primarily about their grades, actually had lower GPAs and, more importantly, worse emotional and social well-being than those with a “learning orientation” who simply wanted to learn new things.
  • Aleidine Moeller in 1993 found that when taking a test, students were not any more motivated for that test when they knew grades were involved.
  • In 1996, John Krumboltz outlined how the competition that arises from grading practices made students lose their sense of safety in the classroom, which is widely regarded as the most basic need for learning.
  • Eric Anderman found in 1997 that when students saw a test as stressful and as a measurement tied to incentives, they were more likely to cheat than those who saw it as a method for improvement.
  • Gary Natriello found in 1998 that there was an obvious correlation between those who scored low grades and those who dropped out of school.
  • In 2008, Anastasia Lipnevich replicated Butler’s study and found that feedback without a grade had the highest motivation for student success.
  • In 2011, Caroline Pulfrey performed multiple experiments to reveal that students did not try as hard, nor were motivated as much, by assignments that had grades.
  • Carine Souchal found in 2014 that competitive grading practices affected the psyche of female students in science classes, as historically those tests had favored male students (this phenomenon is explained perfectly in Whistling Vivaldi by Claude Steele).
  • In 2015, Anne-Sophie Hayek revealed that when working in a group project, groups were less cooperative and not willing to share as much when they knew they would receive a contribution grade.
  • Also in 2015, Astrid Poorthuis found that students who received a low grade on their halfway report card had much lower grades and performance on their final report card.

These are just some of the hundreds of studies that exist, including many from the 1800s and early 1900s, and the same exact themes come across in each: grades hurt a student’s interest in learning and only seem to, at best, motivate high-achieving students for the wrong reasons. If our goal is to engage every student and make them better at what they do, then it seems completely counterintuitive to issue them a grade telling them they’re not doing well -- as it will completely destroy any desire they have to do better. In other words, the system works to make the gap larger between those who perform well and those who don’t. Those who don’t fall further behind as they’re increasingly demotivated toward school work, and those who consistently do well stay at the top.

And it’s not that those doing well are necessarily “doing well”: they may have high grades but also rising anxiety, low social/emotional well-being, and a lack of interests outside of schoolwork. The typical “all A” student is completely overtaken by the system: they do what they’re told, meet the parameters, and follow an over-the-top schedule for the sake of college admission. They no longer have a clue as to why they’re doing what they’re doing, and as a result, many of these students are losing their imagination and lack overall purpose. Most importantly, they’re simply not happy.

Schools are places of learning. It shouldn’t be a charged statement to say that students should want to learn at school. We know that kids want to learn... any small child can’t wait to explore anything they can get their hands on, whether that be books, experiments, asking questions, or playing with their friends. But the longer they’re in school, the more they go through the motions -- and grades play a big part in that. Of course, I don’t want to imply that older students aren’t interested in anything at all. Many young people will subject themselves to hours of YouTube vlogs, which are essentially hour-long lectures, to learn about something that interests them. And grades are not the sole issue, but they are an issue that can be changed.

*Historical quotes and information are drawn from Making the grade: a history of the A–F marking scheme by Jack Schneider & Ethan Hutt (2013).


I think when we were talking about what it gives us time to do, folks are able to get up and learn to paint, play guitar. What have folks been learning. This time isn't lost. Learning looks differently but I think we can lean into that. Who are the transmitters of knowledge? What does knowledge transmission look like? How do we constitute learning and what do those content areas look like in how we might think of things as more of...someone might call it transdisciplinary. So those are just what I was thinking but I think it's certainly a time. And we should, at least I'll say in my opinion, not think of this so much as a learning loss. It doesn't value what families and communities have been doing during this time and only puts out how knowledge translates in this academic sense and these structures that aren't always as flexible as we might add.

CBE Life Sci Educ, v.13(2); Summer 2014

Teaching More by Grading Less (or Differently)

Jeffrey Schinske

*Department of Biology, De Anza College, Cupertino, CA 95014

Kimberly Tanner

† Department of Biology, San Francisco State University, San Francisco, CA 94132

The authors explore a history of grading and review the literature regarding the purposes and impacts of grading. They then suggest strategies for making grading more supportive of learning, including balancing accuracy-based and effort-based grading, using self/peer evaluation, curtailing curved grading, and exercising skepticism about the meaning of grades.

INTRODUCTION

When we consider the practically universal use in all educational institutions of a system of marks, whether numbers or letters, to indicate scholastic attainment of the pupils or students in these institutions, and when we remember how very great stress is laid by teachers and pupils alike upon these marks as real measures or indicators of attainment, we can but be astonished at the blind faith that has been felt in the reliability of the marking systems. —I. E. Finkelstein (1913)

If your current professional position involves teaching in a formal classroom setting, you are likely familiar with the process of assigning final course grades. Last time you assigned grades, did you assign an “E,” “E+,” or “E−” to any of your students? Likely you assigned variations on “A’s,” “B’s,” “C’s,” “D’s,” and “F’s.” Have you wondered what happened to the “E’s” or talked with colleagues about their mysterious absence from the grading lexicon? While we often commiserate about the process of assigning grades, which may be as stressful for instructors as for students, the lack of conversation among instructors about the mysterious omission of the “E” is but one indicator of the many tacit assumptions we all make about the processes of grading in higher education. Given that the time and stress associated with grading has the potential to distract instructors from other, more meaningful aspects of teaching and learning, it is perhaps time to begin scrutinizing our tacit assumptions surrounding grading. Below, we explore a brief history of grading in higher education in the United States. This is followed by considerations of the potential purposes of grading and insights from research literature that has explored the influence of grading on teaching and learning. In particular, does grading provide feedback for students that can promote learning? How might grades motivate struggling students? What are the origins of norm-referenced grading—also known as curving? And, finally, to what extent does grading provide reliable information about student learning and mastery of concepts? We end by offering four potential adjustments to our general approach to grading in undergraduate science courses for instructors to consider.

A BRIEF HISTORY OF GRADING IN HIGHER EDUCATION

It can be easy to perceive grades as both fixed and inevitable—without origin or evolution … Yet grades have not always been a part of education in the United States. — Schneider and Hutt (2013 )

Surprisingly, the letter grades most of us take for granted did not gain widespread popularity until the 1940s. Even as late as 1971, only 67% of primary and secondary schools in the United States used letter grades ( National Education Association, 1971 ). It is therefore helpful to contextualize the subject to appreciate the relatively young and constantly changing nature of current systems of grading. While not an exhaustive history, the sections below describe some of the main developments leading to the current dominant grading system.

Early 19th Century and Before

The earliest forms of grading consisted of exit exams before awarding of a degree, as seen at Harvard as early as 1646 ( Smallwood, 1935 ). Some schools also awarded medals based on competitions among students or held regular competitions to assign seats in class ( Cureton, 1971 ). Given that universities like Yale and Harvard conducted examinations and elected valedictorians and salutatorians early in the 18th century, some scale of grading must have existed. However, the first official record of a grading system surfaces in 1785 at Yale, where seniors were graded into four categories: Optimi , second Optimi , Inferiores , and Perjores ( Stiles, 1901 , cited by Smallwood, 1935 ). By 1837, Yale was also recording student credit for individual classes, not just at the completion of college studies, using a four-point scale. However, these “merit marks” were written in code and hidden from students ( Bagg, 1871 ).

Harvard and other schools soon experimented with public rankings and evaluations, noting that this resulted in “increasing [student] attention to the course of studies” and encouraged “good moral conduct” ( Harvard University, 1832 ). Concerned that such public notices would inspire competition among students, which would distract from learning, other schools used more frequent, lower-stakes “report cards” to provide feedback on achievement ( Schneider and Hutt, 2013 ). In 1837, at least some professors at Harvard were grading using a 100-point system ( Smallwood, 1935 ). During this same period, William and Mary placed students in categories based on attendance and conduct. The University of Michigan experimented with a variety of grading systems in the 1850s and 1860s, including various numeric and pass/fail systems ( Smallwood, 1935 ). Still, many schools at this time kept no formal records of grades ( Schneider and Hutt, 2013 ).

Late 19th Century and 20th Century

With schools growing rapidly in size and number and coordination between schools becoming more important, grades became one of the primary means of communication between institutions ( Schneider and Hutt, 2013 ). This meant grades needed to have meaning not just within an institution but also to distant third parties. A record from 1883 indicates a student at Harvard received a “B,” and in 1884, Mount Holyoke was grading on a system including “A,” “B,” “C,” “D,” and “E.” Each letter corresponded to a range of percentage scores, with lower than 75% equating to an “E” and indicating failure. Mount Holyoke added an “F” grade (for failing) to the scale in 1898 and adjusted the percentages relating to the other letters ( Smallwood, 1935 ). This appears to be the initial origin of the “A”–“F” system familiar to most faculty members today, albeit including an “E” grade. By 1890, the “A”–“E” system had spread to Harvard after faculty members expressed concerns regarding reliably grading students on a 100-point scale. Still, grading was not always done at schools and grading systems varied widely ( Schneider and Hutt, 2013 ).

By the early 1900s, 100-point or percentage-based grading systems were very common ( Cureton, 1971 ). This period also saw an increased desire for uniformity in grading, and many expressed concerns about what grades meant from one teacher or institution to the next ( Weld, 1917 ). Numerous studies of the period sought to understand and perfect grading systems ( Cureton, 1971 ). Grading on a 100-point scale was found to be highly unreliable, with different teachers unable to assign consistent grades on papers in English, math, and history ( Starch, 1913 ). Researchers felt that getting away from a 100-point scale and grading into only five categories (e.g., letter grades) could increase reliability ( Finkelstein, 1913 , p. 18). While it is unclear exactly when and why “E” grades disappeared from the letter grade scale, it seems possible that this push to use fewer categories resulted in an “A”–“F” scale with no “E” (“F” being retained, since it so clearly stood for “fail”). Others have conjectured that “E” was removed so students would not assume “E” stood for “excellent,” but whatever the reason, “E’s” apparently disappeared by the 1930s ( Palmer, 2010 ).

As research on intellectual ability appeared to show that, like other continuous biological traits, levels of aptitude in a population conformed to a normal curve, some experts felt grades should similarly be distributed according to a curve in a classroom ( Finkelstein, 1913 ). Distributing grades according to a normal curve was therefore considered as a solution to the subjective nature of grading and a way to minimize interrater differences in grading ( Guskey, 1994 ). Others worried that measuring aptitude was different from measuring levels of classroom performance, which might not be normally distributed ( Schneider and Hutt, 2013 ).

Based on the above research and the pressure toward uniformity of grading systems, by the 1940s the “A”–“F” grading system was dominant, with the four-point scale and percentages still also in use ( Schneider and Hutt, 2013 ). However, many inconsistencies remained. As one example, Yale used no less than four different grading systems from the 1960s to 1980s ( Yale University, 2013 ).

Present Day

Grading systems remain controversial and hotly debated today ( Jaschik, 2009 ). Some argue grades are psychologically harmful ( Kohn, 1999 ). Others raise concerns about the integrity of the “A”–“F” system, given well-documented trends in grade inflation ( Rojstaczer and Healy, 2012 ). One professor summed it up by saying grades do no more than “create a facade of coherence” ( Jaschik, 2009 ). A number of colleges have abandoned numerical and categorical grading altogether, opting instead for creating contracts with students to define success or employing student self-reflection in combination with written evaluations by faculty ( Jaschik, 2009 ). Among the Ivy League schools, Brown University does not calculate grade point averages, does not use “D’s” in its grading scale, and does not record failing grades ( Brown University, 2014 ). Even Yale, the institution that started this history of grading more than 200 yr ago, is today still considering changes to its grading system ( Yale University, 2013 ).

Though grades were initially meant to serve various pedagogical purposes, more recent reforms have focused on “grades as useful tools in an organizational rather than pedagogical enterprise—tools that would facilitate movement, communication, and coordination” ( Schneider and Hutt, 2013 ). So, what are the potential purposes of grading in educational settings?

PURPOSES OF GRADING—PAST AND PRESENT

Grades as Feedback on Performance—Does Grading Provide Feedback to Help Students Understand and Improve upon Their Deficiencies?

[This] work affirms an observation that many classroom teachers have made about their students: if a paper is returned with both a grade and a comment, many students will pay attention to the grade and ignore the comment. — Brookhart (2008 , p. 8)

For most faculty members, the concept of feedback has at least two applications to the concept of grading. On one hand, grading itself is a form of feedback that may be useful to students. In addition, in the process of grading student work, faculty members sometimes provide written comments as feedback that students could use to improve their work. Because college students express a desire for feedback ( Higgins et al. , 2002 ), faculty members may feel pressured to grade more (rather than facilitating ungraded activities) and to provide more written feedback while grading. Especially in large classes, this can significantly increase workload on faculty ( Nicol and Macfarlane-Dick, 2006 ; Crisp, 2007 ). But are grades and written comments effective forms of feedback that assist students in achieving conceptual mastery of the subject?

Feedback is generally divided into two categories: evaluative feedback and descriptive feedback. Evaluative feedback, such as a letter grade or written praise or criticism, judges student work, while descriptive feedback provides information about how a student can become more competent ( Brookhart, 2008 , p. 26). Butler and Nisan (1986) compared the impacts of evaluative feedback, descriptive feedback, and no feedback on student achievement in problem-solving tasks and in “quantitative” tasks (e.g., those requiring quick, timed work to produce a large number of answers). They found that students receiving descriptive feedback (but not grades) on an initial assignment performed significantly better on follow-up quantitative tasks and problem-solving tasks than did students receiving grades or students receiving no feedback. Students receiving grades performed better on follow-up quantitative tasks than students receiving no feedback, but did not outperform those students on problem-solving assignments. In other words, providing evaluative feedback (in this case, grades) after a task does not appear to enhance students’ future performance in problem solving.

While descriptive, written feedback can enhance student performance on problem-solving tasks, reaping those benefits requires students to read, understand, and use the feedback. Anecdotal accounts, as well as some studies, indicate that many students do not read written feedback, much less use it to improve future work ( MacDonald, 1991 ; Crisp, 2007 ). In one study, less than half of undergraduate medical students even chose to collect the feedback provided on their essays ( Sinclair and Cleland, 2007 ). Other studies suggest that many students do read feedback and consider it carefully but the feedback is written in a way that students do not find useful in improving future work ( Higgins et al. , 2002 ). Some studies have further investigated the relationships between grading and descriptive feedback by providing students with both written feedback and grades on assignments. In these cases, the addition of written comments consistently failed to enhance student performance on follow-up tasks ( Marble et al. , 1978 ; Butler 1988 ; Pulfrey et al. , 2011 ). Brookhart (2008 , p. 8) concludes, “the grade ‘trumps’ the comment” and “comments have the best chance of being read as descriptive if they are not accompanied by a grade.” Even when written feedback is read, there is widespread agreement that instructor feedback is very difficult for students to interpret and convert into improved future performance ( Weaver, 2006 ).

Grading does not appear to provide effective feedback that constructively informs students’ future efforts. This is particularly true for tasks involving problem solving or creativity. Even when grading comes in the form of written comments, it is unclear whether students even read such comments, much less understand and act on them.

Grades as a Motivator of Student Effort—Does Grading Motivate Students to Learn?

Our results suggest…that the information routinely given in schools—that is, grades—may encourage an emphasis on quantitative aspects of learning, depress creativity, foster fear of failure, and undermine interest. — Butler and Nisan (1986 )

As described in the history of grading above, our current “A”–“F” grading system was not designed with the primary intent of motivating students. Rather, it stemmed from efforts to streamline communication between institutions and diminish the impacts of unreliable evaluation of students from teacher to teacher ( Grant and Green, 2013 ). That is not to say, however, that grades do not have an impact on student motivation and effort. At some point, every instructor has likely experienced desperate petitions from students seeking more points—a behavior that seems to speak to an underlying motivation stimulated by the grading process.

It would not be surprising to most faculty members that, rather than stimulating an interest in learning, grades primarily enhance students’ motivation to avoid receiving bad grades ( Butler and Nisan, 1986 ; Butler, 1988 ; Crooks, 1988 ; Pulfrey et al. , 2011 ). Grades appear to play on students’ fears of punishment or shame, or their desires to outcompete peers, as opposed to stimulating interest and enjoyment in learning tasks ( Pulfrey et al. , 2011 ). Grades can dampen existing intrinsic motivation, give rise to extrinsic motivation, enhance fear of failure, reduce interest, decrease enjoyment in class work, increase anxiety, hamper performance on follow-up tasks, stimulate avoidance of challenging tasks, and heighten competitiveness ( Harter, 1978 ; Butler and Nisan, 1986 ; Butler, 1988 ; Crooks, 1988 ; Pulfrey et al. , 2011 ). Even providing encouraging, written notes on graded work does not appear to reduce the negative impacts grading exerts on motivation ( Butler, 1988 ). Rather than seeing low grades as an opportunity to improve themselves, students receiving low scores generally withdraw from class work ( Butler, 1988 ; Guskey, 1994 ). While students often express a desire to be graded, surveys indicate they would prefer descriptive comments to grades as a form of feedback ( Butler and Nisan, 1986 ).

High-achieving students on initial graded assignments appear somewhat sheltered from some of the negative impacts of grades, as they tend to maintain their interest in completing future assignments (presumably in anticipation of receiving additional good grades; Butler, 1988 ). Oettinger (2002) and Grant and Green (2013) looked specifically for positive impacts of grades as incentives for students on the threshold between grade categories in a class. They hypothesized that, for example, a student on the borderline between a “C” and a “D” in a class would be more motivated to study for a final exam than a student solidly in the middle of the “C” range. However, these studies found only minimal ( Oettinger, 2002 ) or no ( Grant and Green, 2013 ) evidence that grades motivated students to perform better on final exams under these conditions.

This is not to say that classroom evaluation is by definition harmful or a thing to avoid. Evaluation of students in the service of learning—generally including a mechanism for feedback without grade assignment—can serve to enhance learning and motivation ( Butler and Nisan, 1986 ; Crooks, 1988 ; Kitchen et al. , 2006 ). Swinton (2010) additionally found that a grading system that explicitly rewarded effort in addition to rewarding knowledge stimulated student interest in improvement. This implies that balancing accuracy-based grading with providing meaningful feedback and awarding student effort could help avoid some of the negative consequences of grading.

Rather than motivating students to learn, grading appears to, in many ways, have quite the opposite effect. Perhaps at best, grading motivates high-achieving students to continue getting high grades—regardless of whether that goal also happens to overlap with learning. At worst, grading lowers interest in learning and enhances anxiety and extrinsic motivation, especially among those students who are struggling.

Grades as a Tool for Comparing Students—Is Grading on a Curve the Fairest Way to Grade?

You definitely compete for grades in engineering; whereas you earn grades in other disciplines … I have to get one point higher on the test than the next guy so I can get the higher grade. —Student quoted in Seymour and Hewitt (1997 , p. 118)

The concept of grading on a curve arose from studies in the early 20th century suggesting that levels of aptitude, for example as measured by IQ, were distributed in the population according to a normal curve. Some then argued, if a classroom included a representative sample from the population, grades in the class should similarly be distributed according to a normal curve ( Finkelstein, 1913 ). Conforming grades to a curve held the promise of addressing some of the problems surrounding grading by making the process more scientific and consistent across classrooms ( Meyer, 1908 ). Immediately, even some proponents of curved grading recognized problems with comparing levels of aptitude in the population with levels of classroom achievement among a population of students. For a variety of reasons, a given classroom might not include a representative sample from the general population. In addition, teachers often grade based on a student's performance or accomplishment in the classroom—characteristics that differ in many ways from aptitude ( Finkelstein, 1913 ). However, despite the reservations of some teachers and researchers, curved grading steadily gained acceptance throughout much of the 20th century ( Schneider and Hutt, 2013 ).

Grading on a curve is by definition a type of “norm-referenced” grading, meaning student work is graded based on comparisons with other students’ work ( Brookhart, 2004 , p. 72). One issue surrounding norm-referenced grading is that it can dissociate grades from any meaning in terms of content knowledge and learning. Bloom (1968) pointed out that, in grading on a curve “it matters not that the failures of one year performed at about the same level as the C students of another year. Nor does it matter that the A students of one school do about as well as the F students of another school.” As this example demonstrates, under curved grading, grades might not communicate any information whatsoever regarding a student's mastery of course knowledge or skills.
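Bloom's point can be illustrated with a minimal sketch. The quota (one fifth of the class per letter), the student names, and the scores below are all hypothetical, chosen only to show that a norm-referenced rule lets the same raw score earn opposite letters in different cohorts.

```python
# Hypothetical quota: each letter goes to one fifth of the class, by rank.
QUOTA_PERCENT = [("A", 20), ("B", 20), ("C", 20), ("D", 20), ("F", 20)]


def curve_grades(scores):
    """Assign letters purely by rank within the cohort (norm-referenced)."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # best score first
    n = len(ranked)
    grades = {}
    start = 0
    cumulative = 0
    for letter, percent in QUOTA_PERCENT:
        cumulative += percent
        end = cumulative * n // 100  # integer cutoff position in the ranking
        for student in ranked[start:end]:
            grades[student] = letter
        start = end
    return grades


# The same raw score of 78 placed in two made-up cohorts.
strong_cohort = {"s1": 95, "s2": 90, "s3": 85, "s4": 80, "target": 78}
weak_cohort = {"w1": 70, "w2": 65, "w3": 60, "w4": 55, "target": 78}

print(curve_grades(strong_cohort)["target"])  # F
print(curve_grades(weak_cohort)["target"])    # A
```

In this sketch the letter reflects only where a student ranks among classmates, so it carries no information about what a score of 78 represents in terms of content mastery.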

Of even more concern, however, is the impact norm-referenced grading has on competition between students. The quote at the start of this section describes how many students respond to curve-graded classes compared with classes that do not use a grading curve. Seymour and Hewitt (1997 , p. 118) explain, “Curve-grading forces students to compete with each other, whether they want to or not, because it exaggerates very fine degrees of differences in performance. Where there is little or no difference in work standards, it encourages a struggle to create it.” Studies have shown that science students in competitive class environments do not learn or retain information as well as students in cooperative class environments ( Humphreys et al. , 1982 ). Students in cooperative environments are additionally more interested in learning and find learning more worthwhile than students in competitive environments ( Humphreys et al. , 1982 ). Of particular concern is that the competitive environment fostered by norm-referenced grading represents one of the factors contributing to the loss of qualified, talented, and often underrepresented college students from science fields ( Seymour and Hewitt, 1997 ; Tobias, 1990 ). Disturbingly, even when a science instructor does not grade on a curve, students might, due to their past experiences, assume a curve is used and adopt a competitive stance anyway ( Tobias, 1990 , p. 23).

Bloom (1968 , 1976 ) presents evidence and a theoretical framework supporting an alternate view of grading whereby most students would be expected to excel and not fall into the middle grades. He states, “If the students are normally distributed with respect to aptitude, but the kind and quality of instruction and the amount of time available for learning are made appropriate to the characteristics and needs of each student, the majority of students may be expected to achieve mastery of the subject. And, the relationship between aptitude and achievement should approach zero” ( Bloom, 1968 ). In other words, even if we were to accept a concept of innate aptitude that is normally distributed in a classroom, that distribution should not predict classroom achievement, provided the class environment supports diverse learners in appropriate ways. This idea was a significant development, because it freed teachers from the stigma associated with awarding a larger number of high grades. Previously, an excess of higher grades was thought to arise only from either cheating by students or poor grading practices by teachers ( Meyer, 1908 ). Bloom's model argues that, when given the proper learning environment and compared against standards of mastery in a field (rather than against one another), large numbers of students could succeed. This type of grading—where instructional goals form the basis of comparison—is called “criterion-referenced” grading ( Brookhart, 2004 , p. 72).
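As a rough illustration of criterion-referenced grading, the sketch below compares each score with fixed cutoffs instead of with other students. The cutoff values and scores are hypothetical; an actual course would derive its standards from its instructional goals.

```python
# Hypothetical mastery cutoffs (minimum percent for each letter).
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def criterion_grade(score):
    """Return the letter for a score by comparing it with fixed standards."""
    for minimum, letter in CUTOFFS:
        if score >= minimum:
            return letter
    return "F"


# A made-up class in which most students meet the top standard.
scores = {"s1": 95, "s2": 93, "s3": 91, "s4": 90, "s5": 84}
grades = {student: criterion_grade(score) for student, score in scores.items()}

print(grades)  # {'s1': 'A', 's2': 'A', 's3': 'A', 's4': 'A', 's5': 'B'}
```

Nothing in this rule caps the number of high grades: if every student meets the standard for an "A," every student receives one, which is the outcome Bloom's model allows for.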

Of course, Bloom's work did not rule out the possibility that some teachers might still give high grades for undesirable reasons unrelated to standards of mastery (e.g., to be nice, to gain the admiration of students, etc.). Such practices would not be in line with Bloom's work and would lead to pernicious grade inflation. Indeed, many of those bemoaning recent trends in grade inflation in higher education (though less prevalent in the sciences) point to the abandonment of curved grading as a major factor ( Rojstaczer and Healy, 2012 ). Such studies often promote various forms of curving—at the level of individual courses or even at the institution as a whole—to combat inflation ( Johnson, 2003 , chaps. 7–8). In light of the above, however, it seems strange to aspire to introduce grading systems that could further push students into competition and give rise to grades that indicate little about the mastery of knowledge or skills in a subject. The broader distribution of grades under curve-adjusted grading could simply create the illusion of legitimacy in the grading system without any direct connection between grades and achievement of learning goals. Perhaps the more productive route is to push for stronger, criterion-referenced grading systems in which instructional goals, assessments, and course work are more intimately aligned.

In brief, curved grading creates a competitive classroom environment, alienates certain groups of talented students, and often results in grades unrelated to content mastery. Curving is therefore not the fairest way to assign grades.

Grades as an Objective Evaluation of Student Knowledge—Do Grades Provide Reliable Information about Student Learning?

Study Critiques Schools over Subjective Grading: An Education Expert Calls for Greater Consistency in Evaluating Students' Work. — Los Angeles Times (2009)

As evidenced by the above headline, some have criticized grading as subjective and inconsistent, meaning that the same student could receive drastically different grades for the same work, depending on who is grading the work and when it is graded. The literature indeed indicates that some forms of assessment lend themselves to greater levels of grading subjectivity than others.

Scoring multiple-choice assessments does not generally require the use of professional judgment from one paper to the next, so instructors should be able to score such assessments objectively ( Wainer and Thissen, 1993 ; Anderson, 2008 , p. 451). However, despite their advantages in terms of objective grading, studies have raised concerns regarding the blanket use of multiple-choice assessments. Problems with such assessments range from their potential to falsely indicate student understanding to the possibilities that they hamper critical thinking and exhibit bias against certain groups of students ( Towns and Robinson, 1993 ; Scouller, 1998 ; Rogers and Harley, 1999 ; Paxton, 2000 ; Dufresne et al. , 2002 ; Zimmerman and Williams, 2003 ; Stanger-Hall, 2012 ).

Grading student writing, whether in essays, reports, or constructed-response test items, opens up greater opportunities for subjectivity. Shortly after the rise in popularity of percentage-based grading systems in the early 1900s, researchers began examining teacher consistency in marking written work by students. Starch and Elliott (1912) asked 142 teachers to grade the same English paper and found that grades on the paper varied from 50 to 98% between teachers. Because different teachers awarded scores ranging from failing to exceptional, the researchers concluded “the promotion or retardation of a pupil depends to a considerable extent upon the subjective estimate of his teacher” rather than upon the actual work produced by the student ( Starch and Elliott, 1912 ). Even greater levels of inconsistency were found in teachers’ scoring of a geometry paper showing the solution to a problem ( Starch and Elliott, 1913 ).

Eells (1930) investigated the consistency of individual teachers’ grading by asking 61 teachers to grade the same history and geography papers twice—the second time 11 wk after the first. He concluded that “variability of grading is about as great in the same individual as in groups of different individuals” and that, after analysis of reliability coefficients, assignment of scores amounted to “little better than sheer guesses” ( Eells, 1930 ). Similar problems in marking reliability have been observed in higher education environments, although the degree of reliability varies dramatically, likely due to differences in instructor training, assessment type, grading system, and specific topic assessed ( Meadows and Billington, 2005 , pp. 18–20). Factors that occasionally influence an instructor's scoring of written work include the penmanship of the author ( Bull and Stevens, 1979 ), sex of the author ( Spear, 1984 ), ethnicity of the author ( Fajardo, 1985 ), level of experience of the instructor ( Weigle, 1999 ), order in which the papers are reviewed ( Farrell and Gilbert, 1960 ; Spear, 1996 ), and even the attractiveness of the author ( Bull and Stevens, 1979 ).

Designing and using rubrics to grade assignments or tests can reduce inconsistencies and make grading written work more objective. Sharing the rubrics with students can have the added benefit of enhancing learning by allowing for feedback and self-assessment ( Jonsson and Svingby, 2007 ; Reddy and Andrade, 2010 ). Consistency in grading tests can also be improved by writing longer tests with more narrowly focused questions, but this would tend to limit the types of questions that could appear on an exam ( Meadows and Billington, 2005 ).

In summary, grades often fail to provide reliable information about student learning. Grades awarded can be inconsistent both for a single instructor and among different instructors for reasons that have little to do with a student’s content knowledge or learning advances. Even multiple-choice tests, which can be graded with great consistency, have the potential to provide misleading information on student knowledge.

GRADING—STRATEGIES FOR CHANGE

In part, grading practices in higher education have been driven by educational goals such as providing feedback to students, motivating students, comparing students, and measuring learning. However, much of the research literature on grading reviewed above suggests that these goals are often not being achieved with our current grading practices. Additionally, the expectations, time, and stress associated with grading may be distracting instructors from integrating other pedagogical practices that could create a more positive and effective classroom environment for learning. Below we explore several changes in approaching grading that could assist instructors in minimizing its negative influences. Kitchen et al. (2006) additionally provide an example of a high-enrollment college biology class that was redesigned to “maximize feedback and minimize the impact of grades.”

Balancing Accuracy-Based Grading with Effort-Based Grading

Multiple research studies described above suggest that the evaluative aspect of grading may distract students from a focus on learning. While evaluation will no doubt always be key in determining course grades, students’ grades need not be based primarily on work that rewards only correct answers, such as exams and quizzes. Importantly, constructing a grading system that rewards students for participation and effort has been shown to stimulate student interest in improvement (Swinton, 2010). One strategy for focusing students on the importance of effort and practice in learning is to provide students opportunities to earn credit in a course for simply doing the work, completing assigned tasks, and engaging with the material. Assessing effort and participation can happen in a variety of ways (Bean and Peterson, 1998; Rocca, 2010). In college biology courses, one strategy is to grade clicker questions on participation rather than on the correctness of responses. Additionally, instructors can have students turn in minute papers in response to a question posed in class and reward this effort based on submission and not scientific accuracy. Perhaps most importantly, biology instructors can assign out-of-class work (case studies, concept maps, and other written assignments) that can promote student practice and focus students’ attention on key ideas, while not creating more grading work for the instructor. Those out-of-class assignments can be graded quickly (and not for accuracy) based on a simple rubric that checks whether students turned the work in on time, wrote the required minimum number of words, posed the required number of questions, and/or included a prescribed number of references. In summary, one strategy for changing grading is to balance accuracy-based grading with the awarding of some proportion of the grade based on student effort and participation. Changing grading in this way has the potential to promote student practice, incentivize in-class participation, and avoid some of the documented negative consequences of grading.
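
The completion-based rubric described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Submission` fields, the thresholds (250 words, 2 questions, 1 reference), and the 30 percent effort weight are hypothetical and do not come from the studies cited here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Submission:
    """One student's out-of-class assignment (all fields are hypothetical)."""
    text: str
    submitted_at: datetime
    questions_posed: int
    references_cited: int


def effort_score(sub, deadline, min_words=250, min_questions=2, min_references=1):
    """Return completion points (0 to 4): one point per rubric check passed.

    Nothing here looks at scientific accuracy; every check is about
    whether the work was done, matching the rubric in the text.
    """
    checks = [
        sub.submitted_at <= deadline,            # turned in on time
        len(sub.text.split()) >= min_words,      # met the minimum word count
        sub.questions_posed >= min_questions,    # posed enough questions
        sub.references_cited >= min_references,  # cited enough references
    ]
    return sum(checks)


def course_grade(accuracy_pct, effort_pct, effort_weight=0.3):
    """Blend accuracy-based and effort-based percentages into one grade.

    effort_weight is an assumed proportion, chosen only for illustration.
    """
    return (1 - effort_weight) * accuracy_pct + effort_weight * effort_pct
```

Because each check is a yes-or-no question, an instructor (or a script) can score a stack of submissions quickly, which is the point of the strategy: credit for practice without added grading load.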

Providing Opportunities for Meaningful Feedback through Self and Peer Evaluation

Instructors often perceive grading to be a separate process from teaching and learning, yet well-crafted opportunities for evaluation can be effective tools for changing students’ ideas about biology. Nicol and Macfarlane-Dick (2006) argue that, just as teaching strategies are shifting away from an instructor-centered, transmissionist approach to a more collaborative approach between instructor and students, so too should classroom feedback and grading. Because feedback traditionally has been given by the instructor and transmitted to students, Nicol and Macfarlane-Dick argue that students have been deprived of opportunities to become self-regulated learners who can detect their own errors in thinking. They advocate for incorporating techniques such as self-reflection and student dialogue into the assessment process. This, they hypothesize, would create feedback that is relevant to and understood by students and would release faculty members from some of the burden of writing descriptive feedback on student submissions. Additionally, peer review and grading practices can be the basis of in-class active-learning exercises, guided by an instructor-developed rubric. For example, students may be assigned out-of-class homework to construct a diagram of the flow of a carbon atom from a dead body to a coyote (Ebert-May et al., 2003). With the development of a simple rubric, students can self- or peer-evaluate these diagrams during the next class activity to check for the inclusion of key processes, as determined by the instructor. The use of in-class peer evaluation thus allows students to see other examples of biological thinking beyond their own and that of the instructor. In addition, self-evaluation of one's own work using the instructor's rubric can build metacognitive skills in assessing one's own confusions and making self-corrections. Such evaluations need not take much time, and they have the potential to provide feedback that is meaningful and integrated into the learning process. In summary, both self- and peer-evaluation of work are avenues for providing meaningful feedback without formal grading on correctness that can positively influence students' learning (Sadler and Good, 2006; Freeman et al., 2007; Freeman and Parks, 2010).

Making the Move Away from Curving

As documented in the research literature, the practice of grade curving has had unfortunate and often unintended consequences for the culture of undergraduate science classrooms, pitting students against one another as opposed to creating a collaborative learning community (Tobias, 1990; Seymour and Hewitt, 1997). As such, one simple adjustment to grading would be to abandon grading on a curve. Because students often assume that science courses are curved, a move away from curving would likely necessitate explicit and repeated communication with students to convey that they are competing only against themselves and not one another. Moving away from curving sets the expectation that all students have the opportunity to achieve the highest possible grade. Perhaps most importantly, a move away from curving practices in grading may remove a key remaining impediment to building a learning community in which students are expected to rely on and support one another in the learning process. In some instances, instructors may feel the need to use a curve when a large proportion of students perform poorly on a quiz or exam. However, an alternative approach would be to identify why students performed poorly and address this more specifically. For example, if the wording of an exam question was confusing for large numbers of students, then curving would not seem to be an appropriate response. Rather, excluding that question when computing the exam grade would appear to be a fairer approach than curving. Additionally, if large numbers of students performed poorly on particular exam questions, providing opportunities for students to revisit, revise, and resubmit those answers for some credit would likely achieve the goal of not having large numbers of students fail. This would maintain the criterion-referenced grading system and additionally promote learning of the material that was not originally mastered. In summary, abandoning curving practices in undergraduate biology courses and explicitly conveying this to students could promote greater classroom community and student collaboration, while reducing well-documented negative consequences of this grading practice (Humphreys et al., 1982).
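
As a rough illustration of the two alternatives to curving mentioned above, the Python sketch below rescales an exam after dropping a confusing question and awards partial credit for a revised answer. The function names, the dictionary layout, and the 50 percent recovery rate are assumptions made for this example, not details from the cited literature.

```python
def rescore_without(item_scores, max_points, excluded):
    """Percent score over the questions that remain after dropping `excluded`.

    item_scores and max_points map a question id to points earned and
    points possible. Dropping a flawed question changes the denominator
    for everyone equally, so the grade stays criterion-referenced.
    """
    kept = [q for q in max_points if q not in excluded]
    possible = sum(max_points[q] for q in kept)
    if possible == 0:
        raise ValueError("no scorable questions remain")
    earned = sum(item_scores.get(q, 0) for q in kept)
    return 100 * earned / possible


def apply_resubmission(original, revised, recovery=0.5):
    """Credit a fraction of the points regained on a revised answer.

    `recovery` is a hypothetical policy knob: 0.5 means a student earns
    back half of the improvement between the two attempts.
    """
    return original + recovery * max(0, revised - original)
```

Neither adjustment compares students with one another: each score still depends only on the student's own work.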

Becoming Skeptical about What Grades Mean

The research literature raises significant questions about what grades really measure. However, it is likely that grades will continue to be the currency of formal teaching and learning in most higher education settings for the near future. As such, perhaps the most important consideration for instructors about grading is to simply be skeptical about what grades mean. Some instructors will refuse to write letters of recommendation for students who have not achieved grades in a particular range in their course. Yet, if grades are not a reliable reflection of learning and instead reflect other factors, including language proficiency, cultural background, or skills in test taking, this would seem a deeply biased practice. One practical strategy for making grading more equitable is to grade student work anonymously when possible, just as one would score assays in the laboratory blind to the treatment of the sample. The use of rubrics can also help remove bias from grading (Allen and Tanner, 2006) by increasing grading consistency. Perhaps most importantly, sharing grading rubrics with students can support them in identifying where their thinking has gone wrong and promote learning (Jonsson and Svingby, 2007; Reddy and Andrade, 2010). Much is yet to be understood about what influences students’ performance in the context of formal education, and some have suggested grades may be more of a reflection of a student’s ability to understand and play the game of school than anything to do with learning (Towns and Robinson, 1993; Scouller, 1998; Stanger-Hall, 2012). In summary, using tools such as rubrics and blind scoring in grading can decrease the variability and bias in grading student work. Additionally, remembering that grades are likely an inaccurate reflection of student learning can decrease assumptions instructors make about students.

IN CONCLUSION—TEACHING MORE BY GRADING LESS (OR DIFFERENTLY)

A review of the history and research on grading practices may appear to present a bleak outlook on the process of grading and its impacts on learning. However, underlying the less encouraging news about grades are numerous opportunities for faculty members to make assessment and evaluation more productive, better aligned with student learning, and less burdensome for faculty and students. Notably, many of the practices advocated in the literature would appear to involve faculty members spending less time grading. The time and energy spent on grading have often been pinpointed as a key barrier to instructors becoming more innovative in their teaching. In some cases, the demands of grading require so much instructor attention that little time remains for reflection on the structure of a course or for aspirations of pedagogical improvement. Additionally, some instructors are hesitant to develop active-learning activities (as either in-class activities or homework assignments) for fear of the onslaught of grading resulting from these new activities. However, just because students generate work does not mean instructors need to grade that work for accuracy. Indeed, we have presented evidence that accuracy-based grading may demotivate students and impede learning. Additionally, the time-consuming process of instructors marking papers and leaving comments may achieve no gain, if comments are rarely read by students. One wonders how much more student learning might occur if instructors’ time spent grading were used in different ways. What if instructors spent more time planning in-class discussions of homework and simply assigned a small number of earned points to students for completing the work? What if students themselves used rubrics to examine their peers’ efforts and evaluate their own work, instead of instructors spending hours and hours commenting on papers? What if students viewed their peers as resources and collaborators, as opposed to competitors in courses that employ grade curving? Implementing small changes like those described above might allow instructors to promote more student learning by grading less or at least differently than they have before.

  • Allen D, Tanner K. Rubrics: tools for making learning goals and evaluation criteria explicit for both teachers and learners. Cell Biol Educ. 2006; 5:197–203.
  • Anderson VJ. Grading. In: Encyclopedia of Educational Psychology. Thousand Oaks, CA: Sage; 2008.
  • Bagg LH. Four Years at Yale. New Haven, CT: Charles C. Chatfield; 1871.
  • Bean JC, Peterson D. Grading classroom participation. New Direct Teach Learn. 1998; 1998(74):33–40.
  • Bloom BS. Learning for Mastery. Instruction and Curriculum. Regional Education Laboratory for the Carolinas and Virginia, Topical Papers and Reprints, Number 1. Eval Comment. 1968; 1(2):1–11.
  • Bloom BS. Human Characteristics and School Learning. New York: McGraw-Hill; 1976.
  • Brookhart S. Grading. Upper Saddle River, NJ: Pearson Education; 2004.
  • Brookhart SM. How to Give Effective Feedback to Your Students. Alexandria, VA: Association for Supervision and Curriculum Development; 2008.
  • Brown University. 2014. Brown's Grading System. http://brown.edu/campus-life/support/careerlab/employers/employer-resources/browns-grading-system/browns-grading-system (accessed 19 February 2014)
  • Bull R, Stevens J. The effects of attractiveness of writer and penmanship on essay grades. J Occup Psychol. 1979; 52:53–59.
  • Butler R. Enhancing and undermining intrinsic motivation: the effects of task-involving and ego-involving evaluation on interest and performance. Br J Educ Psychol. 1988; 58:1–14.
  • Butler R, Nisan M. Effects of no feedback, task-related comments, and grades on intrinsic motivation and performance. J Educ Psychol. 1986; 78:210.
  • Crisp BR. Is it worth the effort? How feedback influences students’ subsequent submission of assessable work. Assess Eval High Educ. 2007; 32:571–581.
  • Crooks TJ. The impact of classroom evaluation practices on students. Rev Educ Res. 1988; 58:438–481.
  • Cureton LW. The history of grading practices. NCME Measurement in Educ. 1971; 2(4):1–8.
  • Dufresne RJ, Leonard WJ, Gerace WJ. Making sense of students’ answers to multiple-choice questions. Phys Teach. 2002; 40:174–180.
  • Ebert-May D, Batzli J, Lim H. Disciplinary research strategies for assessment of learning. Bioscience. 2003; 53:1221–1228.
  • Eells WC. Reliability of repeated grading of essay type examinations. J Educ Psychol. 1930; 21:48.
  • Fajardo DM. Author race, essay quality, and reverse discrimination. J Appl Social Psychol. 1985; 15:255–268.
  • Farrell MJ, Gilbert N. A type of bias in marking examination scripts. Br J Educ Psychol. 1960; 30:47–52.
  • Finkelstein IE. The Marking System in Theory and Practice. Baltimore: Warwick & York; 1913.
  • Freeman S, et al. Prescribed active learning increases performance in introductory biology. Cell Biol Educ. 2007; 6:132–139.
  • Freeman S, Parks JW. How accurate is peer grading? CBE Life Sci Educ. 2010; 9:482–488.
  • Grant D, Green WB. Grades as incentives. Empirical Econom. 2013; 44:1563–1592.
  • Guskey TR. Making the grade: what benefits students. Educ Leadership. 1994; 52(2):14–20.
  • Harter S. Pleasure derived from challenge and the effects of receiving grades on children's difficulty level choices. Child Dev. 1978; 49:788–799.
  • Harvard University. Annual Report of the President of Harvard University to the Overseers on the State of the University for the Academic Year 1830–1831. Cambridge, UK: E. W. Metcalf; 1832.
  • Higgins R, Hartley P, Skelton A. The conscientious consumer: reconsidering the role of assessment feedback in student learning. Stud High Educ. 2002; 27:53–64.
  • Humphreys B, Johnson RT, Johnson DW. Effects of cooperative, competitive, and individualistic learning on students’ achievement in science class. J Res Sci Teach. 1982; 19:351–356.
  • Jaschik S. Imagining College without Grades. 2009. www.insidehighered.com/news/2009/01/22/grades (accessed 20 February 2014)
  • Johnson V. Grade Inflation: A Crisis in College Education. Secaucus, NJ: Springer; 2003.
  • Jonsson A, Svingby G. The use of scoring rubrics: reliability, validity and educational consequences. Educ Res Rev. 2007; 2:130–144.
  • Kitchen E, King SH, Robison DF, Sudweeks RR, Bradshaw WS, Bell JD. Rethinking exams and letter grades: how much can teachers delegate to students? CBE Life Sci Educ. 2006; 5:270–280.
  • Kohn A. Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A's, Praise, and Other Bribes. New York: Houghton Mifflin Harcourt; 1999.
  • Los Angeles Times. Study critiques schools over subjective grading. October 4, 2009. http://articles.latimes.com/2009/oct/04/nation/na-grading-policy4 (accessed 15 April 2014)
  • MacDonald RB. Developmental students’ processing of teacher feedback in composition instruction. Rev Res Dev Educ. 1991; 8(5):1–5.
  • Marble WO, Winne PH, Martin JF. Science achievement as a function of method and schedule of grading. J Res Sci Teach. 1978; 15:433–440.
  • Meadows M, Billington L. A review of the literature on marking reliability. Unpublished AQA report produced for the National Assessment Agency. 2005. http://archive.teachfind.com/qcda/orderline.qcda.gov.uk/gempdf/184962531X/QCDA104983_review_of_the_literature_on_marking_reliability.pdf
  • Meyer M. The grading of students. Science. 1908; 28:243–250.
  • National Education Association. Reporting pupil progress to parents. Res Bulletin. 1971 (October); 49:81–83.
  • Nicol DJ, Macfarlane-Dick D. Formative assessment and self-regulated learning: a model and seven principles of good feedback practice. Stud High Educ. 2006; 31:199–218.
  • Oettinger GS. The effect of nonlinear incentives on performance: evidence from “Econ 101.” Rev Econ Stat. 2002; 84:509–517.
  • Palmer B. E Is for Fail. 2010. www.slate.com/articles/news_and_politics/explainer/2010/08/e_is_for_fail.html (accessed 19 February 2014)
  • Paxton M. A linguistic perspective on multiple choice questioning. Assess Eval High Educ. 2000; 25:109–119.
  • Pulfrey C, Buchs C, Butera F. Why grades engender performance-avoidance goals: the mediating role of autonomous motivation. J Educ Psychol. 2011; 103:683.
  • Reddy YM, Andrade H. A review of rubric use in higher education. Assess Eval High Educ. 2010; 35:435–448.
  • Rocca KA. Student participation in the college classroom: an extended multidisciplinary literature review. Commun Educ. 2010; 59:185–213.
  • Rogers WT, Harley D. An empirical comparison of three-and four-choice items and tests: susceptibility to testwiseness and internal consistency reliability. Educ Psychol Meas. 1999; 59:234–247.
  • Rojstaczer S, Healy C. Where A is ordinary: the evolution of American college and university grading, 1940–2009. Teachers College Rec. 2012; 114(7):1–23.
  • Sadler PM, Good E. The impact of self-and peer-grading on student learning. Educ Assess. 2006; 11:1–31.
  • Schneider J, Hutt E. Making the grade: a history of the A–F marking scheme. J Curric Stud. 2013:1–24.
  • Scouller K. The influence of assessment method on students’ learning approaches: multiple choice question examination versus assignment essay. High Educ. 1998; 35:453–472.
  • Seymour E, Hewitt N. Talking about Leaving: Why Undergraduates Leave the Sciences. Boulder, CO: Westview; 1997.
  • Sinclair HK, Cleland JA. Undergraduate medical students: who seeks formative feedback? Med Educ. 2007; 41:580–582.
  • Smallwood ML. An Historical Study of Examinations and Grading Systems in Early American Universities: A Critical Study of the Original Records of Harvard, William and Mary, Yale, Mount Holyoke, and Michigan from Their Founding to 1900, vol. 24. Cambridge, MA: Harvard University Press; 1935.
  • Spear M. The influence of halo effects upon teachers’ assessments of written work. Res Educ. 1996; 1996(56):85–86.
  • Spear MG. The biasing influence of pupil sex in a science marking exercise. Res Sci Technol Educ. 1984; 2:55–60.
  • Stanger-Hall KF. Multiple-choice exams: an obstacle for higher-level thinking in introductory science classes. CBE Life Sci Educ. 2012; 11:294–306.
  • Starch D. Reliability and distribution of grades. Science. 1913; 38:630–636.
  • Starch D, Elliott EC. Reliability of the grading of high-school work in English. School Rev. 1912; 20:442–457.
  • Starch D, Elliott EC. Reliability of grading work in mathematics. School Rev. 1913; 21:254–259.
  • Stiles E. The Literary Diary of Ezra Stiles … President of Yale College. New York: Scribner's; 1901.
  • Swinton OH. The effect of effort grading on learning. Econ Educ Rev. 2010; 29:1176–1182.
  • Tobias S. They’re Not Dumb, They’re Different: Stalking the Second Tier. Tucson, AZ: Research Corporation; 1990.
  • Towns MH, Robinson WR. Student use of test-wiseness strategies in solving multiple-choice chemistry examinations. J Res Sci Teach. 1993; 30:709–722.
  • Wainer H, Thissen D. Combining multiple-choice and constructed-response test scores: toward a Marxist theory of test construction. Appl Measure Educ. 1993; 6:103–118.
  • Weaver MR. Do students value feedback? Student perceptions of tutors’ written responses. Assess Eval High Educ. 2006; 31:379–394.
  • Weigle SC. Investigating rater/prompt interactions in writing assessment: quantitative and qualitative approaches. Assessing Writing. 1999; 6:145–178.
  • Weld LD. A standard of interpretation of numerical grades. School Rev. 1917; 25:412–421.
  • Yale University. 2013. Revised Report of the Ad Hoc Committee on Grading. http://yalecollege.yale.edu/sites/default/files/2_Report%20from%20Ad%20Hoc%20Committee%20on%20Grading%5B2%5D.pdf (accessed 15 April 2014)
  • Zimmerman DW, Williams RH. A new look at the influence of guessing on the reliability of multiple-choice tests. Appl Psychol Measure. 2003; 27:357–371.


History of the letter grading system

This story originally appeared on StudySoup and was produced and distributed in partnership with Stacker Studio.

Formal education systems have been in place for thousands of years—from the earliest examples of China’s Xia dynasty schooling that began in 2070 B.C., to the robust, philosophically based education systems used by the ancient Greeks beginning in 500 B.C. While formal education has been around since what feels like the dawn of time, the grading system used to determine mastery of subject matter is a relatively new concept.

It wasn’t too terribly long ago that schools had no formal way to evaluate how well students had mastered their subjects. Earlier versions of the education system relied on many different markers to determine whether students succeeded in their studies. And, in many cases, no scoring systems or pass-or-fail markers were used at all. Students simply learned and moved on, provided they were privileged enough to be able to attend formal schooling in the first place.

That’s quite different from the systems used today. Most schools now use either a letter grading system from A to F to score students on their subject mastery or use a number, such as a 4.0 grading scale. These grades indicate not only whether students pass or fail, but also how well they’ve mastered the subject matter. Students have learned to judge themselves and others on their ability to grasp a concept based on these systems. Getting a good grade in a class or on a test means they’ve achieved something special—something worthy of pride.

But how did the education systems shift from having no grading systems in place to the formal letter grading systems used today? StudySoup used internet sources on the history of education to compile a list of 10 milestones that occurred with the letter grading system to lead to where it is today in the United States. These milestones start with the earliest assessments and grow into a letter grading system that is as complex and varied as the education system. Here’s what you should know about the history of the letter grading system.


500 B.C.: Assessments without letter grades

Archaeologists have found evidence that both a formal and informal education system existed in ancient Greece. Students who received formal educations attained them via a public school system or tutor, with the formal education system primarily reserved for males and nonslaves. But while the ancient Greek society had a publicly accessible education system, there were no letter or number grades used to complete student evaluations. Assessments were only used for formative learning, not for evaluating. This meant that student evaluations were nonexistent—as were any methods for tracking whether a student had mastered the subject matter at hand or required more education and training.


1646: Exit exams lead the way to more formal evaluations

Ancient Greece may not have required formal exams to evaluate students, but by the mid-1600s Harvard, in what would later become the United States, did. Harvard, which has long been considered one of the most innovative and storied institutions of higher education, began to require exit exams for students as early as 1646. These exit exams were required for students to obtain degrees, but there were no formal letter or number requirements at that point. The formal exit exam requirement helped to pave the way for a more official grading system to be put in place over time, both at Harvard and within other educational systems.


1785: Evidence of the first grading scale

Yale University was one of the first schools to attempt a formal evaluation system for students, and it started with the university’s president, Ezra Stiles. Stiles’ diary provides evidence that, as early as 1785, the president was attempting to evaluate students who attended exams at the university. Stiles did so by using four different Latin ranks: “optimi,” “second optimi,” “inferiores,” and “pejores,” which translate roughly to best, second best, less good, and worse. These ranks, or grades, were used to divide students into different grading or mastery categories, much like letter and number grades are used today.


1813–1839: Rules, guidelines, and exam results documented

A few decades later, Yale began to implement a grading system that is similar to the one used today. Building on Ezra Stiles’ earlier attempts to evaluate students, Yale began to keep track of student information in what was called the “Book of Averages.” This book was used to document the rules and guidelines for exams, as well as the students’ exam results. What was especially notable about this book is that it was also used to average each student’s marks, or grades, which is precisely what is done in schools and institutes of higher education today. Even more interesting is the fact that it was done, at least in part, on a 4-point scale, which most colleges and universities still use. That said, there was still no evidence of a letter-grading system, but this 4.0 scale quickly evolved over the next few decades.


1895: First pass or fail system uses scoring to evaluate students

Yale’s early attempts at a grading scale paved the way for other colleges and universities to begin experimenting with other ways to evaluate students. While some of these scales were used to group students into larger categories or classifications (Harvard used such a system in the late 1800s, for example), other colleges and universities used specific systems to evaluate students on an individual basis. One of the more notable systems emerged from the University of Michigan, which in 1895 began to implement an individual student evaluation system of passing or failing. This system used five different scores or marks to evaluate students: passed, incomplete, not passed, conditional, or absent. Similar systems are still used today, though it’s more common for schools to implement a formal letter or number grading system to evaluate individual student performance.


1897: First evidence of the letter grading system emerges

While schools had begun to implement formal student evaluation systems before 1897, the first real example of the letter-grade system emerged that year from Mount Holyoke College in Massachusetts. Unlike the current letter grading system, however, the Mount Holyoke scale was an A–E system, with no letter F grade in place. There were other differences in this letter grade scale, too. At Mount Holyoke, an A spanned a 5-point range and represented grades of 95 to 100, while the B and C grades each spanned a 10-point range. A letter grade of D was awarded to students who scored exactly 75, nothing higher and nothing lower, and anything lower than a 75 was awarded an E, which was a failing grade.


1898: The letter grading system gets an overhaul

The A–E letter grading scale didn’t last long. One year after its implementation, Mount Holyoke administrators added the letter F to the grading scale, with the F standing for failure. The other letters were revised at this point as well to add more symmetry to the grading scale. With this new version of the letter grading system, each passing letter grade covered a range of five points. The letter grade A stood for 95 to 100, the letter grade B stood for 90 to 94, the letter grade C stood for 85 to 89, the letter grade D stood for 80 to 84, and letter grade E represented scores from 75 to 79. Anything lower than a 75 was awarded a letter grade of F, which was a failing grade.
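
The 1898 bands listed above amount to a simple lookup, sketched here in Python. The function name is hypothetical, and the handling of fractional scores (a 94.5 falls into B) is an assumption, since the description only gives whole-number ranges.

```python
def holyoke_1898_letter(score):
    """Map a 0-100 score to the 1898 Mount Holyoke letter grade."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Lower bound of each passing band, highest band first.
    bands = [(95, "A"), (90, "B"), (85, "C"), (80, "D"), (75, "E")]
    for floor, letter in bands:
        if score >= floor:
            return letter
    return "F"  # anything below 75 was a failing grade
```

Note how compressed the scale is compared with modern conventions: every passing grade sits in the top quarter of the 100-point range.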


Early 20th century: Formal grading systems are widely adopted

It didn’t take long for the letter and number grading systems to be adopted into more parts of the educational system in the United States—but no standardized version was used across the board. By the early 20th century, most public schools had a formal grading system in place for individual students, which was essential due to major changes occurring with the system itself. Numerous laws were passed during this time that required students to attend school—which meant more students were in attendance at this time—and immigration increased the number of students attending school, too. As such, schools had to formalize ways to keep track of student records and student evaluations, and in turn, implemented standardized grading systems—both number and letter grades—to make it easier to grade students on a firm set of standard criteria.


1930: The letter E disappears from the grading scale

There is no clear date as to when the letter E first started being removed from the letter grading scale. That said, most colleges had stopped using this letter to grade students by the year 1930. According to numerous sources, colleges stopped using E as part of the grading scale because of concern that students would think the letter grade stood for excellent. The letter E was dropped while F, standing for failing or failure, was kept, and the scale has remained that way in the time since.


1940s to present: A widespread adoption of letter grades

Just 10 years after colleges and universities stopped using the letter E as a grade, the grading system had been widely adopted across the nation. By the 1940s, the letter grading scale was the most commonly used grading system. This system was used in conjunction with the 4.0 scale and the number grading system—grades from 0 to 100—and had been implemented by elementary, middle, and high school public systems as well as colleges and universities. The system would continue to be revised over time and would eventually become more integrated with the number grading scale. The letter grading system continues to be used today—though it comes in many forms and variations, including curved grades and cohort grading, depending on the school system.


Why the 100-Point Grading Scale Is a Stacked Deck

What to do about a 100-point grading system with a troubling history—and inherent flaws that carry over into the present day.

Grades are a staple of American education, but they’re a fairly modern invention. The earliest formal grading emerged in 1785 when Yale University began stratifying grades into four groups: Optimi, second Optimi, Inferiores, and Pejores (roughly translating to best, second best, less good, and worse). However, these grades weren’t given in individual classes or subjects; they were assigned during senior year, as students were preparing to graduate. Rather than a measure of learning, grading in the U.S. began as a last-minute method for ranking.

It wasn’t until 1837, when Harvard began using a 100-point rubric, that the modern grading system began to take shape. At the time, the distribution of scores resembled a bell curve, with typical scores clustered around the average of 50, and scores above 75 or below 25 existing as rare events, relegated to the tails of the distribution. The new grading model posed thorny problems: As schools proliferated, there was little consensus as to what a score of 50 meant, and while a 50 in an advanced class might indicate proficiency, a 50 in a remedial course might represent only the most basic level of understanding.

After decades of experimentation, K–12 schools in the U.S. began to shift to the A–F grading system, eschewing the bell curve in favor of a simplified, five-level hierarchy that was meant to take stock of an individual’s learning, irrespective of their peers. In a class of 25 students, there was no reason why 20 of them could not receive As—or Fs. While bias could certainly creep into assessments, there was consensus that an “A” grade was superior, while a “C” grade reflected average performance. At that critical juncture of our grading history, the 100-point and A–F grading systems were independent: The former was designed to rank students within a university setting, the latter to normalize academic marks in public school settings.

The detente didn’t last. As schools sought to standardize grading further, the two systems, along with the 4.0 scale—a newcomer that emerged out of Yale’s original Latin rankings—eventually “fused together,” according to a 2013 study. “This move was slow, of course—the product of a decentralized system with few formal coordination mechanisms,” the researchers explain. But as the grading systems cycled through “mutations and resistance,” the 100-point scale wrapped itself around the other models and was pulled out of shape. The new average grade—in letters, a “C”—shifted and recentered around the 75-point mark instead of 50.

Downstream, the effects on students were mostly unanticipated.

A HISTORIC SKEW

The end result of that journey—the 100-point grading system in its current permutation—is a “badly lopsided scale that is heavily gamed against the student,” say the researchers James Carifio and Theodore Carey, who studied topics like cognitive psychology and assessment at the University of Massachusetts–Lowell. When the original 100-point scale prevailed, grades were centered around the midpoint, and a failing grade and a passing grade had equal weight. But when the grading systems merged and the centerpoint shifted upward, there was simply less area in which to succeed: Roughly 60 percent of the grading scale was now dedicated to failing marks, and the implications of a very low grade or a zero became catastrophic.
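One way to see the lopsidedness is to count how many points each letter band covers. The sketch below assumes the conventional 10-point cutoffs (90-100 for an A, down to 0-59 for an F), which vary by school; the names BANDS and band_width are illustrative only.

```python
# Assumed conventional letter bands (inclusive point ranges); real cutoffs vary by school.
BANDS = {
    "A": (90, 100),
    "B": (80, 89),
    "C": (70, 79),
    "D": (60, 69),
    "F": (0, 59),
}


def band_width(low, high):
    """Number of whole-point scores in an inclusive range."""
    return high - low + 1


# Total number of whole-point scores on a 0-100 scale.
total_points = sum(band_width(low, high) for low, high in BANDS.values())

# Share of the scale that maps to a failing grade.
failing_share = band_width(*BANDS["F"]) / total_points

print(f"Failing band covers {failing_share:.0%} of the scale")
```

Under these assumed cutoffs, the failing band covers 60 of the 101 possible whole-point scores, which matches the "roughly 60 percent" figure.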

Consider the following scenario: A student gets an 82, 85, and 90 on their first three assignments—they’re a solid B student, with the potential to make a straight A. If they miss their next assignment, their average plummets to a 64. Even if they scored 90s on the next seven assignments—a clear plurality of outstanding work—they’d still end up with an 80, the equivalent of a B- or C+ in most grading systems.

In contrast, a B student receiving a zero 100 years ago would have merely dropped into the upper-40s range, an abbreviated setback that would still earn them a C, giving them ample latitude to recover.
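The arithmetic in the missed-assignment scenario can be checked with a few lines of Python. This is a minimal sketch that assumes every assignment carries equal weight; the helper name average is illustrative.

```python
def average(scores):
    """Unweighted mean of a list of scores."""
    return sum(scores) / len(scores)


first_three = [82, 85, 90]
with_zero = first_three + [0]           # the missed assignment counts as a zero
after_recovery = with_zero + [90] * 7   # seven straight 90s afterward

avg_before = average(first_three)
avg_with_zero = average(with_zero)
avg_after = average(after_recovery)

print(f"Before the zero: {avg_before:.1f}")
print(f"With the zero: {avg_with_zero:.1f}")
print(f"After seven 90s: {avg_after:.1f}")
```

A single zero pulls the average down by more than 21 points, and seven strong scores recover only about 16 of them.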

THE ZERO ACCOUNTABILITY QUESTION

To compensate for the flaws of the 100-point grading scale, many districts now turn to minimum grading, automatically resetting zeroes to 50, for example. Critics of the approach say that no-zero policies fail to prepare kids for the real world and encourage students to coast and wait for opportune moments to buckle down. Students will inevitably put in minimal effort, the argument goes, when they know there’s a safety net and a chance to rebound in the future.

But Carifio and Carey found the opposite to be true. In a comprehensive 2015 study, they analyzed seven years’ worth of data for more than 29,000 high school students, looking at the impact that minimum grading had on test scores, grade inflation, and graduation rates. Compared with their counterparts in schools with traditional grading schemes, students who benefited from minimum grading actually put more effort into their learning, earning higher scores on state exams and graduating at higher rates.

In fact, for many students, according to the researchers, receiving a zero was demoralizing—not corrective. “The assigning of even a small number of catastrophically low grades, especially early in the marking term, before student self-efficacy can be established, can create this sense of helplessness,” Carifio and Carey explain, putting students in an impossible situation and discouraging them for the rest of the grading period. Giving students a lifeline out of a ruinous situation keeps them engaged and motivated to do better, the research suggests.

The claim about real-life norms is also dubious. There are times when deadlines must be strictly enforced, but for the most part, employers are typically forgiving of extensions and late work, recognizing that “assigned deadlines can be stressfully tight, compromising output quality,” according to a 2022 study, which also found that 53 percent of workplace deadlines were flexible. In fact, “deadline estimates are often overly optimistic,” and adhering to them too stringently can dramatically increase burnout.

SHIFTING THE CONVERSATION

Aside from mandatory minimums, there are other options that address the historical error while providing clear consequences for consistently incomplete or unsatisfactory work. For example, teachers can drop a student’s lowest grade (or both the lowest and highest grades), offer students the opportunity to make up work with or without penalties, or tweak the minimum grading policy so that it only applies to one or two assignments. Standards-based grading, which uses a 1 to 4 scale to highlight specific areas of academic and social growth, is a bigger investment but remains a viable alternative to traditional grading; portfolios give students opportunities to reflect on their work.
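To compare how a few of these policies treat a single zero, here is a small sketch. It assumes equal-weight assignments and a minimum grade of 50; the function names and the sample scores are hypothetical and not drawn from any district's actual policy.

```python
def average(scores):
    """Unweighted mean of a list of scores."""
    return sum(scores) / len(scores)


def with_minimum(scores, floor=50):
    """Minimum grading: raise any score below the floor up to the floor."""
    return [max(score, floor) for score in scores]


def drop_lowest(scores):
    """Drop the single lowest score."""
    return sorted(scores)[1:]


def drop_lowest_and_highest(scores):
    """Drop both the single lowest and the single highest score."""
    return sorted(scores)[1:-1]


scores = [82, 85, 90, 0]  # three solid grades and one missed assignment

raw_avg = average(scores)
floored_avg = average(with_minimum(scores))
dropped_avg = average(drop_lowest(scores))
trimmed_avg = average(drop_lowest_and_highest(scores))

print(f"Traditional: {raw_avg:.2f}")
print(f"Minimum grade of 50: {floored_avg:.2f}")
print(f"Drop lowest: {dropped_avg:.2f}")
print(f"Drop lowest and highest: {trimmed_avg:.2f}")
```

With these sample scores, each alternative keeps the student in the C-to-B range rather than the mid-60s, while a pattern of repeated zeros would still pull the average down under any of them.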

That’s not to say that any alternatives to the 100-point grading system are perfect. Perhaps the problem lies in grading itself, since students tend to fixate on their scores and not their learning goals. In a 2021 study, for example, researchers discovered that deprioritizing an assignment’s grade—by giving students feedback on an assignment a few days before their grade—increased performance on future assignments by two-thirds of a letter grade. “Research has shown that an excessive focus on grades can interfere with the student’s ability to self-assess—a crucial cognitive process in the feedback loop,” the researchers explain.

Middle school math teacher Crystal Frommert saw a similar pattern with her students, noticing that they were obsessed with grades, often at the cost of learning. As a minor act of resistance, Frommert doesn’t hand back a test with a grade—instead, she provides achievable feedback and then asks her students to reflect and make corrections. She’s careful to provide notes on how well they’ve learned the material but never discusses points (which are still logged and submitted to the school).

“This irritated the kids at first, but over time they began to focus on their actual performance,” writes Frommert. For students who have a burning desire to know their grade, however, she’ll schedule a conference the next day to help ensure a more productive, fruitful conversation.

The History of Grading in the US – What You Need to Know

 In my quest to understand how I could make my grades have more meaning and be truly representative of my students’ success I had to ask – how did we get here? What does a grade “A” represent? Why does any number from 0-59 mean an F, but only 80 – 89 equal a B? The only answer I could come up with was “That’s the way it’s always been done.” Any rational human knows that is not usually a good enough reason to do something. So, if I couldn’t explain WHY I graded on an A-F scale, how could I justify ANY of the grades I gave to my students? I started with the book Grading for Equity, by Joe Feldman. That led me to some journal articles, which led me to dedicate an entire summer to researching this question. As it turns out, the way we grade is not based in any kind of research. In fact, there is strong research that suggests the way we traditionally assign a grade to students is harmful and defeats the purpose of learning. Let me tell you all about the history of grading in the United States.

What is a grade? Where We Are Now

Ideally, teachers assign grades to students in order to have an accessible summary of a student’s academic achievement (through GPA, credits earned, etc.) for college and career applications (higher grade = better candidate). In reality, grades actually do much more:

  • Summarize student tardiness, ability to turn in work on time, and participate in class
  • Summarize a student’s ability to collect points and play the game of school
  • Identify students who will show up and do any work as long as there’s a grade involved
  • Allow teachers to exert control within the classroom (grades are a reward/punishment for compliance)
  • Are the tiny but powerful gatekeepers that have a significant impact on a student’s future

So how did we get here?

There was a time when students learned for the sake of learning, because of a desire to know more, do more, be more. It’s why I became a teacher. But the current grading system has turned education into a game of point collecting, offering bonuses to students who have parents at home to remind them to finish their homework, full bellies, a safe place to sleep, and access to technology. You are almost guaranteed to win if you have lots of natural ability, but can at best hope for a mediocre score if you have to really work hard to master the skills. It gives special shortcuts to students whose families and cultures value compliance, students who speak English as a first language, and, let’s face it, students who are white (Chiekem 2015).

I keep thinking there has to be a reason, some kind of scientific justification for why we grade the way we do, but our grading practices actually stem from tradition, not science or reason.

A Brief History of Grading in the United States

Pre-Colonial United States

The modern education system in the United States has its roots in Western European schooling values. It’s important to acknowledge the education and grading norms of the Native American Nations that existed long before colonialism.

  • Gregory Cajete, a Tewa Native American educator, described education among native nations as incorporating experiential learning, storytelling, dreaming, and tutoring (Cajete 1994).
  • Although each nation had differing values, beliefs, and skills to pass down to their youngsters, for the most part, “grading” was an evaluation determined by elders over time with no scale, numbers, or overall competition.
  • The community, not just designated “teachers,” was in charge of the many different facets of the education of their young people.
  • There was a constant loop of feedback between pupils and their teachers, and students had multiple opportunities to learn, practice, and master the values important to their communities (Morgan 2009).

Colonial America / Early United States

In the 1600s and 1700s, most children were schooled in buildings in each community, although the social elites of the colonies sent their children to the preparatory schools designed as feeder schools to Harvard, the first American college. Curriculum and rigor varied widely, as did valuations of students’ academics (although most reflected a Puritan belief system). Grades as we know them did not exist yet – students sat exams for entrance to colleges (for the few who chose to pursue higher learning) and students earned a degree only after exit exams that demonstrated their intellectual attainment – there were no letter grades or GPA involved.

Here’s what WAS involved:

  • There was a central focus on the relationship between a teacher, her pupils, and their parents.
  • Progress was communicated through oral reports on home visits (getting some strong Ichabod Crane vibes here).
  • Success of this system was based on the teacher being a trusted part of the community as well as teachers having a manageable load of students to provide meaningful specific formal and informal feedback.
  • Students sat oral exams for a committee on a regular basis as an official determination of their academic standing. Often, students were ranked and re-ranked in order to motivate them, but even then critics worried that this focused on immediate success and not genuine intellectual development (Schneider and Hutt 2014).
  • Yale ranked its students into categories of intellectual achievement after examinations, but these were not necessarily known to students. Harvard used labels as well (summa cum laude, magna cum laude, and cum laude) to designate special achievement.

Industrial Age

With the 1800s came a wave of change to the United States and the foundations for the grading system in use today. The Industrial Revolution, city migration, and massive immigration numbers (not to mention child labor laws and compulsory education laws) meant that far more students than ever before were crowding schools. The detailed, personalized accounts of student progress were no longer possible (Schneider and Hutt 2014).

  • Grammar schools began awarding percentage grades for individual courses and requiring a certain amount of “credit” to graduate.
  • Students also began to sit written exams regularly and report cards were sent home (albeit still without letter grades) thanks to the recommendations of Horace Mann, an early advocate for education reform who believed that students should “shine” all the time, not just for the oral competitions and rankings (Schneider and Hutt 2014).
  • Other factors were sometimes incorporated into grades, including timeliness and neatness – skills valued by factories in their workers. This is when we also see the standardization of the school day – everything based on factory norms.

1900s to Modern Day

Many schools began using percentages to rank students on paper around the 1900s. This was done in part so that teachers could assess higher numbers of students, but also because higher learning institutions now needed a succinct way to determine whether a prospective student would be academically successful at their school (Carifio and Carey 2013).

  • Something interesting to note about this percentage system, however, is that the average score of students was 50. Grades above 75 and below 25 were rare (Smallwood 1935) and indicated truly exceptional academic and intellectual abilities.
  • The percentages were still not succinct enough for colleges, and so many schools created scales in which a word or letter was assigned to different ranges of scores.
  • Students quickly learned how to maximize the points they earned while minimizing effort (I think every student, myself included, has done this for at least one class in their life).
  • Studies conducted during the time period noted that grades created high levels of anxiety and competition as well as encouraging cheating (Schneider and Hutt 2014).
  • Due to tracking, the ladder of opportunity was often easier to access for white or wealthy students with resources to earn more points. Because of this, students who were poor, immigrants, or of color were often excluded from college tracks in schools, regardless of their actual intellectual skills (Feldman 2018).

Where does this exploration of the history of grading leave us?

Through this research, it seems to me that the grades I have been assigning since I started teaching are wholly arbitrary. There is no scientific justification behind these systems, other than attempts to create a standardized grading summary for the purpose of college admittance. I am confident that today, if I asked 100 different high school educators how they arrive at each grade for their students, I would get 100 different answers. As honorable as our intentions may be, this is not equity, and it seems to me that our grades have become so far removed from reports on actual intellectual ability that they are meaningless.

Here are some of the problems I see:

  • The system favors one singular Euro-centric philosophy of education which is out of date and inherently biased. It ignores the rich and valuable belief systems and educational practices of a multitude of cultures, including the indigenous ones that predate our country. 
  • Grades are not being assigned as initially intended – rather than most students receiving Cs, most students receive As and Bs, watering down true academic exceptionalism in order to avoid students losing out on college opportunities. Furthermore, the number of failing grades available (59) is far higher than the number of available passing grades.
  • This system was built to satisfy the needs and values of the Industrial Revolution-era United States – but we’ve evolved as a country and as humans since then. Our needs and values have evolved as well.
  • Grades favor students with access to resources, and penalize students without.
  • Teachers, who are not immune to bias (explicit or implicit), are ultimately in charge of determining a single letter that can impact a child’s entire academic future. Moreover, that letter in a classroom in Wisconsin or California could mean something completely different from the same letter in Louisiana or Massachusetts (let alone from two teachers in the same district or even school).

Where do we go from here?

Unless we can dismantle this deeply entrenched arbitrary grading system, we have to figure out how to make our grades reflect a student’s academic skills and not their ability to grab points on their way up the educational ladder. But that’s a story for another blog post.

Cajete, G. (1994). Look to the Mountain: An Ecology of Education (1st ed.). Kivaki Press.

Carifio, J., & Carey, T. (2013). The Arguments and Data in Favor of Minimum Grading. Mid-Western Educational Researcher, 25, 19-30.

Chiekem, E. (2015). Grading Practice as Valid Measures of Academic Achievement of Secondary Schools Students for National Development. Journal of Education and Practice, 6(26), 24-28. ERIC. https://files.eric.ed.gov/fulltext/EJ1077389.pdf

Feldman, J. (2018). Grading for equity: What it is, why it matters, and how it can transform schools and classrooms.

Feldman, J. (2019, April). Beyond standards-based grading: Why equity must be part of grading reform. Phi Delta Kappan, 100(8), 4. SAGE Journals. https://doi.org/10.1177/0031721719846890

Morgan, H. (2009). What Every Teacher Needs to Know to Teach Native American Students. Multicultural Education, 16(4). https://files.eric.ed.gov/fulltext/EJ858583.pdf

Schneider, J., & Hutt, E. (2014, May). Making the grade: a history of the A–F marking scheme. Journal of Curriculum Studies. ResearchGate. http://dx.doi.org/10.1080/00220272.2013.790480

Smallwood, M. L. (1935). An historical study of examinations and grading systems in early American universities: A critical study of the original records of Harvard, William and Mary, Yale, Mount Holyoke, and Michigan from their founding to 1900. Cambridge: Harvard University Press.

February 26, 2024

Education, History

The Origin of Grades in American Schools

You may not have heard of Ezra Stiles, but he’s the reason students today get grades – and the system hasn’t changed much in 250 years.

Ezra Stiles, a former president of Yale University, died nearly 230 years ago, but his ghost continues to haunt schools and colleges across the country. In addition to his time leading one of the most well-known educational institutions in the world, Stiles was also a minister, a founder of Brown University, someone who bought enslaved people, and a pen-pal of both Thomas Jefferson and Benjamin Franklin, yet he is largely remembered for something he himself never would have anticipated: introducing grades as a means for evaluating academic work.

Not only was Stiles the president of Yale from 1778 to 1795, but he was also a faculty member, which required him to examine students on topics ranging from Greek and Latin to rhetoric and mathematics. At this point in Yale’s history, students answered exam questions orally, and they were evaluated publicly in front of their peers and other members of the university community.

Stiles kept a meticulous diary, and in a series of entries for the year 1785, he details a rather frenetic week full of student examinations. From April 4-8 of that year, he did everything from writing questions to personally examining students. April 5 is of special significance for our story, though. On this day, Stiles examined 58 seniors. In his diary and other notes, he describes the way he categorized student performance on the examinations. Twenty out of the 58 were classified as Optimi, 16 as Second Optimi, 12 as Inferiores (Boni), and a worrisome 10 as Pejores.

Three of these terms are fairly easy to translate. Optimi means “Best,” while Second Optimi unsurprisingly means “Second Best.” Pejores, the lowest performing group, can most accurately be rendered as “Worse” (not “Worst,” which indicates at least some empathy on the part of Stiles). Inferiores (Boni), on the other hand, is a bit trickier. In the past, some have translated this phrase as “Less Good,” but “the Lower of the Good Group” is probably closer to the original meaning.

Although we may dither a bit about the exact meaning of these terms in English, there are a few points that are incontestable about Stiles’s four categories of achievement. First, they together comprise what is almost certainly the first documented grading system in the history of American education. Before you cheer or jeer him for this, it is important to know that Stiles did not invent this approach. Across the pond, examiners at Cambridge had been evaluating student performance on oral exams using similar kinds of Latin-labeled groupings for much of the eighteenth century. Stiles was simply an enthusiastic importer of this British system, but he can be credited with putting an American stamp on it. He was a vocal supporter of independence and the Revolutionary War, which took place during his Yale presidency, so I think this might have pleased him.

Second, the clear purpose of this mode of evaluation was to publicly rank and sort students according to their achievement, not to give them feedback on their learning or to suggest how they might improve before the next exam. This kind of class ranking was a mechanism for conveying status and privilege (or withholding them), oftentimes mirroring the social structures of the world beyond the ivy-covered university walls. In the end, Ezra Stiles bears a healthy dose of responsibility for popularizing a system in which public recognition and classification of performance mattered more than individual learning, student development, or the cultivation of curiosity and inquiry — all of which are ostensibly the primary goals of education.

This is grading’s original sin, and — despite centuries of change in American schools and colleges — we have never been able to move past it. From the very beginning, grading was more about sorting, signaling, and certification than it was a system for supporting or enhancing students and their learning. Class valedictorians, Latin honors (like summa cum laude), and GPAs are all legacies of Ezra Stiles in this regard, but even more familiar grading models suffer the same fate.

Percentage grades and A-F letter grades, which arrived on the scene in the late 1800s, give slightly more information about a student’s performance in a class, but they still serve the purpose of ranking and sorting, and we also have good evidence to suggest that they are not accurate measurements of learning in the first place. That’s a problem, and students shoulder the burden.

We have now arrived at a pivotal moment for education in America. Armed with an understanding of the problems with traditional grading, teachers and administrators across the country are experimenting with new approaches like standards-based grading, contract grading, collaborative grading, and more. Their goal is to evaluate students in ways that better reflect learning, allow room for mistakes, and account for personal growth.

Those elements of education might not have mattered much to Ezra Stiles in the America of the 1780s, but they are absolutely essential for the future of our nation’s students.

Standards are a measure we use to see where students are compared to a desirable ‘norm’. Those who want to do away with measures and standards prefer to be evaluated individually. This defeats the purpose of standards as there are many reasons why some do better than others on standardized tests but that is not a reason to do away with them. How we stand in relation to others is necessary to improve teaching methods and also to see who may be qualified or not, for a specific job or vocation. Perhaps instead of learning how to interpret test results there are many who are frightened of competition – grading of individuals and measuring methods of new inventions etc. all want the same thing – to see what works and what doesn’t. Those who do not want to see where people or things stand in relation to ‘norms’ may not ever improve … … The only opportunity in today’s world to showcase ‘excellence’ is in the world of Sports… Excellence counts in sports – but not in academics? We need to stop worrying about hurt feelings of those who are fearful of standards because they don’t measure up – it ideally ought to spur them to work harder or be satisfied to not always rise to the top of the scale and to work with what talents they have…

It’s not so much that a grading system is bad, it’s how it is used. And that doesn’t even come close to how poorly the USA’s system of education is ‘performing’. Education shouldn’t be restricted to a timetable.

Thank you for reading and engaging with the piece, Bob. I appreciate you sharing your story too. I agree that change is hard—in education especially.

Ezra Stiles was definitely quite a complicated man when looked at in the big picture of things, where I dare say he would probably get more than a few D’s and F’s if given a report card on his own life, and what he put into motion.

Your links here are very much appreciated, Josh. I know I struggled with the A-F grading system myself. My kindergarten teacher (public school) told my Mom she felt I’d do better in the Catholic school for the 1st and 2nd grades in getting the basics down, then return to the public school for the 3rd grade on. So that’s what happened.

I did pretty well in the 1st and 2nd grades, getting C’s to A’s. My favorite part was learning cursive. Loved the Aa, Bb, Cc etc. Sister kept at the top above the green board. The Declaration of Independence signatures (wall copy) fascinated me to no end. I was already handwriting before we were actually taught. So when I’m interested in something, I’m all in to learn. When I’m not, my brain puts up a block of “we don’t like this” and tunes it out.

In the 3rd grade I was back in the public school, and about 6 weeks in was the parent-teacher conference at night. Mom woke me up, upset I’d gotten a ‘D’ in arithmetic, and Mrs. Wagner noticed I was looking outside at the entertaining birds more than paying attention to her. I would have appreciated the teacher taking me aside, and speaking to me first, but she didn’t. She might have had a praise report instead if she had.

10 pm Mom gives me the speech about how she, a college graduate and high school English teacher NEVER got a ‘D’ in any subject, nor my college and law school grad Dad either. How the pediatrician said I was advanced in the tests for infants and toddlers, fully talking at 1 and a 1/2; how could this be?! Try getting back to sleep after that.

I made every effort to improve, but my teacher decided to have me see the school psychiatrist for his determination on it, including an IQ test. The good news was I had a high IQ which made why I wasn’t a good student all the more frustrating.

ANYWAY, in the 4th grade, I had this incredible teacher that “got me”, and I had almost straight A’s and a couple of B’s. So the teacher and hers or his style and method, can make a huge difference, even within the confines of the present system!

After that, I forced myself to learn what my mind was rejecting to pass the damn tests (with cheating only as needed) and concentrate on creative writing, English lit, Greek and Roman Antiquity, being on the high school paper, writing record reviews (Carly Simon’s ‘Hotcakes’, Cat Stevens’ ‘Buddha and the Chocolate Box’) and got my financial and communications degrees in college.

I actually put those two to good use professionally, but frankly my own natural aptitude should get most of the credit. Society puts too much emphasis in its narrow definitions of what education is or isn’t. Rush them straight into college out of high school by rote so they don’t have time to question anything. Whether their degree will get them anywhere doesn’t matter; just get them on the hook for owing endless thousands of dollars.

Grades are based on passing tests which can often mean a person is simply good at regurgitation of facts, but it doesn’t translate to making intelligent or morally right decisions. The people running this country are probably the most highly educated (on paper) we’ve ever had, but look where we are. That sentence says it all, thank you.

Education and grading need complete overhauls. The one-size-fits-all way has really never worked that well. We have so much over-choice on things we don’t need, yet in this area things are still stuck in 18th century. Unfortunately, keeping things as they are is still profitable for the educational industry, which it is. A profitable industry.

Alternative grading for college courses.

Traditional grading systems have many inherent issues that undermine learning. In this collection of web pages, the issues are explained and four alternative approaches - mastery, competency-based, contract, and specification - are described. If you would like assistance in thinking through how one of these alternative grading systems might work in your course, or how you might address any issues in your current grading approach, please contact an instructional designer assigned to your college.

Traditional Grading Systems

Traditional grading systems have a long history in colleges and universities across the United States. The first grading systems were used by Yale, Harvard, and other institutions as early as the 1880s and included bell-curve, 100-point (percentage), 4-point (GPA style), or letter (A-F) grading systems (Schinske and Tanner 2014, Bowen and Cooper 2022). In these institutions, grades were originally used to sort students for future career fields as grades allowed for a very subjective idea (learning) to be more easily quantified (Schwab et al. 2018).

Two main grading systems are used in traditional grading: Norm-referenced and criterion-referenced grading systems. Norm-referenced grading, also called normative grading or “grading on a curve”, uses the normal distribution (a bell-shaped curve) to rank student performance (Burton 2006). In norm-referenced grading students compete for a limited number of each grade as usually only the top 10% of students can earn an A in the course (Sadler 2005). This type of grading allows for students to be sorted or ranked within a course, but earned grades are usually not directly tied to student learning outcomes (Burton 2006). In most norm-referenced grading systems, assessments are mainly used for testing content knowledge on topics that have definitive answers (Burton 2006).

Criterion-referenced grading systems use a specific requirement to assess student work. The instructor selects a criterion or set of criteria, usually a point-based, percentage-based, or letter grading system, as criteria for grading (Sadler 2005). In most cases, the criteria used are not directly linked to the student learning objectives for the course (Sadler 2005). The criteria used in the course become the rules or requirements used to make judgments on student work and ultimately the overall course grade (Sadler 2005). Criterion-referenced grading is seen as more equitable than norm-referenced grading as student assessment is not based on rank or competition among students (Sadler 2005; see Box 1 for more information on the issues with normative grading practices).

Issues with Traditional Grading

Traditional grading systems have many inherent issues that undermine learning. Traditional grading systems are highly subjective and internalize instructor biases (Sadler 2005, Schwab et al. 2018, Link and Guskey 2019, Feldman 2019). Traditional grading systems pit students and instructors against each other by making grades a commodity that students must negotiate with the instructor (point grubbing), instead of building trusting relationships that allow for students to learn from their mistakes, take risks, and be creative (Feldman 2019). Additionally, these grading systems can increase stress and anxiety in students (Chamberlin et al. 2018) while reducing cooperative learning, critical thinking, creativity, and motivation (Strong et al. 2004, Chamberlin et al. 2018, Schwab et al. 2018, Feldman 2019).

At their core, grades are highly subjective due to a lack of consistency in what and how learning is measured (Sadler 2005, Schinske and Tanner 2014, Buckmiller et al. 2017, Scarlett 2018, Schwab et al. 2018, Link and Guskey 2019, Townsley and Schmid 2020, Zimmerman 2020). Instructors have different criteria for how students earn grades with many using a mixture of effort, achievement, and behaviors to assess student learning (Buckmiller et al. 2017, Schwab et al. 2018, Feldman 2019). When non-content aspects (attendance, participation, late penalties, extra credit, etc.) are included in grades, the grades become inaccurate for determining student learning (Buckmiller et al. 2017, Schwab et al. 2018, Feldman 2019, Link and Guskey 2019). Additionally, because each instructor determines the categories, weights, and other factors included in a final course grade, grades become unreliable indicators of a student’s understanding of the content (Link and Guskey 2019) and are not comparable across instructors or institutions (Schwab et al. 2018). For example, when researchers gave instructors the same English essay, the grades awarded to that single essay varied widely (50-98%) depending on the instructor (Schinske and Tanner 2014). The research was repeated with a single geometry paper with similar results, indicating that this issue with grading is not restricted to more subjective topics or disciplines (Schinske and Tanner 2014). Further research also showed that if an instructor was asked to grade the same assignment at different times, the grades on this same assignment were significantly different at different times in the semester (Schinske and Tanner 2014). Thus, grades may lack validity due to sampling adequacy, question/prompt quality, marking standards, marking reliability, measurement error, or similar issues, making grades from traditional grading systems a poor indicator of student learning (Sadler 2005).

Due to the subjectivity of grades, many inequalities can become incorporated in a grade when traditional grading systems are used (Feldman 2019, Link and Guskey 2019). For example, if an instructor uses behaviors in grading (attendance, participation, etc.), then racial or gender biases may become part of the grade (Feldman 2019, Link and Guskey 2019). Research indicates that black students are seen as more disruptive than white students by white instructors (Link and Guskey 2019). Thus, if the behavior is incorporated into the grade, black students can receive lower grades even if they have a better understanding of the content than that grade would depict (Link and Guskey 2019). Similarly, gender biases around behavior can negatively impact men’s grades when grades are based on social skills and conforming behaviors (Link and Guskey 2019). In addition to race and gender, instructor bias can be seen based on the experience level of the instructor, the order the assignment was graded, the penmanship of the author, and other factors unrelated to the actual student work (Schinske and Tanner 2014).

Because grades are a form of extrinsic reward, traditional grading can reduce student motivation for learning (Schinske and Tanner 2014, Chamberlin et al. 2018, Schwab et al. 2018, Feldman 2019). When students are rewarded by points and grades instead of for their improvements in learning, they become reliant on extrinsic rewards (points/grades) over intrinsic desires to learn deeply about interesting topics (Chamberlin et al. 2018, Feldman 2019, Zimmerman 2020). Research has shown that students provided with grades and feedback on their work ignore the feedback and focus solely on the grade (Schinske and Tanner 2014, Chamberlin et al. 2018). In a study where students were randomly placed in one treatment (given a grade only, feedback only, or a grade and feedback), students provided with only descriptive feedback (no grade) did better on problem-solving tasks than other students (Schinske and Tanner 2014). Further, students who received both descriptive feedback and a grade did not improve their work, either because they failed to read the descriptive feedback or failed to use it in future assessments (Schinske and Tanner 2014). Overall, grades do not increase student motivation or interest in learning (Schinske and Tanner 2014, Schwab et al. 2018) and can have negative impacts on student performance by increasing anxiety, fear of failure, and a fixed mindset (Schinske and Tanner 2014, Chamberlin et al. 2018).

Equitable Grading Practices

Knowing that traditional grading practices are problematic for providing an accurate representation of student learning, it is important to understand the aspects of grading practices that need to be changed to make grades more accurate and equitable. In “Grading for Equity”, Joe Feldman (2019) devised three main areas of grading that need to be addressed to increase equity: accuracy, bias-resistance, and student motivation.

Accuracy

Traditional grading practices rely on unsound use of mathematical calculations and techniques when assessing student learning. Grading systems that use zeros for missing work, 0-100 percentage scales, averaging of points, and similar techniques to calculate a student's grade are heavily weighted toward student failure and do not accurately measure student learning. When instructors use zeros for missing work and then average scores, the missing assignments substantially lower the student’s grade and do not accurately reflect the student's understanding of the content.
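To make the effect of a zero concrete, the short sketch below averages a made-up set of scores with and without one missing assignment. The scores and the `letter` helper are illustrative assumptions, not data from the sources cited here; the letter cutoffs follow the conventional 90/80/70/60 percentage scale.

```python
from statistics import mean

def letter(pct):
    """Map a percentage to a letter on the conventional 90/80/70/60 scale."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if pct >= cutoff:
            return grade
    return "F"

# Hypothetical student: four assignments at 90% and one missing assignment.
completed = [90, 90, 90, 90]
with_zero = completed + [0]

avg_completed = mean(completed)   # average of the submitted work only
avg_with_zero = mean(with_zero)   # the missing assignment is entered as 0

print(letter(avg_completed), avg_completed)
print(letter(avg_with_zero), avg_with_zero)
```

In this hypothetical case, a single zero moves the average from 90 to 72, two full letter grades, even though every submitted assignment earned an A.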

Moreover, using the percentage system for grading (90-100% = A; 80-89% = B; 70-79% = C; 60-69% = D; and 0-59% = F) creates a system where failure is mathematically more likely to occur than any other grade. Of the 101 units in the 0–100-point system, 60 units are failing and only 41 units are passing (assuming a D is a passing grade). Once a student has too low a grade, it becomes mathematically impossible for the student to recover and pass the course. Using a grading system with equal levels, where each letter grade spans 20% of the 100-point scale (80-100% = A; 60-79% = B; 40-59% = C; 20-39% = D; and 0-19% = F), increases equity in grading; these equal levels occur naturally when a 4-point grading system (4 = A; 3 = B; 2 = C; 1 = D; 0 = F) is used instead of a 100-point system. Alternatively, instructors can implement minimum grading (students automatically have 50% on any assignment in a 0–100-point system and therefore each grade is only 10% of the grading scale) or weight recent work more heavily than beginning assignments to reflect student learning and thus create a more equitable grading system.
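The arithmetic in this paragraph can be checked with a few lines of code. The sketch below counts the whole-number scores in each letter band for the two scales described above and applies a 50% floor to a made-up set of scores. The function names and the sample scores are assumptions chosen for illustration.

```python
# Letter-grade bands as inclusive (low, high) percentage ranges, taken from the text.
CONVENTIONAL = {"A": (90, 100), "B": (80, 89), "C": (70, 79), "D": (60, 69), "F": (0, 59)}
EQUAL_BANDS = {"A": (80, 100), "B": (60, 79), "C": (40, 59), "D": (20, 39), "F": (0, 19)}

def band_widths(scale):
    """Count how many whole-number scores fall in each letter band."""
    return {grade: high - low + 1 for grade, (low, high) in scale.items()}

def apply_minimum(scores, floor=50):
    """Minimum grading: no assignment score is recorded below the floor."""
    return [max(score, floor) for score in scores]

conventional = band_widths(CONVENTIONAL)
equal = band_widths(EQUAL_BANDS)

failing_units = conventional["F"]
passing_units = sum(conventional.values()) - failing_units

# Hypothetical scores: four assignments at 90% and one missing assignment entered as 0.
scores = [90, 90, 90, 90, 0]
floored = apply_minimum(scores)
average_with_floor = sum(floored) / len(floored)

print(conventional)
print(equal)
print(failing_units, passing_units)
print(average_with_floor)
```

On the conventional scale the F band covers 60 of the 101 possible scores, while on the equal-band scale it covers 20. With the 50% floor, the hypothetical student's average is 82 rather than the 72 produced by a recorded zero.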

Bias-Resistance

Everyone has implicit biases that influence how they view the world. In grading, implicit biases are often associated with including non-content aspects of a class in a student’s grade. When attendance, politeness, participation, effort, late penalties, and extra credit are incorporated into a grade, the accuracy of the grade is diminished as instructor biases are added to grades. What constitutes politeness? How much effort is worth an A? How many points are lost for late work even if the student shows they understand the content? Including non-content-related methods (usually based on behaviors) for students to earn points allows students with more resources and privileges to excel even if they don’t understand the content, while less privileged students are negatively impacted even if they do know the course content. Thus, instructor biases can create inequitable grading.

Student Motivation

Motivation is a complex aspect of human psychology that includes initial interest, engagement, and persistence in a task. Two main types of motivation can influence an individual’s involvement and completion of a task: intrinsic and extrinsic motivation. Intrinsic motivation includes any internal mechanisms used by an individual to engage in a task while extrinsic motivation requires a reward (or punishment) be associated with the task. Much research on motivation indicates that extrinsic motivation can be used for short-term compliance on rote tasks, but fails to promote the completion of creative, imaginative, original, or complex tasks including deep learning. Grades are extrinsic motivation and have been shown to reduce student motivation to learn and promote a fixed (over a growth) mindset. Using alternative grading approaches can redirect student motivation to learning by de-emphasizing grades and allowing students multiple opportunities to meet standards associated with course content.

Alternatives to Traditional Grading

Standards-based grading is an alternative to traditional grading that uses specific standards (usually student learning objectives) to assess both content and skills-based knowledge (Burton 2006, Buckmiller et al. 2017, Scarlett 2018, Zimmerman 2020). There are six main principles for standards-based grading (Townsley and Schmid 2020):

  • Learning objectives and outcomes are explicitly stated and accessible to students.
  • Instructors use many strategies to facilitate student learning.
  • Students are given flexibility in the methods used to show mastery of learning objectives.
  • Assessments are designed to test students on learning objectives.
  • Course grades are based on students’ ability to demonstrate understanding of learning objectives only (not on behaviors or other non-academic criteria).
  • Strategies are implemented to provide students with additional supports to ensure learning.

In standards-based grading, students are provided detailed information prior to learning material that explicitly links learning objectives to assessments and states the requirements for assignments and assessments (Sadler 2005, Bonner 2016). Student work is then used as evidence for meeting the learning objectives for each assessment (Burton 2006, Buckmiller et al. 2017) with different methods used to provide students with opportunities to resubmit work that does not initially meet the learning objective standards (Scarlett 2018, Zimmerman 2020). Depending on the framework used for standards-based grading, different methods are used to assess coursework and determine course grades. Standards-based grading frameworks include mastery grading, competency-based grading, contract grading, and specifications grading. It is also important to know that different authors and researchers use some of these terms somewhat interchangeably or with the overarching title of “ungrading” or Competency-based Education.

How to Implement Standards-based Grading

Regardless of the framework used, all standards-based grading systems have four common traits incorporated into them (Townsley and Schmid 2020).

Determine the Learning Objectives for the Course

All types of standards-based grading start by using backward design to determine learning objectives and align the objectives to the assessments and assignments (Sadler 2005, Bonner 2016, Townsley and Schmid 2020). When determining the learning objectives for a course, it is important to include learning objectives that cover topics that will be needed for future courses as well as for standardized or other formal exams for the discipline (Lalley and Gentile 2009). Additionally, looking over past course exams can help to determine the topics and skills students will need to acquire before the end of the course. Depending on the course, the learning objectives may be broad (with fewer than 10 learning objectives in the course) or very specific (in which case the course may have 20 or more learning objectives; Cilli-Turner et al. 2020). Usually, courses that focus on skills and procedural knowledge (solving math problems using specific steps, using a microscope, writing a complete sentence, etc.) have more learning objectives than courses that require higher-level thinking and application of knowledge (writing out a math proof based on an underlying theory, explaining how life evolved from single-celled organisms to multicellular life, writing a persuasive essay; Cilli-Turner et al. 2020). Finally, learning objectives need to be written to be assessable, readable, and understandable by students (for more information on writing learning objectives, read the “Designing your Course” pages in the Teaching@UNL resource) as all forms of standards-based grading provide the learning objectives to students.

Define Mastery for Assessments

Another common trait for all standards-based grading is the use of pre-determined methods for assessing mastery on individual assignments. Mastery can be based on a percentage correct on an assignment, a specific number of correctly answered problems for each learning objective, or a minimum number of “passed” aspects of a rubric for an assignment. Thus, it is important to determine how mastery will be achieved for each type of assessment as this will dictate the types and number of assessments students need to complete (and to what level of proficiency) in the course (Cilli-Turner et al. 2020). For example, an instructor may determine that 80% accuracy in total responses is the mastery level for a quiz that covers a single standard. Alternatively, a project may be graded on multiple standards with each standard requiring 70% mastery based on requirements in a rubric.
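The two examples above can be expressed as simple threshold checks. The following is a minimal sketch, assuming that a quiz is scored as correct answers out of a total and that a project rubric records pass/fail items per standard; the function names, the standard labels, and the sample results are hypothetical.

```python
def quiz_mastered(correct, total, threshold=0.80):
    """A single-standard quiz is mastered when accuracy meets the threshold."""
    return correct / total >= threshold

def project_mastery(rubric, threshold=0.70):
    """Return, per standard, whether the share of passed rubric items meets the threshold.

    `rubric` maps a standard name to a list of booleans (one per rubric item).
    """
    return {
        standard: sum(items) / len(items) >= threshold
        for standard, items in rubric.items()
    }

# Hypothetical results for one student.
print(quiz_mastered(correct=17, total=20))   # 85% accuracy on the quiz

rubric = {
    "LO1": [True, True, True, False],    # 3 of 4 rubric items passed
    "LO2": [True, False, False, True],   # 2 of 4 rubric items passed
}
print(project_mastery(rubric))
```

Keeping the thresholds as parameters reflects the point in the text that the instructor decides in advance what counts as mastery for each type of assessment.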

Decide how the Course Grade will be Calculated

Instructors also need to determine how the course grade will be calculated. For some alternative grading systems, this requires bundling of assignments while others use individual student contracts or other methods to determine course grades (Townsley and Schmid 2020). The links below provide more information on grading and other aspects of the different types of alternative grading systems:

  • Mastery Grading,
  • Competency-based Grading,
  • Contract Grading, and
  • Specifications Grading.

Incorporate Resubmission and Remediation

Finally, all standards-based grading systems incorporate methods for remediation and resubmission of assessments to allow students opportunities to meet mastery (Townsley and Schmid 2020). It is important to think through the logistics of resubmission of assignments and assessments. Can students resubmit any assignment or is a token system used (allowing each student a specific number of opportunities to resubmit assignments)? Are test retakes done during class or outside of class? Are resubmissions of the same type or are alternative formats given when students resubmit an assignment or assessment? Are assessment retakes and assignment resubmissions incorporated into the design of the course with multiple assessments on the same learning objective? For example, Zimmerman (2020) designed a high-enrollment introductory math course to include a quiz, unit exam, and final exam. Quizzes covered 1-3 learning objectives and unit exams usually covered 3-8 learning objectives while the final exam was cumulative and thus covered all learning objectives (Zimmerman 2020). If students scored higher on a subsequent assessment for a specific standard, that grade replaced the previous grade, but when the score was lower, then the grades were averaged (Zimmerman 2020). Thus, retakes for assessments were built into the course structure with three opportunities at different times in the semester for students to demonstrate their understanding for each learning objective (Zimmerman 2020).
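The replace-or-average rule described for Zimmerman's course can be sketched in a few lines. This is one possible reading of the rule, assuming that a lower later score is averaged with the score currently on record for that standard; the function names and the sample scores are hypothetical and are not taken from Zimmerman (2020).

```python
def update_standard_score(previous, new):
    """Combine the recorded score on a standard with a later attempt.

    A later score that is at least as high replaces the recorded one;
    a lower later score is averaged with it (an assumed interpretation).
    """
    if previous is None:          # first attempt on this standard
        return new
    if new >= previous:
        return new
    return (previous + new) / 2

def final_score(attempts):
    """Apply the rule across attempts in order (e.g. quiz, unit exam, final exam)."""
    score = None
    for attempt in attempts:
        score = update_standard_score(score, attempt)
    return score

print(final_score([2, 3, 4]))   # scores improve on each attempt
print(final_score([4, 2]))      # second attempt is lower than the first
```

Under this reading, a student who improves across the quiz, unit exam, and final exam ends with the latest score, while a student who slips on a later attempt is partly protected by the earlier result.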

For students to improve between retakes or resubmissions, remediation is necessary. The remediation can come in the form of instructor feedback, peer tutoring, additional assignments (that incorporate scaffolding of ideas to aid in student learning), and other supports specific to the discipline (writing centers, math centers, etc.). Instructor feedback is vital for student remediation but must be descriptive feedback (comments or tools to improve student work) instead of evaluative feedback (grades, judgments, etc.; Schinske and Tanner 2014). Descriptive feedback is most beneficial to student learning and success when students are required to read and use it in future assignments or for resubmissions (Schinske and Tanner 2014). Some important considerations for remediation are thinking about the time required to give feedback, how long students need to relearn and master a topic, and what will occur if a student continues to not meet mastery. Depending on the enrollment size, topics covered, and instructor preference, the types and methods used for remediation and resubmission can be diverse but must be included to qualify as standards-based grading.

When these common characteristics are well designed, standards-based grading frameworks increase fairness in grading by reducing biases and subjectivity (Burton 2006, Buckmiller et al. 2017), increase rigor by requiring students to learn the material in a deeper manner (Bonner 2016, Buckmiller et al. 2017, Scarlett 2018, Zimmerman 2020), and increase student ownership of their learning (Buckmiller et al. 2017, Scarlett 2018). Additionally, students at institutions that used standards-based grading had higher motivation to learn compared to students at institutions that only use traditional grading systems (Chamberlin et al. 2018) and were more likely to complete optional activities, practice self-assessment techniques, and seek help from the instructor (Zimmerman 2020). Overall, standards-based grading refocuses courses and instruction on learning over getting a grade.

Bonner, M. W. (2016). Grading rigor in counselor education: A specifications grading framework. Educational Research Quarterly 39.4: 21-42.

Bowen, R. S. and M. M. Cooper (2022). Grading on a curve as a systemic issue in equity in chemistry education. Journal of Chemical Education 99: 185-194.

Buckmiller, T., R. Peters, and J. Kruse (2017). Questioning points and percentages: standards-based grading (SBG) in higher education. College Teaching 65: 151-157.

Burton, K. (2006). Designing criterion-referenced assessments. Journal of Learning Design 1: 73-82.

Chamberlin, K., M. Yasue, and I. A. Chiang (2018). The impact of grades on student motivation. Active Learning in Higher Education DOI: 10.1177/1469787418819728

Cilli-Turner, E., J. Dunmyre, T. Mahoney, and C. Wiley (2020). Mastery grading: Build-a-syllabus workshop. PRIMUS 30: 952-978.

Feldman, J. (2019). Grading for Equity: What it is, why it matters, and how it can transform schools and classrooms. Corwin: A SAGE Company, Thousand Oaks, CA.

Lalley, J. P. and J. R. Gentile (2009). Classroom assessment and grading to assure mastery. Theory Into Practice 48: 28-35.

Link, L. J. and T. R. Guskey (2019). How traditional grading contributes to student inequities and how to fix it. Educational, School, and Counseling Psychology Faculty Publications https://uknowledge.uky.edu/edp_facpub/53

Sadler, D. R. (2005). Interpretations of criteria-based assessment and grading in higher education. Assessment & Evaluation in Higher Education 30: 175-194.

Scarlett, M. H. (2018). “Why did I get a C?”: Communicating student performance using standards-based grading. InSight: A Journal of Scholarly Teaching 13: 59-75.

Strong, B., M. Davis, and V. Hawks (2004). Self-grading in large general education classes: a case study. College Teaching 52: 52-57.

Schinske, J. and K. Tanner (2014). Teaching more by grading less (or differently). CBE – Life Sciences Education 13: 159-166.

Schwab, K., B. Moseley, and D. Dustin (2018). Grading grades as a measure of student learning. SCHOLE: A Journal of Leisure Studies and Recreation Education 33: 87-95.

Townsley, M. and D. Schmid (2020). Alternative grading practices: An entry point for faculty in competency-based education. Competency-based Education DOI: https://doi.org/10.1002/cbe2.1219.

Zimmerman, J. K. (2020). Implementing standards-based grading in large courses across multiple sections. PRIMUS 30: 1040-1053.

Alternative Grading for College Courses was written by Michelle Larson. Published to the website January 20, 2023.

Grading Principles and Guidelines

One of the primary goals of a proficiency-based grading system is to produce grades that more accurately reflect a student’s learning progress and achievement, including situations in which students struggled early on in a semester or school year, but then put in the effort and hard work needed to meet expected standards. If you ask nearly any adult, they will tell you that failures—and learning to overcome them—are often among the most important lessons in life.

When building a proficiency-based grading and reporting system, schools should begin by developing—ideally, in collaboration with faculty, staff, students, and families—a set of common principles and guidelines that apply to all courses and learning experiences. The guidelines should represent the school’s grading philosophy, including how grading will be used to support the educational process. In “Starting the Conversation about Grading” (Educational Leadership, November 2011), Susan M. Brookhart makes the following recommendation:

I cannot emphasize strongly enough that getting sidetracked with details of scaling (letters, percentages, or rubrics? Zeros or not? No Ds or Fs?) or policies (What should we do with late or missing work? How can we report behavior? What will we do about academic honors and awards?) before you tackle the question of what a grade means in the first place will lead to trouble. Logic, my own experience, and the research and practice of others (Cox & Olsen, 2009; Guskey & Bailey, 2010; McMunn, Schenck, & McColskey, 2003) all scream that this is the case. Grading scales and reporting policies can be discussed productively once you agree on the main purpose of grades. For example, if a school decides that academic grades should reflect achievement only, then teachers need to handle missed work in some other way than assigning an F or a zero. Once a school staff gets to this point, there are plenty of resources they can use to work out the details (see Brookhart, 2011; O’Connor, 2009). The important thing is to examine beliefs and assumptions about the meaning and purpose of grades first. Without a clear sense of what grading reform is trying to accomplish, not much will happen.

The following exemplar guidelines are offered as suggestions to schools as they implement a proficiency-based learning system:

1. The primary purpose of the grading system is to clearly, accurately, consistently, and fairly communicate learning progress and achievement to students, families, postsecondary institutions, and prospective employers.
2. The grading system ensures that students, families, teachers, counselors, advisors, and support specialists have the detailed information they need to make important decisions about a student’s education.
3. The grading system measures, reports, and documents student progress and proficiency against a set of clearly defined cross-curricular and content-area standards and learning objectives collaboratively developed by the administration, faculty, and staff.
4. The grading system measures, reports, and documents academic progress and achievement separately from work habits, character traits, and behaviors, so that educators, counselors, advisors, and support specialists can accurately determine the difference between learning needs and behavioral or work-habit needs.
5. The grading system ensures consistency and fairness in the assessment of learning, and in the assignment of scores and proficiency levels against the same learning standards, across students, teachers, assessments, learning experiences, content areas, and time.
6. The grading system is not used as a form of punishment, control, or compliance.

In proficiency-based learning systems, what matters most is where students end up, not where they started out or how they behaved along the way. Meeting and exceeding challenging standards defines success, and the best grading systems motivate students to work harder, overcome failures, and excel academically.

Additional Reading on Effective Grading Practices

Many educators, academics, and grading experts have dedicated their careers to untangling some of the thornier issues related to grading and determining how grades can facilitate, rather than impede, the learning process for students. We have included a selected list of books below for those who want to learn more about the grading practices that support student learning. Each work outlines practical strategies that educators can use to build an effective proficiency-based grading and reporting system that values and supports the learning process.

Susan M. Brookhart

  • Grading and Reporting: Practices that Support Student Achievement (2011)

Thomas Guskey

  • Answers to Essential Questions About Standards, Assessments, Grading, and Reporting (with Lee Ann Jung, 2012)
  • Developing Standards-Based Report Cards (with Jane M. Bailey, 2009)
  • Practical Solutions for Serious Problems in Standards-Based Report Cards (2008)
  • Developing Grading and Reporting Systems for Student Learning (with Jane M. Bailey, 2000)

Tammy Heflebower, Jan K. Hoegh, and Phil Warrick

  • A School Leader’s Guide to Standards-Based Grading (2014)

Robert Marzano

  • Formative Assessment and Standards-Based Grading: Classroom Strategies that Work (2009)
  • Classroom Assessment and Grading that Work (2006)
  • Transforming Classroom Grading (2000)

Ken O’Connor

  • The School Leader’s Guide to Grading: Essentials for Principals Series (2012)
  • A Repair Kit for Grading: Fifteen Fixes for Broken Grades (2010)
  • How to Grade for Learning (2009)

Douglas Reeves

  • Elements of Grading: A Guide to Effective Practices (2010)
  • Making Standards Work: How to Implement Standards-Based Assessments in the Classroom, School, and District (2004)

Rick Stiggins

  • Classroom Assessment for Student Learning: Doing It Right—Using It Well (with Jan Chappuis, Steve Chappuis, and Judith A. Arter, 2009)

Rick Wormeli

  • Fair Isn’t Always Equal: Assessing and Grading in the Differentiated Classroom (2006)

Undergraduate Education

How an alternative grading system is improving student learning

Interest grows in specifications grading, an approach to assessing students that could offer more transparency and consistency

By Celia Henry Arnaud, April 25, 2021 | A version of this story appeared in Volume 99, Issue 15.

A conceptual illustration of a student with a yellow backpack climbing boxes of check marks and x's.

What’s in a grade? Extensive use of partial credit in points-based grades can make it difficult to tell what a student has actually learned. A small but growing number of chemistry professors are adopting an alternative grading system called specifications—or specs—grading. This approach is based on clearly defined learning outcomes, pass-fail grading with no partial credit, and multiple opportunities for students to demonstrate mastery. Professors who have adopted the approach have seen improvements in student learning. And early indications suggest that specs grading could be more equitable than conventional grading approaches.

For many professors, grading student work is the least enjoyable part of their jobs. “None of us get into teaching to grade,” says Renée Link, a professor of teaching in the Chemistry Department at the University of California, Irvine. She’s uncomfortable with how grades become gatekeeping devices, causing graduate schools and potential employers to discount students’ scientific abilities. Plus, grades aren’t as accurate and precise a measure of student learning as many people assume. On top of that, in large courses like Link’s, grades are routinely normalized to adjust for differences among graders. “It’s just a terrible system,” she says.

To make the experience of grading less onerous for her and her teaching assistants, Link has joined the small but growing number of chemistry professors who have adopted an alternative grading system known as specifications—or specs—grading.

This system eschews the conventional points-based approach to grading. The key components of specs grading are clearly defined learning outcomes (measurable statements of what a student will know or be able to do after taking a course), pass-fail grading of assignments with no partial credit, and multiple opportunities for students to demonstrate mastery of learning outcomes.

But specs grading does more than make grading easier on professors. It makes grading more transparent for students. And it gives professors and students a clear picture of what students have learned and what they still need to work on. Because success is defined by detailed rubrics—the eponymous specifications—grading is more consistent and has the potential to be more equitable.

Nuts and bolts

Joshua Ring of Lenoir-Rhyne University and Santiago Toledo of St. Edward’s University were both early adopters of specs grading. Like many, they were inspired by a 2014 book by Linda B. Nilson, who is now director emerita of the Office of Teaching Effectiveness and Innovation at Clemson University. In her book, Specifications Grading: Restoring Rigor, Motivating Students, and Saving Faculty Time, she outlines the rationale for specs grading (making grading easier for professors, improving rigor in courses, and encouraging students to take responsibility for their learning) and the mechanics of how it works. Her approach is flexible enough that it can be adapted for any course that can be constructed around defined learning outcomes.

Before adopting specs grading in 2016, Toledo had put substantial effort into writing clearly delineated learning objectives for his general chemistry courses (J. Chem. Educ. 2015, DOI: 10.1021/acs.jchemed.5b00184). Once he defined the learning outcomes, redesigning his assessments was a natural next step.

Bar chart showing the grade distribution in an organic chemistry class before and after implementing specifications grading.

Within each outcome, students have to complete multiple tasks, including homework assignments and “checkpoints,” Toledo’s version of quizzes. “We set specifications to what it means to get proficiency on each of those tasks,” he says. “The big picture is that we’re trying to give students feedback according to learning outcomes.”

The number of outcomes varies among professors who use specs grading. Some have as few as a dozen. Others have 30 or more. In organic chemistry, for instance, a student might be expected to master skills like drawing Lewis structures, predicting reactivity, and drawing reaction mechanisms.

Rather than rely on only a few high-stakes exams to gauge student progress, specs grading as used in chemistry involves frequent quizzes, each addressing one of the established outcomes. Quizzes or other assessments are typically graded on a pass-fail basis. Students either get a question right or they don’t—there’s no partial credit. The most widely used threshold for fulfilling an outcome is answering 80% of the questions on a quiz correctly.

The number of learning outcomes mastered determines students’ grades. Some professors divide their outcomes into two buckets: “essential” and “general.” Students must master all the ones designated essential in order to pass the class. Each additional general skill in which they demonstrate proficiency raises their grades.
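The rule described above (pass every essential outcome, then let the count of general outcomes set the letter) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Outcome` class, the `course_grade` function, the outcome names, and the letter cutoffs are all hypothetical, and only the 80% threshold comes from the article.

```python
from dataclasses import dataclass

PASS_PERCENT = 80  # threshold the article reports as most widely used


@dataclass
class Outcome:
    name: str
    essential: bool
    best_correct: int   # most questions answered correctly on any attempt
    total: int          # questions on the quiz for this outcome

    def mastered(self) -> bool:
        # Integer comparison avoids floating-point rounding at exactly 80%.
        return self.best_correct * 100 >= PASS_PERCENT * self.total


def course_grade(outcomes, general_cutoffs):
    """Return a letter grade from mastered outcomes.

    general_cutoffs is a hypothetical list of (letter, minimum general
    outcomes mastered) pairs, ordered from highest grade to lowest.
    """
    # Missing any essential outcome means the student does not pass.
    if not all(o.mastered() for o in outcomes if o.essential):
        return "F"
    general_mastered = sum(
        1 for o in outcomes if not o.essential and o.mastered()
    )
    for letter, minimum in general_cutoffs:
        if general_mastered >= minimum:
            return letter
    return "F"


# Hypothetical cutoffs and outcome names, for illustration only.
cutoffs = [("A", 2), ("B", 1), ("C", 0)]
student = [
    Outcome("draw Lewis structures", True, 4, 5),
    Outcome("draw reaction mechanisms", True, 5, 5),
    Outcome("predict reactivity", False, 4, 5),
    Outcome("name alkanes", False, 3, 5),
]
print(course_grade(student, cutoffs))  # B
```

Because retakes only ever raise `best_correct`, a student's grade under this sketch can go up with more attempts but never down, which mirrors the multiple-opportunities idea.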

This system means students know exactly how many outcomes they have to achieve to get the grades they want. “If your goal is a C, here’s what you have to do,” says Mary Beth Anzovino, who uses specs grading in her organic chemistry classes at Georgia Gwinnett College. “You don’t have to pass all the generals, but you do still have to pass all the essentials, because by earning a passing grade in this course, I’m basically signing off on saying this student can do X, Y, Z, A, B, C. Some of the other skills are perhaps less crucial.”

Justin Houseknecht, who uses specs grading in his organic chemistry classes at Wittenberg University, says the distinction between essential and general learning objectives confused some of his colleagues at first. He has 12 essential outcomes and roughly 18 general ones. His colleagues asked, “If something’s not essential, why are you teaching it?” Houseknecht says. He explained to them that the essential concepts underpin the general ones.

Students get multiple chances to pass each outcome. In that sense, specs grading more closely resembles how learning and assessment happen in real life. Mai Yin Tsoi, an organic chemistry professor at Georgia Gwinnett College, says the multiple opportunities are like the tests required to get a driver’s license. Failing the first driving test doesn’t put a license out of reach. Instead, people can study, practice, and retake the test.

Most chemistry professors using specs grading allow students a free retake for each outcome. Rather than give students unlimited attempts, many users of specs grading use a token system for earning additional opportunities. For example, in Ring’s organic class at Lenoir-Rhyne, students earn tokens by watching the videos he uses instead of in-class lectures or by completing “bridge” assignments that lead from the videos to material and problems he plans to present in class.

The ability to earn extra retakes provides the students a safety net. But in the process of earning the retakes, the students are engaging with the material and doing the work required to learn it. “Earning the retakes means that you’re probably not going to need them in the first place,” Ring says.

While all those retakes might sound like a scheduling headache to busy professors, schools have gotten creative with sharing the workload. At Georgia Gwinnett, several professors in different departments have adopted specs grading. The faculty members have formed a cooperative through which they each sign up for a time for proctoring retakes. The retakes all happen in a dedicated room in the college’s learning center. Students can sign up for any available time, no matter which professor is proctoring.

“It took a lot of the time burden of actually administering the retakes off of the individual faculty,” Anzovino says. “I could have up to 24 students in that room doing retakes from 10 different classes during my proctoring times.”

Another challenge created by the opportunity for multiple retakes is the need to write multiple versions of quizzes for each learning outcome. To reduce the number of versions they have to write, she and her colleagues don’t return some of the graded quizzes, so they can reuse them in subsequent semesters, Anzovino says. “I never change the quizzes too much because the learning outcomes are so fine grained. It’s a pretty specific set of skills that we want to be able to demonstrate on each of those quizzes.” But creating those quizzes still takes time.

Chemists at liberal arts colleges and universities with smaller class sizes were early adopters of specs grading, but some professors at universities with large classes are experimenting with the approach.

Link coordinates UCI’s organic chemistry lab sequence. Each quarter, many teaching assistants (TAs) teach more than 1,000 students in the course. In her conventional points-based grading system, she would find that some TAs were lenient and others were harsh. As a result, she would need to adjust students’ grades at the end of each term.

“One of the challenges with grad student TAs, especially early on, is that they still identify more as a student than as a teacher. They’re just like, ‘Well, you tried hard; let me give you credit,’ ” Link says. “I understand wanting to give them credit for their effort, but you’re actually hurting them because you’re making them think they know something when they don’t.”

Examples of quizzes on naming alkanes.

After a conversation with a colleague who was using alternative grading strategies in upper-division courses, Link started thinking about how she could change the grading in her lab courses. One sleepless night, she scoured the web for alternative grading systems and finally found specifications grading. It sounded familiar, and she remembered having caught the tail end of a talk Ring gave at the Biennial Conference on Chemical Education in 2016. At the time, she had dismissed the approach as infeasible for her large classes. But this time, she wondered if she could make it work.

“I ended up sketching out a notebook over one night,” she says. She determined how to break down the assignments into specifications. Some of her initial ideas of ways for students to demonstrate mastery turned out to be impractical because there wasn’t a good way to track them for such a large class.

She decided to try specs grading during the 2019 summer session, when the class was much smaller, with only 37 students (J. Chem. Educ. 2020, DOI: 10.1021/acs.jchemed.0c00450). She worked with two experienced TAs, Kate McKnelly and William Howitz, to define the criteria that students needed to meet to achieve specific grades. The previous points-based rubrics had combined multiple items in a single rubric. They separated these into multiple rubrics that could be quickly graded on a yes-no basis.

The TAs found that the assignments were easier to grade. They also found that their discussions with students tended to focus more on student understanding than on complaints about grading.

The students had mixed reactions, Link says. They worried that some minor omission in their postlab assignments would lower their grades, but the average grades for the class were higher than in Link’s previous classes. “They are freaked out about this hypothetical situation that pretty much never happens,” Link says.

In fall 2019, McKnelly also worked with Stephen Mang, an assistant professor of teaching at UCI, to convert two of his classes to specs grading. Those classes—a writing class and an upper-level instrumental analysis lab—are significantly smaller than Link’s. “I’ve never been afraid to totally reorganize my class with 3 weeks to go before the quarter starts. I jumped in with both feet to specifications grading,” Mang says.

Mang uses a system of high pass, low pass, and unsatisfactory for individual assignments. In the writing class, he establishes criteria at the sentence, paragraph, and assignment levels, with 7 criteria for small assignments and 10 criteria for large assignments (J. Chem. Educ. 2021, DOI: 10.1021/acs.jchemed.0c00859). He grades individual criteria as met or not met, and the number of criteria that students meet determines their overall performance on an assignment. High pass is at least 6/7 or 8/10, low pass is 5/7 or 6/10, and unsatisfactory is anything lower.
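As a rough sketch of how such a rule works mechanically, the thresholds quoted above can be written as a small lookup. The function and table names here are hypothetical, and one detail is an assumption: the article does not say how 7/10 is rated, so this sketch treats the low-pass figures as minimums, which makes 7/10 a low pass.

```python
# (criteria on the assignment) -> (minimum for high pass, minimum for low pass)
# Figures are those quoted in the article; treating them as minimums is an assumption.
THRESHOLDS = {7: (6, 5), 10: (8, 6)}


def rate_assignment(criteria_met: int, criteria_total: int) -> str:
    """Map a count of met criteria to high pass, low pass, or unsatisfactory."""
    if criteria_total not in THRESHOLDS:
        raise ValueError("no thresholds defined for this assignment size")
    if not 0 <= criteria_met <= criteria_total:
        raise ValueError("criteria_met out of range")
    high_min, low_min = THRESHOLDS[criteria_total]
    if criteria_met >= high_min:
        return "high pass"
    if criteria_met >= low_min:
        return "low pass"
    return "unsatisfactory"


print(rate_assignment(6, 7))   # high pass
print(rate_assignment(7, 10))  # low pass (under the assumption above)
print(rate_assignment(5, 10))  # unsatisfactory
```

Each criterion stays a yes-or-no judgment; the only gradation comes from counting how many were met.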

“I think partial credit makes a lot of sense at the assignment level, but I don’t think it makes sense at the criteria level,” Mang says. “If we’re asking students to demonstrate mastery of a skill, that’s very much a yes or no question. If we’re saying, ‘Did you demonstrate mastery of enough skills to pass this assignment or pass this class?,’ that’s more of a fuzzy question.”

When McKnelly moved to Emory University as a lecturer, she brought specs grading with her. She’s using it in a macromolecules lab, which is the fourth course of the introductory sequence of Emory’s new Chemistry Unbound curriculum.

Even at institutions not yet using specs grading, there are signs of interest. Colleen Craig, an associate teaching professor at the University of Washington, is part of the team that teaches general chemistry. “The time is right for us to think about some of these things around grading and how we’re teaching our courses,” she says. Craig is gathering information about the approach to bring to the Chemistry Department’s undergraduate curriculum committee. She has invited Toledo to give a seminar or workshop on specs grading.

Selling points

Students have a variety of reactions to the approach, professors say.

Some students find specs grading confusing at first. They are used to taking a few high-stakes exams throughout a course in addition to more frequent low-stakes quizzes. “They don’t understand why there aren’t any exams. ‘When are we going to take an exam over this?’ is a very common question,” Wittenberg University’s Houseknecht says. The different effects of essential and general learning objectives on grades are particularly hard for students to grasp. “But once they figure it out, most of them seem to appreciate it.” Houseknecht has seen the percentage of As and Bs in his courses increase significantly since he switched from traditional to specs grading, but he’s also seen an increase in the percentage of Ds, Fs, and withdrawals. “There are students who are failing who used to be able to kind of squeeze by and pass without learning much.”

“Making sure you establish student buy-in is crucial,” McKnelly says. Midway through the semester, she did an in-class activity in which the students looked at their grade trackers to see what they still needed to do to reach their desired final grades. “I’ve noticed that every week they’ve gotten more and more comfortable” with specs grading, she says.

“I get really positive feedback from students, particularly about how they feel more in control of their learning and their grade than they did before,” Ring says.

Students also don’t need to cram before exams as often. Instead, they have to maintain a consistent level of effort to keep on track.

“Those two nights of cramming before exam number 2 were gone, and I think the students felt relief with that,” Georgia Gwinnett’s Tsoi says. Still, the system isn’t for everyone. Tsoi remembers a student who withdrew from her class—he preferred cramming for exams because between exams, he had a couple of weeks to focus on other classes before turning back to chemistry.

And the frequent quizzes can make students think that the approach is “relentless,” especially this year, Houseknecht says. Every week they have one or two quizzes, and for the 2020–21 academic year, Wittenberg University eliminated its fall and spring breaks in response to the pandemic. “We’re all tired by the end of the semester, and I think specifications grading makes that a little harder,” Houseknecht says. “It’s not like the week after the exam you can just take a break.”

Mang’s students have fallen into distinct camps: those who like the approach and say that it’s clear what they have to do to achieve their desired grades, and those who say the system doesn’t adequately reward their effort.

Eliminating partial credit can improve interactions with students. Tsoi says it particularly changes end-of-semester conversations. “It’s never ‘Can you fix my grade’ or ‘Can you just bump me two points?’ ” she says. “It’s ‘Can I have another retake?,’ which I find fascinating. The conversation is completely different.”

Users of specs grading suspect that the approach is more equitable than conventional points-based systems. In the traditional system, some students may have difficulty interpreting what professors are looking for and may hesitate to approach faculty. Other students who are comfortable with such conversations may end up persuading professors to give them extra credit or change their grades. The transparency of specs grading “makes all that information available to all students, not just a few,” Toledo says. “And I think that’s really powerful.”

But education researchers would like to see data. Paulette Vincent-Ruz, a chemistry education researcher at the University of Michigan who focuses on issues of equity and justice, thinks specifications grading has the potential to be more equitable but stresses that equity doesn’t happen spontaneously. “You need to be really intentional about certain aspects of the design to make sure things are equitable,” she says.

One of the issues is that conventional points-based grading systems, often curved, are so prone to bias that specs grading looks good in comparison, Vincent-Ruz says. Now that more chemists are using specs grading, she says, it’s time to see whether professors can pick and choose parts of the approach they want to use and still get the benefits. She also notes that what works best may depend on the institution and its student population.

Link is collaborating with Vincent-Ruz to see if the equity claims are valid. Grades have gone up in Link’s classes since she switched to specs grading—with the most improvement in the off-sequence classes, which typically have students who have struggled in chemistry. “They are now on par with the on-sequence grades,” Link says.

Link consistently sees improved grades with specs grading. Other people might worry about grade inflation, but that’s not what bothers her. “What keeps me up at night is, ‘Was I harming the previous students?’ ”

The Problem with Grading

  • Posted May 19, 2023
  • By Lory Hough
  • Education Reform
  • Student Achievement and Outcomes

Illustration of a magician's hat by Nate Williams

My son’s binder was a mess. Loose papers were falling out, others looked like they had been balled up or stepped on, some more than once. The binder itself was bent in one corner. But he was a seventh-grader and to him, it looked just fine.

Unfortunately, his seventh-grade math teacher didn’t agree and deducted points from his grade for being messy. This same teacher also took off points when homework was completed with something other than a pencil or if a student needed a second copy of an assignment. If a student was asked to move their seat during class, she slashed five points. Points were earned back if a parent signed the list of rules, and it was returned in a timely manner.

Being organized and not misbehaving in class are skills students need to figure out, for sure, and I certainly wanted my son to be neater, but factoring these behaviors into grades — especially for middle-schoolers just learning to come into their own — didn’t make sense to me.

And so, when I learned, a few years later, that my son’s high school was rethinking their grading practice, I decided it was time to dig deeper into what Grading for Equity author Joe Feldman, Ed.M.’93, calls “one of the most challenging and emotionally charged conversations in today’s schools.”

Grades Are What?

I started by asking a question that seems simple on the surface: What is a grade?

Feldman, a former teacher and principal, says that on a really basic level, grades are the way teachers calculate and report student performances. Typically, it’s an accumulation of points (0 to 100) with corresponding letters (A through F, minus E). Earn an 89 on a test and your grade is a B+, for example. Believed to date back to 1785, when Yale President Ezra Stiles gave four grades to his seniors (optimi, second optimi, inferiors, and pejores), grades have long been a part of our education system in the United States. In fact, Feldman says, grades have become “the main criteria in nearly every decision that schools make about students,” from whether they get promoted to the next class or held back, to which course level a student should be taking, such as college prep, honors, or AP. It’s how many high schools tally GPA and student rank, and one of the main ways that colleges decide who they’ll even consider for admissions.

“Grading is evaluation, putting a value on something,” says Denise Pope, Ed.M.’89, a senior lecturer at Stanford who runs a project called Challenge Success. Pope stresses, however, that grades are not the same as assessment, and to really talk about grading, we have to make the distinction between the two terms.

“Assessment is feedback so that students can learn,” Pope says. “It’s helping them see where they are and helping them move toward a point of greater understanding or mastery. Grading doesn’t always do that, but assessment should.”

When she hosts professional development workshops to help schools rethink their assessment practices, she likes to point out that the Latin root of assessment is assidere, which means to sit beside. Assessment is seeing where a student is with their understanding (what they don’t know, what they do know) and then using that to determine what they need. “Sometimes a grade does that,” Pope says, “but a lot of times students have no idea what that grade means.”

And that’s what seems to be at the heart of the debate about grading, and what rubbed me the wrong way when my son was in that math class: Students, teachers, parents, and college admissions officers have no idea what a letter grade — this thing we are saying is really important in a student’s school life — is really saying. Does an A mean a student has truly mastered that history lesson? Does the C+ mean the student was “sort of” getting the math they were learning, or did it mean they were an ace at math, but just couldn’t keep a neat binder?

What’s the Problem?

The confusion starts with consistency, as in, there is none. At most schools, there’s no consistency about what’s included in a grade or what’s left out, even among teachers teaching the same subject in the same school to students in the same grade at the same level. This creates what is often called “grade fog” — we’re not sure what the grade means because we’re asking that A or that C+ to communicate too much disparate information.

“It’s radically inconsistent from teacher to teacher,” says A.J. Stitch, Ed.M.’12, the founding principal of the Greater Dayton School, a private school in Ohio for kids from low-income backgrounds that doesn’t use traditional grades. At public schools where he has worked in the past, he says “most teachers had different approaches to weighting homework, classwork, quizzes, and tests.”

For example, he says, “a student may demonstrate mastery of content on a test, quiz, and classwork, yet still fails a course because the teacher decides to weigh homework 40%, and the student, for one reason or another, struggles in that regard. Obviously, that’s inequitable, and it illustrates the variation of weighted grade scales and how it impacts a student’s success or failure, regardless of whether they mastered the standards taught in the course. Sadly, I made this mistake myself as a young teacher, and as a principal I’ve seen too many teachers make this mistake, too.”
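To see how much the choice of weights alone can matter, here is a small illustrative calculation. The scores and both weighting schemes are invented for the example; only the 40% homework weight echoes the scenario described above.

```python
def weighted_grade(scores, weights):
    """Weighted average of category scores; weights need not sum to 100."""
    total_weight = sum(weights.values())
    return sum(scores[c] * weights[c] for c in weights) / total_weight


# Hypothetical student: strong on tests, quizzes, and classwork, weak on homework.
scores = {"tests": 90, "quizzes": 88, "classwork": 92, "homework": 10}

# Two hypothetical teachers grading the same work.
homework_heavy = {"tests": 30, "quizzes": 15, "classwork": 15, "homework": 40}
homework_light = {"tests": 50, "quizzes": 20, "classwork": 20, "homework": 10}

print(weighted_grade(scores, homework_heavy))  # 58.0
print(weighted_grade(scores, homework_light))  # 82.0
```

The same work lands at a failing 58 under one scheme and a B-range 82 under the other, even though nothing about the student's performance differs between the two.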

Jason Merrill, the principal of Melrose High School, where my son currently goes to school, says this is one of the biggest reasons they started looking at their teaching and learning practices, and why they applied to become one of five schools in the multi-year Rethinking Grading Pilot program sponsored by the Massachusetts Department of Elementary and Secondary Education.

Man with watering cans illustration by Nate Williams

“Your son has eight teachers right now that all have their own way to grade. Completely their own,” he says. “The average kid often gives up trying to figure it out. Some teachers count homework, some teachers don’t. Some teachers grade homework, some teachers grade it as completion. Some teachers count large tests for a lot more than others. What we want to do is not have 85 different ways to respond to a fire alarm.”

Feldman says we also don’t want to include non-academics in grades — things like messy binders and not coming to class with a pencil, or the one that is commonly factored in: late work.

“A student who writes an A-quality essay but hands it in late gets her writing downgraded to a B, and the student who writes a B-quality essay turned in by the deadline receives a B. There’s nothing to distinguish those two B grades, although those students have very different levels of content mastery,” he says.

Traditional grading also invites biases, he says, especially around behavior. “When we include a student’s behavior in a grade, we’re imposing on all of our students a narrow idea of what a ‘successful’ student is,” Feldman says, and “you start to misrepresent and warp the accuracy.” For example, a student who participates in discussions and always brings their pencil to class earns five points, but they get a C on the test. Adding the five behavior points lifts that C test grade to something in the low B range. Although students and parents are happy the grade is a B and that the student’s all-important GPA remains intact, this warping can create longer term problems.

“You’re telling the student that they’re at a B level in content, and they’re actually at a C,” Feldman says. “They don’t think there’s a problem, the counselors don’t think there’s a problem, and the student goes to the next grade level and gets crushed by the content. They had no idea that they weren’t prepared for the rigor of that class because they kept getting the message that they were getting B’s.”

It can be especially confusing for parents, says Christopher Beaver, one of the assistant principals at Melrose High. “I knew what my own kids could do skill-based wise, but if I’m a parent and I don’t know what my kids can do because the teachers haven’t laid that out for me on a report card, then I can’t look at a report card and say, ‘See that. My kid is proficient at this skill or my kid is proficient at that skill,’” he says. “I’m going to focus on something like the GPA because that’s all I have. And I’m going to assume, if my kid has a high GPA, that my kid’s skillset is at a proficient level. But that is not always the case.”

As a parent, I was confused earlier this year when my son’s overall grade in a class was low, even though he seemed to get the content. We looked online at the grading portal the district uses and sure enough, he had Bs and As. But then there was that one grade: a 44 on a test he didn’t have enough time to finish. That one low test score brought the whole grade down because of another impossible part of how we grade: averaging.

“We have this ridiculous system of averaging things out,” Pope says, “which doesn’t make any sense because the goal is to get students to learn material. Same with the case against zero, right? Why would you give a kid a zero? A zero is worse than an F.”

The “case against zero” idea is that when using a 0-to-100-point scale in grading, a student should never receive a zero, even if they didn’t turn in an assignment. Sounds odd, given that a zero for not turning in work is how we’ve long operated, but as author Doug Reeves wrote in 2004 in “The Case Against the Zero” in Phi Delta Kappan, “assigning a zero is disproportionate punishment.”

Why? Because mathematically, far more of a 0-to-100 scale corresponds to failing than to passing. Think about it. Each passing letter grade spans about 10 points (an A is 90-100, a B is 80-89, a C is 70-79, and a D is 60-69), but the scale’s one failing grade, an F, spans not 10 points, but 60 (0 to 59). The result is that a zero disproportionally pulls down an average and makes it that much harder to pull a grade up significantly. A student with two 85s, for example, is averaging a B. If that student gets a 0 on one assignment, their average drops to about 57, an F. Even if the student gets 85s on the next two assignments, their average still only climbs to a 68. So, four Bs and one zero means the student’s averaged overall grade is a D+.
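The arithmetic in that example is easy to check. The sketch below uses the 10-point letter bands described above; the `letter` helper is a hypothetical simplification that ignores plus and minus grades.

```python
from statistics import mean


def letter(score):
    """Letter grade on 10-point bands; plus/minus modifiers are omitted."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


before = [85, 85]
with_zero = before + [0]
recovered = with_zero + [85, 85]

for label, scores in [("before", before),
                      ("with zero", with_zero),
                      ("recovered", recovered)]:
    avg = mean(scores)
    print(f"{label}: average {avg:.1f} -> {letter(avg)}")
# before: average 85.0 -> B
# with zero: average 56.7 -> F
# recovered: average 68.0 -> D
```

One zero costs the student 28 points of average, and two further 85s recover only about 11 of them.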

This averaging especially penalizes students who start out a semester slower with lower grades. Even if they figure out the material and fully master content later, averaging won’t necessarily reflect what they truly know. In his book, Feldman gives an example of a student who, coming into ninth grade, had never learned to write a persuasive essay. The ninth-grade teacher gives an assignment early in September, revealing this student’s writing inexperience.

“The essay gets a D-. But it’s early in September, and you, as the teacher, provide instruction and guided practice with feedback,” Feldman writes. The student’s writing improves, and their grade goes up with each new assignment. The student eventually learns how to write an amazing persuasive essay. They are doing A work. However, when the grades are averaged, that early D- drags down the overall grade and though the student mastered persuasive writing, their A drops to a B-.

Add Stress to the Mix

Beyond the problems with how we grade or what a grade means, Robin Loewald, Ed.M.’19, an English teacher at Melrose High, also worries about the effect grades have on student mindset, especially for middle- and high-schoolers.

“Grading in general is tough because of the expectations for students with college applications,” she says. “There tends to be a lot of stress around grades and the minute difference between a 93 and 94. In truth, it’s hard to really delineate the difference between those two numbers in terms of student understanding and mastery of the subject.”

Pope focuses her work extensively on the stress students take on trying to chase “good” grades and the extrinsic motivation — driven by external rewards — that takes over. In an op-ed she co-authored in February for The Hechinger Report about the furor over ChatGPT, she wrote that instead of asking how to stop students from cheating using bot programs, we should instead be asking “why” students are cheating in the first place. Chasing those good grades is part of that “why.”

“We have this real system of you need to get the grades and the test scores in order to please your parents, go to college, get the merit scholarship, get a good job — whatever it is,” she says. “There’s this extrinsic motivation that’s tied to grades, which adds to student stress, and in some cases can lead to really unhealthy practices like perfectionism or great anxiety, paralysis. And it could also really turn kids off. ‘Well, I got a C so I’m bad at math. I’m not a math person so clearly, I shouldn’t try anymore.’”

As Feldman said during an interview in 2019 with the Harvard EdCast, for students, even attempting to follow the range of grading practices each of their six or seven teachers follows can be stressful.

“For the student, it adds to my cognitive load,” he says. “I not only have to understand the content and try and perform at high levels of the content, but now I also have to navigate a grading structure that may not be totally transparent, and may be different for every teacher, and particularly for students who are historically underserved and have less education background and fewer resources and understanding of how to navigate those really foreign systems. It places those additional burdens on them, which we shouldn’t do.”

Are There Alternatives?

If traditional grades say little about a student’s mastery of the material, are often inequitable, and can add more stress, what are better ways for teachers and schools to capture a student’s skills and understanding of the material? And given the long history of using numbers and letter grades, are schools even ready to change?

Back in 2005, Chester Finn Jr., M.A.T.’67, Ed.D.’70, then president of the Washington-based Thomas B. Fordham Foundation, told The Washington Post that “high schools will keep using them if college admissions offices keep requiring them, which they likely will.”

But nearly two decades since Finn made that observation, it’s clear that some schools, like my son’s, are ready for change and have ideas on how to do that.

At the Greater Dayton School, Stitch says their ability to work outside the structure and limitations of a public school gave them the liberty to design whatever grading scale they thought was best for kids. They chose not to use the A to F scale.

“The traditional grading system is not aligned to learning outcomes,” he says. “Traditional grading is one-and-done in terms of you’ve learned the content, or you haven’t, and the grade you get is the grade you get. A better grading system allows for multiple attempts of content mastery.”

Which is why his school uses only two grades, “mastered” and “in progress,” and students have unlimited chances to learn the material and become proficient, he says. Students also learn at their own pace, and the school’s standards are broken into kid-friendly “I can” statements so parents and students know exactly what skills a student “can” do and which skills they are working on.

A few years ago, Melrose High started allowing students to redo their work if the grade was below a certain number. The idea was that learning shouldn’t be punitive — it was about mastering content, even if that took more than one try.

As Merrill says, “At the end of the day, we want all kids to learn. We don’t want to prove that they don’t know something. We want to be like, you need to do some work to retake this again to show us that you do know it.”

Loewald says the school’s English department additionally has an extended revision policy around writing assignments, where students can meet with their teachers to edit, revise, and resubmit their writing work. She allows students to revise almost every assignment.

“I think that the process of learning through revision is really helpful and allows there to be less pressure on the initial submission of work,” she says. “Students are graded on rubrics and can use those rubrics to guide their revisions of assignments. The only assignments that I do not allow students to revise are their reading checks since those are things we talk about and reference in the class in which they’re due.”

Merrill says the school’s revision policy is a work in progress — it needs its own revision — because there is currently too much variation in what students can redo. “We are working to build a single, consistent retake policy. If we de-emphasize the weighting for formative assessment and practice materials, such as homework and classwork, then we can have a retake policy that addresses summative assessments only,” he says.

Caitlin Reilly, Ed.M.’14, recently started as a deputy principal at Revere High School, located just north of Boston and part of the state’s Rethinking Grading Pilot. She says the school is moving toward a full competency-based model. Although there’s variation on how competency-based is defined, it generally means that instead of evaluating students as proficient based on the amount of time they spend on a subject — 58 minutes for factoring polynomials or three years taking a foreign language — time allotment is shifted to how well students can define what they actually know about a subject. And those competencies aren’t vague — they’re clearly spelled out by a school.

“For us, competency learning is a matter of equity for students because it makes apparent to all students, what are you working toward?” says Reilly. “Where do you not yet have the skills? What support do you need? And students should be seeing their progress to the standards of the course. Knowing that is incredibly important for all students, versus the hidden game of school when you have this letter grade, and you don’t know where it’s generated from, or you have a test that you got 10 points just for writing your name.”

One of the areas Revere High is working on with the grant, she says, is rethinking report cards. Their current approach mimics, in some ways, what elementary schools typically do, which is to include comments about student strengths or areas that need improving in their habits-of-work, not just the letter grade. They are working on transitioning course grades from a single letter to a report of proficiency on course competencies.

“Our current report card is a one-pager that has letter grades … but for every class students have, there’s a habits-of-work box that includes the four habits-of-work that we assess: active learning, respect, collaboration, and ownership,” she says. “For each habit, there’s a scale of proficient, some proficiency, or not yet proficient, with rubric-defined criteria that guides the understanding of what it is to be proficient in each category.” In that way, it’s not just a teacher’s general “sense” of which category to pick or a parent’s guess as to what each habit actually means.

As I talked to educators about other ways to rethink how we grade, some suggested dropping the lowest grade in a class or not grading assignments done early in a semester. Many mention not grading homework but instead allowing that work to be a place where students can figure things out and make mistakes, especially when new concepts are introduced. Others talk about doing away with the 0-to-100 scale. In Melrose, Loewald says the English Department has already shifted to a 1-to-4 scale.

“A four meaning the student is exceeding expectations, three is meeting, two is approaching, and one is developing,” she says. “It’s much more accurate in terms of assessing student learning to use a smaller scale.”

Feldman says that with any change around such an entrenched topic like grading, “We are learning that you actually have to invest in teacher understanding along with policy development in order to change practice around grading.”

It’s something my son’s school has already jumped on with a core group of administrators and teachers examining current practices and testing out some of the changes they want to make.

“They’ve all set goals for themselves and are participating in regular coaching,” says Melanie Acevedo, the district’s director of instructional technology and personalized learning. “They come to a meeting once a month and talk about what’s working, what’s not working. They are a group that’s trying things out. They’re being the people that are boots on the ground, really experimenting so that we can come back to the bigger faculty and say, here are some things that people have tried. Do you want to try that? We’re building this idea from the staff and from the teachers because they’re the ones that know best.”

One of the things Melrose High isn’t doing, at least not yet, is blowing up the entire grading system or even doing away with traditional A to F grades.

Instead, says Merrill, they’ve set a goal so that by next fall they have “a very clear, consistent, transparent grading practice and policy in place for all teachers” and can answer questions like: How do we assess kids? How do we communicate that? How do kids know where they stand? How do they reflect and retake or do revisions? How do we count homework? Is that grading equitable? “There are so many pieces that go into it,” he says, “but we’re not looking to make any of our kids a trial.”

Luckily, there’s broader interest in “rethinking grading,” as the Massachusetts pilot is called. Sales for Feldman’s Grading for Equity book are robust enough that he’s working on a second, updated edition, and, he says, “I am not any less confident that this is one of the most important levers that schools and districts can use to not only improve student achievement, but also reduce achievement and opportunity disparities.”

Rethinking grading may even keep some teachers in the profession longer.

“We’ve heard, and we have some data, that this work actually increases the likelihood that some teachers would stay in their district,” Feldman says. “We see a real crisis in the retention of the teaching force. Knowing that there’s a learning opportunity that can engage them more directly with why they went into teaching in the first place, and gets them more excited about teaching, I think is really important.” Teachers, he says, don’t want to be the bean counters or police officers they often become when it comes to grading.

“The five participation points every day. The, you turned it in late one day, so you lose 10% or you turned it in two days late so 20%,” he says. “None of us went into teaching to do that.”

Man with watering cans illustration by Nate Williams

Ed. Magazine

The magazine of the Harvard Graduate School of Education

Grading Systems

School, higher education.

SCHOOL Thomas R. Guskey

HIGHER EDUCATION Howard R. Pollio

Few issues have created more controversy among educators than those associated with grading and reporting student learning. Despite the many debates and multitudes of studies, however, prescriptions for best practice remain elusive. Although teachers generally try to develop grading policies that are honest and fair, strong evidence shows that their practices vary widely, even among those who teach at the same grade level within the same school.

In essence, grading is an exercise in professional judgment on the part of teachers. It involves the collection and evaluation of evidence on students' achievement or performance over a specified period of time, such as nine weeks, an academic semester, or entire school year. Through this process, various types of descriptive information and measures of students' performance are converted into grades or marks that summarize students' accomplishments. Although some educators distinguish between grades and marks, most consider these terms synonymous. Both imply a set of symbols, words, or numbers that are used to designate different levels of achievement or performance. They might be letter grades such as A, B, C, D, and F; symbols such as ✓+, ✓, and ✓−; descriptive words such as Exemplary, Satisfactory, and Needs Improvement; or numerals such as 4, 3, 2, and 1. Reporting is the process by which these judgments are communicated to parents, students, or others.

A Brief History

Grading and reporting are relatively recent phenomena in education. In fact, prior to 1850, grading and reporting were virtually unknown in schools in the United States. Throughout much of the nineteenth century most schools grouped students of all ages and backgrounds together with one teacher in one-room schoolhouses, and few students went beyond elementary studies. The teacher reported students' learning progress orally to parents, usually during visits to students' homes.

As the number of students increased in the late 1800s, schools began to group students in grade levels according to their age, and new ideas about curriculum and teaching methods were tried. One of these new ideas was the use of formal progress evaluations of students' work, in which teachers wrote down the skills each student had mastered and those on which additional work was needed. This was done primarily for the students' benefit, since they were not permitted to move on to the next level until they demonstrated their mastery of the current one. It was also the earliest example of a narrative report card.

With the passage of compulsory attendance laws at the elementary level during the late nineteenth and early twentieth centuries, the number of students entering high schools increased rapidly. Between 1870 and 1910 the number of public high schools in the United States increased from 500 to 10,000. As a result, subject area instruction in high schools became increasingly specific and student populations became more diverse. While elementary teachers continued to use written descriptions and narrative reports to document student learning, high school teachers began using percentages and other similar markings to certify students' accomplishments in different subject areas. This was the beginning of the grading and reporting systems that exist today.

The shift to percentage grading was gradual, and few American educators questioned it. The practice seemed a natural by-product of the increased demands on high school teachers, who now faced classrooms with growing numbers of students. But in 1912 a study by two Wisconsin researchers seriously challenged the reliability of percentage grades as accurate indicators of students' achievement.

In their study, Daniel Starch and Edward Charles Elliott showed that high school English teachers in different schools assigned widely varied percentage grades to two identical papers from students. For the first paper the scores ranged from 64 to 98, and for the second from 50 to 97. Some teachers focused on elements of grammar and style, neatness, spelling, and punctuation, while others considered only how well the message of the paper was communicated. The following year Starch and Elliott repeated their study using geometry papers submitted to math teachers and found even greater variation in math grades. Scores on one of the math papers ranged from 28 to 95, a 67-point difference. While some teachers deducted points only for a wrong answer, many others took neatness, form, and spelling into consideration.

These demonstrations of wide variation in grading practices led to a gradual move away from percentage scores to scales that had fewer and larger categories. One was a three-point scale that employed the categories of Excellent, Average, and Poor. Another was the familiar five-point scale of Excellent, Good, Average, Poor, and Failing (or A, B, C, D, and F). This reduction in the number of score categories served to reduce the variation in grades, but it did not solve the problem of teacher subjectivity.

To ensure a fairer distribution of grades among teachers and to bring into check the subjective nature of scoring, the idea of grading based on the normal probability, bell-shaped curve became increasingly popular. By this method, students were simply rank-ordered according to some measure of their performance or proficiency. A top percentage was then assigned a grade of A, the next percentage a grade of B, and so on. Some advocates of this method even specified the precise percentages of students that should be assigned each grade, such as the 6-22-44-22-6 system.
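The rank-and-slice procedure described above can be sketched as follows. This is a minimal illustration, not a reconstruction of any historical implementation: the function name `curve_grades`, the rounding of cutoffs, and the sample class of 50 students are all assumptions made for the example.

```python
from collections import Counter

def curve_grades(scores, shares=(6, 22, 44, 22, 6), letters="ABCDF"):
    """Assign letter grades by rank alone, using fixed percentage shares."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # best score first
    n = len(ranked)
    grades = {}
    cumulative = 0
    start = 0
    for share, letter in zip(shares, letters):
        cumulative += share
        end = round(n * cumulative / 100)  # rank cutoff for this letter (assumed rounding)
        for student in ranked[start:end]:
            grades[student] = letter
        start = end
    return grades

# Hypothetical class of 50 students with scores 51 through 100.
scores = {f"student_{i:02d}": 50 + i for i in range(1, 51)}
grades = curve_grades(scores)
distribution = Counter(grades.values())
print(dict(distribution))
```

The function never consults what a score means. Under this sketch, a class in which every student scored above 90 would receive the same spread of letters, because only rank order is used.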

Grading on the curve was considered appropriate at that time because it was well known that the distribution of students' intelligence test scores approximated a normal probability curve. Since innate intelligence and school achievement were thought to be directly related, such a procedure seemed both fair and equitable. Grading on the curve also relieved teachers of the difficult task of having to identify specific learning criteria. Fortunately, most educators of the early twenty-first century have a better understanding of the flawed premises behind this practice and of its many negative consequences.

In the years that followed, the debate over grading and reporting intensified. A number of schools abolished formal grades altogether, believing they were a distraction in teaching and learning. Some schools returned to using only verbal descriptions and narrative reports of student achievement. Others advocated pass/fail systems that distinguished only between acceptable and failing work. Still others advocated a mastery approach, in which the only important factor was whether or not the student had mastered the content or skill being taught. Once mastered, that student would move on to other areas of study.

At the beginning of the twenty-first century, lack of consensus about what works best has led to wide variation in teachers' grading and reporting practices, especially among those at the elementary level. Many elementary teachers continue to use traditional letter grades and record a single grade on the reporting form for each subject area studied. Others use numbers or descriptive categories as proxies for letter grades. They might, for example, record a 1, 2, 3, or 4, or they might describe students' achievement as Beginning, Developing, Proficient, or Distinguished. Some elementary schools have developed standards-based reporting forms that record students' learning progress on specific skills or learning goals. Most of these forms also include sections for teachers to evaluate students' work habits or behaviors, and many provide space for narrative comments.

Grading practices are generally more consistent and much more traditional at the secondary level, where letter grades still dominate reporting systems. Some schools attempt to enhance the discriminatory function of letter grades by adding plusses or minuses, or by pairing letter grades with percentage indicators. Because most secondary reporting forms allow only a single grade to be assigned for each course or subject area, however, most teachers combine a variety of diverse factors into that single symbol. In some secondary schools, teachers have begun to assign multiple grades for each course in order to separate achievement grades from marks related to learning skills, work habits, or effort, but such practices are not widespread.

Research Findings

Over the years, grading and reporting have remained favorite topics for researchers. A review of the Educational Resources Information Center (ERIC) system, for example, yields a reference list of more than 4,000 citations. Most of these references are essays about problems in grading and what should be done about them. The research studies consist mainly of teacher surveys. Although this literature is inconsistent both in the quality of studies and in results, several points of agreement exist. These points include the following:

Grading and reporting are not essential to the instructional process. Teachers do not need grades or reporting forms to teach well, and students can and do learn many things well without them. It must be recognized, therefore, that the primary purpose of grading and reporting is other than facilitation of teaching or learning.

At the same time, significant evidence shows that regularly checking on students' learning progress is an essential aspect of successful teaching, but checking is different from grading. Checking implies finding out how students are doing, what they have learned well, what problems or difficulties they might be experiencing, and what corrective measures may be necessary. The process is primarily a diagnostic and prescriptive interaction between teachers and students. Grading and reporting, however, typically involve judgment of the adequacy of students' performance at a particular point in time. As such, it is primarily evaluative and descriptive.

When teachers do both checking and grading, they must serve dual roles as both advocate and judge for students, roles that are not necessarily compatible. Ironically, this incompatibility is usually recognized when administrators are called on to evaluate teachers, but it is generally ignored when teachers are required to evaluate students. Finding a meaningful compromise between these dual roles is discomforting to many teachers, especially those with a child-centered orientation.

Grading and reporting serve a variety of purposes, but no one method serves all purposes well. Various grading and reporting methods are used to: (1) communicate the achievement status of students to their parents and other interested parties; (2) provide information to students for self-evaluation; (3) select, identify, or group students for certain educational paths or programs; (4) provide incentives for students to learn; and (5) document students' performance to evaluate the effectiveness of instructional programs. Unfortunately, many schools try to use a single method of grading and reporting to achieve all of these purposes and end up achieving none of them very well.

Letter grades, for example, offer parents and others a brief description of students' achievement and the adequacy of their performance. But using letter grades requires the abstraction of a great deal of information into a single symbol. In addition, the cut-offs between grades are always arbitrary and difficult to justify. Letter grades also lack the richness of other, more detailed reporting methods such as narratives or standards-based reports.

These more detailed methods also have their drawbacks, however. Narratives and standards-based reports offer specific information that is useful in documenting student achievement. But good narratives take time to prepare, and as teachers complete more narratives, their comments become increasingly standardized. Standards-based reports are often too complicated for parents to understand and seldom communicate the appropriateness of student progress. Parents often are left wondering if their child's achievement is comparable with that of other children or in line with the teacher's expectations.

Because no single grading method adequately serves all purposes, schools must first identify their primary purpose for grading, and then select or develop the most appropriate approach. This process involves the difficult task of seeking consensus among diverse groups of stakeholders.

Grading and reporting require inherently subjective judgments. Grading is a process of professional judgment, and the more detailed and analytic the grading process, the more likely it is that subjectivity will influence results. This is why, for example, holistic scoring procedures tend to have greater reliability than analytic procedures. However, being subjective does not mean that grades lack credibility or are indefensible. Because teachers know their students, understand various dimensions of students' work, and have clear notions of the progress made, their subjective perceptions can yield very accurate descriptions of what students have learned.

Negative consequences result when subjectivity translates to bias. This occurs when factors apart from students' actual achievement or performance affect their grades. Studies have shown, for example, that cultural differences among students, as well as their appearance, family backgrounds, and lifestyles, can sometimes result in biased evaluations of their academic performance. Teachers' perceptions of students' behavior can also significantly influence their judgments of academic performance. Students with behavior problems often have no chance to receive a high grade because their infractions overshadow their performance. These effects are especially pronounced in judgments of boys. Even the neatness of students' handwriting can significantly affect teachers' judgments. Training programs help teachers identify and reduce these negative effects and can lead to greater consistency in judgments.

Grades have some value as rewards, but no value as punishments. Although educators would undoubtedly prefer that motivation to learn be entirely intrinsic, the existence of grades and other reporting methods is an important factor in determining how much effort students put forth. Most students view high grades as positive recognition of their success, and some work hard to avoid the consequences of low grades.

At the same time, no studies support the use of low grades or marks as punishments. Instead of prompting greater effort, low grades usually cause students to withdraw from learning. To protect their self-image, many regard the low grade as irrelevant and meaningless. Other students may blame themselves for the low mark, but feel helpless to improve.

Grading and reporting should always be done in reference to learning criteria, never "on the curve." Although using the normal probability curve as a basis for assigning grades yields highly consistent grade distributions from one teacher to the next, there is strong evidence that it is detrimental to relationships among students and between teachers and students. Grading on the curve pits students against one another in a competition for the few rewards (high grades) distributed by the teacher. Under these conditions, students readily see that helping others threatens their own chances for success.

Modern research has also shown that the seemingly direct relationship between aptitude or intelligence and school achievement depends on instructional conditions. When the quality of instruction is high and well matched to students' learning needs, the magnitude of this relationship diminishes drastically and approaches zero. Moreover, the fairness and equity of grading on the curve is a myth.

Relating grading and reporting to learning criteria, however, provides a clearer picture of what students have learned. Students and teachers alike generally prefer this approach because they consider it fairer. The types of learning criteria teachers use for grading and reporting typically fall into three general categories:

  • Product criteria are favored by advocates of standards-based approaches to teaching and learning. These educators believe the primary purpose of grading and reporting is to communicate a summative evaluation of student achievement and performance. In other words, they focus on what students know and are able to do at a particular point in time. Teachers who use product criteria base grades exclusively on final examination scores, final products (reports or projects), overall assessments, and other culminating demonstrations of learning.
  • Process criteria are emphasized by educators who believe product criteria do not provide a complete picture of student learning. From this perspective, grading and reporting should reflect not just the final results but also how students got there. Teachers who consider effort or work habits when reporting on student learning are using process criteria. So are teachers who count regular classroom quizzes, homework, class participation, or attendance.
  • Progress criteria, often referred to as improvement scoring, learning gain, or value-added grading, consider how much students have gained from their learning experiences. Teachers who use progress criteria look at how far students have come over a particular period of time, rather than just where they are. As a result, grading criteria may be highly individualized. Most of the research evidence on progress criteria in grading and reporting comes from studies of differentially paced instructional programs and special education programs.

Teachers who base their grading and reporting procedures on learning criteria typically use some combination of these three types. Most also vary the criteria they employ from student to student, taking into account individual circumstances. Although usually done in an effort to be fair, the result is a "hodgepodge grade" that includes elements of achievement, effort, and improvement.

Researchers and measurement specialists generally recommend the use of product criteria exclusively in determining students' grades. They point out that the more process and progress criteria come into play, the more subjective and biased grades are likely to be. If these criteria are included at all, they recommend reporting them separately.
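The difference between a blended "hodgepodge grade" and separate reporting can be illustrated with a small sketch. Everything here is hypothetical: the class name `StudentEvidence`, the 0-to-100 scales, and the 50/30/20 weights are assumptions made for the example, not values taken from the research described above.

```python
from dataclasses import dataclass

@dataclass
class StudentEvidence:
    product: int   # e.g. final exam or project score, 0-100 (assumed scale)
    process: int   # e.g. homework, participation, effort, 0-100
    progress: int  # e.g. improvement over the term, 0-100

def hodgepodge_grade(e, weights=(50, 30, 20)):
    """Blend all three criteria into one number; the weights are hypothetical."""
    w_product, w_process, w_progress = weights
    total = w_product * e.product + w_process * e.process + w_progress * e.progress
    return total / 100

def separate_report(e):
    """Grade on product criteria only; list the other criteria unblended."""
    return {"grade": e.product, "process": e.process, "progress": e.progress}

student_a = StudentEvidence(product=90, process=60, progress=50)
student_b = StudentEvidence(product=70, process=90, progress=90)

print(hodgepodge_grade(student_a), hodgepodge_grade(student_b))
print(separate_report(student_a))
print(separate_report(student_b))
```

With these assumed numbers, the blended grade ranks student B above student A even though A demonstrated higher achievement, while the separate report keeps the three kinds of evidence distinguishable.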

The issues of grading and reporting on student learning continue to challenge educators. However, more is known at the beginning of the twenty-first century than ever before about the complexities involved and how certain practices can influence teaching and learning. To develop grading and reporting practices that provide quality information about student learning requires clear thinking, careful planning, excellent communication skills, and an overriding concern for the well-being of students. Combining these skills with current knowledge on effective practice will surely result in more efficient and more effective grading and reporting practices.

See also: Assessment, subentry on Classroom Assessment; Elementary Education, subentries on Current Trends, History of; Secondary Education, subentries on Current Trends, History of; Social Organization of Schools.

BIBLIOGRAPHY

Austin, Susan, and McCann, Richard. 1992. "'Here's Another Arbitrary Grade for Your Collection': A Statewide Study of Grading Policies." Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.

Bailey, Jane, and McTighe, Jay. 1996. "Reporting Achievement at the Secondary Level: What and How." In Communicating Student Learning: 1996 Yearbook of the Association for Supervision and Curriculum Development, ed. Thomas R. Guskey. Alexandria, VA: Association for Supervision and Curriculum Development.

Bloom, Benjamin S.; Madaus, George F.; and Hastings, J. Thomas. 1981. Evaluation to Improve Learning. New York: McGraw-Hill.

Bracey, Gerald W. 1994. "Grade Inflation?" Phi Delta Kappan 76 (4):328–329.

Brookhart, Susan M. 1991. "Grading Practices and Validity." Educational Measurement: Issues and Practice 10 (1):35–36.

Brookhart, Susan M. 1994. "Teachers' Grading: Practice and Theory." Applied Measurement in Education 7 (4):279–301.

Cameron, Judy, and Pierce, W. David. 1994. "Reinforcement, Reward, and Intrinsic Motivation: A Meta-Analysis." Review of Educational Research 64 (3):363–423.

Cameron, Judy, and Pierce, W. David. 1996. "The Debate about Rewards and Intrinsic Motivation: Protests and Accusations Do Not Alter the Results." Review of Educational Research 66 (1):39–51.

Cangelosi, James S. 1990. "Grading and Reporting Student Achievement." In Designing Tests for Evaluating Student Achievement. New York: Longman.

Cizek, Gregory J.; Fitzgerald, Shawn M.; and Rachor, Robert E. 1996. "Teachers' Assessment Practices: Preparation, Isolation, and the Kitchen Sink." Educational Assessment 3 (2):159–179.

Cross, Lawrence H., and Frary, Robert Barnes. 1996. "Hodgepodge Grading: Endorsed by Students and Teachers Alike." Paper presented at the annual meeting of the National Council on Measurement in Education, New York.

Frary, Robert Barnes; Cross, Lawrence H.; and Weber, Larry J. 1993. "Testing and Grading Practices and Opinions of Secondary Teachers of Academic Subjects: Implications for Instruction in Measurement." Educational Measurement: Issues and Practice 12 (3):23–30.

Frisbie, David A., and Waltman, Kriste K. 1992. "Developing a Personal Grading Plan." Educational Measurement: Issues and Practice 11 (3):35–42.

Gersten, Russell; Vaughn, Shawn; and Brengelman, Susan Unok. 1996. "Grading and Academic Feedback for Special Education Students and Students with Learning Difficulties." In Communicating Student Learning: 1996 Yearbook of the Association for Supervision and Curriculum Development, ed. Thomas R. Guskey. Alexandria, VA: Association for Supervision and Curriculum Development.

Guskey, Thomas R. 2001. "Helping Standards Make the Grade." Educational Leadership 59 (1):20–27.

Guskey, Thomas R., and Bailey, Jane M. 2001. Developing Grading and Reporting Systems for Student Learning. Thousand Oaks, CA: Corwin.

Haladyna, Thomas M. 1999. A Complete Guide to Student Grading. Boston: Allyn and Bacon.

Kirschenbaum, Howard; Simon, Sidney B.; and Napier, Rodney W. 1971. Wad-Ja-Get? The Grading Game in American Education. New York: Hart.

Lake, Kathy, and Kafka, Keno. 1996. "Reporting Methods in Grades K–8." In Communicating Student Learning: 1996 Yearbook of the Association for Supervision and Curriculum Development, ed. Thomas R. Guskey. Alexandria, VA: Association for Supervision and Curriculum Development.

McMillan, James H. 2001. "Secondary Teachers' Classroom Assessment and Grading Practices." Educational Measurement: Issues and Practice 20 (1):20–32.

McMillan, James H.; Workman, Daryl; and Myran, Steve. 1999. "Elementary Teachers' Classroom Assessment and Grading Practices." Paper presented at the annual meeting of the American Educational Research Association, Montreal.

Million, June. 1999. "Restaurants, Report Cards, and Reality." NAESP Communicator 22 (8):5–7.

Nava, Fe Josefa G., and Loyd, Brenda H. 1992. "An Investigation of Achievement and Nonachievement Criteria in Elementary and Secondary School Grading." Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.

O'D ONNELL , A NGELA, and W OOLFOLK , A NITA E. 1991. "Elementary and Secondary Teachers' Beliefs about Testing and Grading." Paper presented at the annual meeting of the American Psychological Association, San Francisco, CA.

O RNSTEIN , A LLAN C. 1994. "Grading Practices and Policies: An Overview and Some Suggestions." NASSP Bulletin 78 (559):55–64.

S TARCH , D ANIEL, and E LLIOTT , E DWARD C HARLES. 1912. "Reliability of the Grading of High School Work in English." School Review 20:442–457.

S TARCH , D ANIEL, and E LLIOTT , E DWARD C HARLES. 1913. "Reliability of the Grading of High School Work in Mathematics." School Review 21:254–259.

S TIGGINS , R ICHARD J. 2001. "Report Cards." In Student-Involved Classroom Assessment, 3rd edition. Saddle River, NJ: Merrill Prentice Hall.

S TIGGINS , R ICHARD J. ; F RISBIE , D AVID A.; and G RISWOLD , P HILIP A. 1989. "Inside High School Grading Practices: Building a Research Agenda." Educational Measurement: Issues and Practice 8 (2):5–14.

T RUOG , A NTHONY L., and F RIEDMAN , S TEPHEN. J. 1996. "Evaluating High School Teachers' Written Grading Policies from a Measurement Perspective." Paper presented at the annual meeting of the National Council on Measurement in Education, New York.

W OOD , L ESLIE. A. 1994. "An Unintended Impact of One Grading Practice." Urban Education 29 (2):188–201.

Thomas R. Guskey

Over the course of an academic career the average student will be exposed to a variety of grading systems and procedures. Although some of these systems may be qualitative in nature, such as an annual or semiannual written narrative, the vast majority are quantitative and depend upon numerical or alphanumerical metrics. Perhaps the most familiar of these involves the letters "A" through "F," where "A" is usually given a value of 4.0 and is characterized in words as outstanding or excellent and "F" is given a value of 0.0 and is described as unsatisfactory or failing. The grades of A through F are usually derived from some more differentiated quantitative value such as a test score, and the specific relationship between grade and test score may take a variety of forms (e.g., an A may be defined by a score of 90 percent or better, or by a value that falls in the top 5 to 10 percent of scores independent of absolute value). Regardless of the specific translation of test performance into letter grade, the point to keep in mind is that the A–F scale has been the most frequently used grading system in higher education over the past half century or more.
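The two translations mentioned above, a fixed percentage cutoff and a rank within the class, can be contrasted in a short Python sketch. Only the "90 percent or better" rule for an A, the "top 5 to 10 percent" rule, and the endpoints A = 4.0 and F = 0.0 come from the text; the remaining cutoffs, the function names, and the sample scores are assumptions made for illustration.

```python
# Absolute model. Only the 90-percent cutoff for an A is taken from the text;
# the B, C, and D cutoffs are hypothetical.
ABSOLUTE_CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def grade_by_percent(score):
    """Return a letter grade that depends only on the student's own score."""
    for cutoff, letter in ABSOLUTE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


def earns_a_by_rank(score, all_scores, top_fraction=0.10):
    """Relative model: an A goes to scores in the top fraction of the class,
    independent of absolute value. Scores tied with the cutoff also qualify."""
    ranked = sorted(all_scores, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))  # at least one A is awarded
    return score >= ranked[n_top - 1]


# Hypothetical results from a difficult test: nobody reaches 90 percent.
class_scores = [85, 80, 78, 74, 71, 69, 65, 62, 58, 50]

print(grade_by_percent(85))               # B under the absolute model
print(earns_a_by_rank(85, class_scores))  # True: the top score earns an A
print(earns_a_by_rank(80, class_scores))  # False: outside the top 10 percent
```

The same score of 85 is a B under one model and an A under the other, which is why a letter grade cannot be interpreted without knowing which translation produced it.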

Variations in the Grading System

Like all prototypes, the A–F system admits many variations. These often take the form of plusses and minuses, thereby producing a scale with the possibility of fifteen distinct units: A+, A, A–, B+, B … F–. In actual practice, the grade of A+ is scarcely ever used, and the same is true for D+, D–, F+, and F–, thereby yielding a scale of between eight and ten units. Generally speaking, the greater the number of units in the grading system, the more precisely it hopes to quantify student performance. What is interesting in this regard are fluctuations in the actual number of units used in different historical eras. Without going too deeply into the relevant historical facts, it is clear that certain historical periods, such as the 1960s, reduced the grading system to two or so units (Pass, No Credit, or P/NC), whereas other periods, such as the 1980s, expanded it to ten, eleven, or twelve units.

Variations in the breadth of the grading system would seem to have significant educational implications. At a minimum, these differences may be taken to imply that scales having a large number of units indicate a relative comfort in making precise distinctions, whereas those having fewer units suggest a relative discomfort in making such distinctions. In the case of more differentiated systems, distinctions and rankings are significant, and individual achievement is emphasized; in the case of less differentiated systems, distinctions and rankings are de-emphasized and interstudent competition is minimized. To some degree, it is possible to view fluctuations in American grading systems as reflecting a more general ambivalence in the society between competition and cooperation, and between individual recognition and social equity. Educational institutions sometimes emphasize strict evaluation, competition, and individual achievement, whereas at other times they emphasize less precise evaluation, cooperation, and sympathetic understanding for students of all achievement levels.

Another property of grading systems is that individual class grades often are combined to produce an overall metric called the grade point average or GPA. Unlike its constituent values, which usually are carried to only one decimal place (or none), the GPA presents a metric of 400 units, yielding the possibility that a GPA of 3.00 will locate the student in the category of "good" whereas a value of 2.99 will exclude him or her from this category. In the same way, honors, admission to graduate school, preliminary selection for interviews by a desirable company, and so forth, may be defined by a single point difference on the GPA scale (e.g., 3.50 versus 3.49 for Phi Beta Kappa).
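A minimal Python sketch shows how a single course grade can move a student across such a threshold. It assumes an unweighted mean of course grade points reported to two decimal places; actual institutions typically weight by credit hours and may truncate rather than round. The plus and minus point values, the function names, and the sample transcripts are illustrative assumptions; only the 3.00 versus 2.99 cutoff for "good" comes from the text.

```python
# Conventional grade-point values, assumed here for illustration.
GRADE_POINTS = {
    "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7, "D": 1.0, "F": 0.0,
}


def gpa(letters):
    """Unweighted mean of grade points, reported to two decimal places."""
    points = [GRADE_POINTS[letter] for letter in letters]
    return round(sum(points) / len(points), 2)


def is_good_standing(gpa_value, cutoff=3.00):
    """Hypothetical category rule: 'good' means a GPA at or above the cutoff."""
    return gpa_value >= cutoff


# Two transcripts of thirty courses that differ in one grade only.
all_b = ["B"] * 30
one_b_minus = ["B"] * 29 + ["B-"]

print(gpa(all_b), is_good_standing(gpa(all_b)))              # 3.0 True
print(gpa(one_b_minus), is_good_standing(gpa(one_b_minus)))  # 2.99 False
```

A difference of three tenths of a grade point in one course out of thirty is enough to change the category, although it says little about any real difference in achievement.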

Because GPAs are significant in categorizing student performance, a number of evaluations have been made of their reliability and validity. One issue to be addressed here concerns field of study, where it is well documented that classes in the natural sciences and business produce lower overall grades than those in the humanities or social sciences. What this means is that it is unreasonable to equate grade values across disciplines. It also suggests that the GPA is composed of unequal components and that students may be able to secure a higher GPA by a judicious selection of courses.

Although other factors may be mentioned aside from academic discipline (such as SAT level of school, quality and nature of tests, etc.), the conclusion must be that the GPA is a poor measure and should not be used by itself in coming to significant decisions about the quality of student performance or differences between departments and/or educational institutions. The GPA is also a relatively poor basis on which to predict future performance, which perhaps explains why such attempts are never very impressive. In fact, a number of meta-analyses of this relationship, conducted every ten years or so since 1965, reveal that the median correlation between GPA and future performance is 0.18, a value that is neither very useful nor impressive. The strongest relationship between GPA and future achievement is usually found between undergraduate GPA and first-year performance in graduate or professional school.

Despite such difficulties in understanding the exact meanings of grades and the GPA, they remain important social metrics and sometimes yield heated discussions over issues such as grade inflation. Although grade inflation has many different meanings, it usually is defined by an increase in the absolute number of As and Bs over some period of years. The tacit assumption here seems to be that any continuing increase in the overall percentage of "good grades" or in the overall GPA implies a corresponding decline in academic standards. Although historically there have been periods in which the number of good grades decreased (so-called grade deflation), significant social concerns usually only accompany the grade inflation pattern. This one-sided emphasis suggests that grade inflation is as much a sociopolitical issue as an educational one and depends upon the dubious equating of grades with money. What really seems of concern here is a value issue, not a cogent analogy that reveals anything significant about grades or money.

How Grades Are Produced

Grading systems represent just one aspect of an interconnecting network of educational processes, and any attempt to describe grading systems without considering other aspects of this network must necessarily be incomplete. Perhaps the most important of these processes concerns the procedures used to produce grades in the first place, namely, the classroom test. Here, of course, there are purely formal differences: for example, between multiple-choice and essay tests, or between in-class and take-home tests or papers. Also to be included is the quality of the test items themselves, not only in terms of content but also in terms of the clarity of the question and, in the case of multiple-choice tests, of the distractors.

One way to capture the complexity of possible ways in which grades are produced is to consider the set of implicit choices that lie behind an instructor's use of a specific testing and/or grading procedure. Included here are such questions as: What evaluation procedure should I use? Term papers, classroom discussions, or in-class tests? If I choose tests, what kind(s)? Essay, true/false, fill-in-the-blank, matching, or multiple-choice? If I choose multiple-choice, what grading model should I use? Normal curve, percent-correct, improvement over preceding tests? If I choose percent-correct, how many tests should I give? Final only, two in-class tests and a final, one midterm and one final? How should I weight each test if I choose the midterm-final pattern? Midterm equals final, midterm is equivalent to twice the final exam grade, final equals twice the midterm grade? What grade report system should I use? P/F; A, B, C, D, F; or A+, A, A–, B+, … F? An examination of this collection of possible choices suggests that instructors have a large number of options as to how to go about testing and grading their students.
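The weighting question in particular lends itself to a brief illustration. The Python sketch below combines a midterm and a final under the three weightings named above; the function name and the sample scores of 80 and 92 are hypothetical.

```python
def course_score(midterm, final, midterm_weight=1, final_weight=1):
    """Weighted average of two test scores on the same percent-correct scale."""
    total_weight = midterm_weight + final_weight
    return (midterm_weight * midterm + final_weight * final) / total_weight


midterm, final = 80, 92  # hypothetical percent-correct scores

print(course_score(midterm, final))                    # 86.0, midterm equals final
print(course_score(midterm, final, midterm_weight=2))  # 84.0, midterm counts twice
print(course_score(midterm, final, final_weight=2))    # 88.0, final counts twice
```

Under a percent-correct model with letter cutoffs at multiples of ten, all three results would fall in the same band here, but a student near a cutoff could receive different letter grades depending solely on the weighting the instructor chose.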

Any consideration of the ways in which testing and grading relate to one another must also deal with the ways in which one or both of these activities relate to learning and teaching. The relationship between learning and testing is a fairly direct (if neglected) one, especially if tests are used not only to evaluate student achievement but also to reinforce or promote learning itself. Thus it is easy to develop a classroom question or exercise that requires the student to read some material before being able to answer the question or complete the exercise. Teaching, on the other hand, would seem to be somewhat further removed from issues of testing and grading, although the specific testing and grading plan used by the instructor does inform the student as to what constitutes relevant knowledge, as well as what attitude the instructor holds toward precise evaluation and academic competition.

Students are not immune to testing and grading procedures, and educational researchers have made the distinction between students who are grade oriented and those who are learning oriented. Although this distinction is surely too one-dimensional, it does suggest that for some students the classroom is a place where they experience and enjoy learning for its own sake. For other students, however, the classroom is experienced as a crucible in which they are tested and in which the attainment of a good grade becomes more important than the learning itself. When students are asked how they became grade (or learning) oriented, they usually point to the actions of their teachers in emphasizing grades as a significant indicator of future success; alternatively, they describe instructors who are excited by promoting new learning in their classrooms. When college instructors are asked about the reason(s) for their emphasis on grades, they report that student behaviors, such as arguing over the scoring of a single question, make it necessary for them to maintain strict and well-defined grading standards in their classrooms. The ironic point is that both the student and the instructor see the "other" as emphasizing grades over learning, and neither sees this as a desirable state of affairs.

What seems missing in this context is a clear recognition by both the instructor and the student that grades are best construed as a type of communication. When grades (and tests) are thought about in this way, they can be used to improve learning. As it now stands, however, the communicative purpose of grading is ordinarily submerged in its more common use as a means of rating and sorting students for social and institutional purposes not directly tied to learning. Only when grades are integrated into a coherent teaching and learning strategy do they serve the purpose of providing useful and meaningful feedback not only to the larger culture but to the individual student as well.

See also: College Teaching.

Baird, Leonard L. 1985. "Do Tests and Grades Predict Adult Achievement?" Research in Higher Education 23:3–85.

Curreton, Louise W. 1971. "The History of Grading Practices." Measurement in Education 2:1–9.

Duke, J. D. 1983. "Disparities in Grading Practice: Some Resulting Inequities and a Proposed New Index of Academic Achievement." Psychological Reports 53:1023–1080.

Goldman, Roy D.; Schmidt, Donald E.; Hewitt, Barbara N.; and Fisher, Ronald. 1974. "Grading Practices in Different Major Fields." American Educational Research Journal 11:343–357.

Milton, E. Ohmer; Pollio, Howard R.; and Eison, James A. 1986. Making Sense of College Grades. San Francisco: Jossey-Bass.

Pollio, Howard R., and Beck, Hall P. 2000. "When the Tail Wags the Dog: Perceptions of Learning and Grade Orientation in and by Contemporary College Students and Faculty." The Journal of Higher Education 71:84–102.

Howard R. Pollio
