The Inaccuracy of Tests

I’m taking an online course, Formative Assessment and Standards-Based Grading, from the Marzano Research Laboratory, and it’s reinforcing what I’ve believed about how subjective it is to grade and score kids on their learning. As teachers we strive to assess our students’ learning as accurately as possible. The truth is that an A in my class is not the same as an A in someone else’s class. I’m not even certain that a “Meets this Standard” in my class is the same as a “Meets this Standard” in a similar class. Letter grades and standards-based grades are snapshots of a complicated process that changes all the time. That being said, I do prefer standards-based grading for many reasons, as long as the “grading” period doesn’t happen often. I’d much rather give my students feedback most of the time to help them learn than cloud that feedback with letters or numbers that reduce their learning.

Here’s a great example from the formative assessment course I’m taking:

Let’s take this one type of test –
Questions 1-10 are basic, simple recall questions, probably multiple choice or fill in the blank, about content that was explicitly taught.

Questions 11-14 are more complex questions, probably short answer, still about content that was explicitly taught.

Questions 15-16 are complex questions, probably short answer, that ask students to apply what they learned. Content was NOT explicitly taught.

So let’s say the same exact test is given to students in three different classes, as a common assessment. I took the test and got all of the first 10 questions correct, half of the second set of questions correct, and none of the last two questions correct.

Now consider how my test would be scored in each of the three different classes:
In class 1 the teacher weights the first set of questions, 1-10, at 20%, the second set of questions, 11-14, at 20%, and the third set of questions, 15-16, at 40%.

So in class 1 I got 20% + 10% + 0% = 30%, which is an F.

In class 2 the teacher weights the first set of questions, 1-10, at 60%, the second set of questions, 11-14, at 30%, and the third set of questions, 15-16, at 10%.

So in class 2, with the same test and the same questions correct, I got 60% + 15% + 0% = 75%, which is a C.

In class 3 the teacher weights the first set of questions, 1-10, at 80%, the second set of questions, 11-14, at 20%, and the third set of questions, 15-16, at 0% because that teacher feels he or she cannot hold students accountable for content he or she did NOT explicitly teach.

So in class 3 I got 80% + 10% = 90%, which is an A-!

How can I take the same test and, depending on the class, or specifically depending on the way the question types are weighted, get anywhere from an F to an A-?!
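For anyone who wants to play with the numbers, here is a minimal sketch of the arithmetic in Python. The group labels ("recall", "complex", "application") and the function name are my own, made up for illustration. The weights are copied as stated above, including class 1's, which as given add up to 80% rather than 100%.

```python
# Fraction of each question group answered correctly, as described in the post:
# all of questions 1-10, half of questions 11-14, none of questions 15-16.
# The group labels are hypothetical names chosen for this sketch.
fraction_correct = {"recall": 1.0, "complex": 0.5, "application": 0.0}

# Weight (in percent of the total score) each teacher gives each group,
# copied as stated in the post.
class_weights = {
    "class 1": {"recall": 20, "complex": 20, "application": 40},
    "class 2": {"recall": 60, "complex": 30, "application": 10},
    "class 3": {"recall": 80, "complex": 20, "application": 0},
}


def weighted_score(weights, fractions):
    """Return the percent score: each group's weight times the fraction correct."""
    return sum(weights[group] * fractions[group] for group in weights)


scores = {
    name: weighted_score(weights, fraction_correct)
    for name, weights in class_weights.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.0f}%")
```

Changing only the numbers in class_weights, and nothing about which questions were answered correctly, is enough to move the score from 30% to 90%.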

“There’s measurement error in any test no matter how well we design them, there’s going to be measurement error. Let me give you an example. One of the readability analyses, there’s a lot of different ones you can use to find out what’s the readability level on any kind of a passage. One of those in particular has a 1.5, what would be considered a year and a half measurement error. What if I’m a First Grade teacher, I want that readability to be spot on, don’t I? So if I happen to see that it says 2.0, that’s the readability, 2.0, and let’s pretend that it only has one measurement error, one grade level span at measurement error, that means that 2.0 could be as low as First Grade, could be as high as Third Grade. That’s what we call measurement error. There’s error inherent in every kind of measurement, which is why you don’t want to base big decisions only on a couple of assessments, and certainly not only on just one assessment, you want to use multiple pieces of evidence.”
Marzano Research Laboratory Vice President Dr. Tammy Heflebower, from Formative Assessment and Standards-Based Grading course.

We need to figure out what we want out of our assessments. If it’s to help students learn, then feedback is probably the best way to go. If it’s to report to students and their families what they have learned, or what they’ve shown that they have learned, then using rubrics and student self-assessment with teacher input (standards-based grading) is useful. If you’re stuck having to reduce student learning to letter grades, then try having conversations with students as to what their “final grade” should be. Usually what ends up happening here is some elaborate math to convert standards-based grades to letter grades or percentages. It sucks that we have to do that just because that’s the way it’s been done.


Alfonso Gonzalez
