The Technical Quality of a Test — Part 2

quality habit

In the previous post (below) we started looking at features that define the technical quality of a test.

     Source: http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf

Criteria for establishing the technical quality of a test

  1. Cognitive complexity

The test questions will focus on appropriate intellectual activity ranging from simple recall of facts to problem solving, critical thinking, and reasoning.

Bloom’s Taxonomy*

  1. Content quality

The test questions will permit students to demonstrate their knowledge of challenging and important subject matter.  The emphasis of the test should be a reflection of the emphasis of the lecture.

  1. Meaningfulness

The test questions will be worth students’ time and students will recognize and understand their value.

  1. Language appropriateness

The language demands will be clear and appropriate to the assessment tasks and to students.  It should reflect the language used in the classroom.  Test items should be stated in simple, clear language, free of nonfunctional material and extraneous clues, and free of race, ethnic, and sex bias.

  1. Transfer and generalizability

Successful performance on the test will allow valid generalizations about achievement to be made.

  1. Fairness

Student performance will be measured in a way that does not give advantage to factors irrelevant to school learning:  scoring schemes will be similarly equitable.

Basic rules of fairness:

  • Test questions should reflect the objectives of the unit
  • Expectations should be clearly known by the students
  • Each test item should present a clearly formulated task
  • One item should not aide in answering another
  • Ample time for test completion should be allowed
  • Assignment of points should be determined before the test is administered.
  1. Reliability

Answers to test questions will be consistently trusted to represent what students know.

 

*More on Bloom’s Taxonomy in a future post.

We have already discussed points #1 and 2; let us continue with the rest.

  1. Meaningfulness

I think if we are writing exam questions that explore the knowledge we want the students to learn, the questions will be meaningful, even when they only test simple recall.  Each question should trigger a memory in any student who has prepared and studied.

I am not always certain our students will recognize and understand the value of the questions we offer but I am not sure that really matters.  We want to avoid outrage at a question that comes across as grossly unfair or outside the scope of the class, which I think will happen with meaningful questions.

  1. Language appropriateness

When I see the phrase “the language used in the classroom,” I think about how I describe concepts and the level of the vocabulary I use in discussions.  I try to avoid “dumbing down” the words I use but I also try to avoid choosing words that are esoteric or outdated.  In lecture, it is often easy to see student reaction to words they don’t understand, and that tells me I need to define those words, even if they aren’t words in my discipline.  This gives me an opportunity to raise the student vocabulary closer to college level.  Once I have used and defined them, I feel free to use those words in exams.

One hazard of making the questions “free of nonfunctional material and extraneous clues” in mathematics is that students become trained to believe they must use every number and every bit of information in the problem or they won’t be working it correctly.  Unfortunately, real world problems that use math often contain nonfunctional material and extraneous clues and our students need to learn how to weed it out.  I introduce this skill at the calculus level.

  1. Transfer and generalizability

The goal I set for my students is for them to learn the course material in such a way that they can perform the skills, recall the ideas, and recognize the vocabulary and notation, and that they are prepared to take the next course in the sequence successfully.  This is my definition of transfer and generalizability.

How would you define it for your discipline?

  1. Fairness

This seems straightforward and reasonable to me.  I don’t always have the time to determine the assignment of points before the test is administered but I always do before I start grading.  If something causes me to rethink the point distribution, I regrade all the problems affected by it.

  1. Reliability

The description given for this point did not help me understand reliability but this source’s definition did [note that “marker” means “the person grading the exam”]:

     Source:  http://www.lshtm.ac.uk/edu/taughtcourses/writinggoodexamquestions.pdf

Does the question allow markers to grade it consistently and reproducibly and does it allow markers to discriminate between different levels of performance? This frequently depends on the quality of the marking guidance and clarity of the assessment criteria. It may also be improved through providing markers with training and opportunities to learn from more experienced assessors.

What resonates with me is the ability to discriminate between different levels of performance.  That can be challenging when grading math problems because I feel partial credit is important.  Students can work problems in so many incorrect or partially correct ways that I have to work hard to determine how much they really knew and how much was due to simple error.

From the criteria list I see the opportunity to consider the overall structure of my exams and assess my general test writing skills.  I like the guidelines and how they direct me to think beyond my personal experiences while considering how the students will perceive the test.