The Technical Quality of a Test — Part 2


In the previous post (below) we started looking at features that define the technical quality of a test.

     Source: http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf

Criteria for establishing the technical quality of a test

  1. Cognitive complexity

The test questions will focus on appropriate intellectual activity ranging from simple recall of facts to problem solving, critical thinking, and reasoning.

Bloom’s Taxonomy*

  2. Content quality

The test questions will permit students to demonstrate their knowledge of challenging and important subject matter.  The emphasis of the test should be a reflection of the emphasis of the lecture.

  3. Meaningfulness

The test questions will be worth students’ time and students will recognize and understand their value.

  4. Language appropriateness

The language demands will be clear and appropriate to the assessment tasks and to students.  It should reflect the language used in the classroom.  Test items should be stated in simple, clear language, free of nonfunctional material and extraneous clues, and free of race, ethnic, and sex bias.

  5. Transfer and generalizability

Successful performance on the test will allow valid generalizations about achievement to be made.

  6. Fairness

Student performance will be measured in a way that does not give advantage to factors irrelevant to school learning:  scoring schemes will be similarly equitable.

Basic rules of fairness:

  • Test questions should reflect the objectives of the unit
  • Expectations should be clearly known by the students
  • Each test item should present a clearly formulated task
  • One item should not aid in answering another
  • Ample time for test completion should be allowed
  • Assignment of points should be determined before the test is administered.

  7. Reliability

Answers to test questions will be consistently trusted to represent what students know.

 

*More on Bloom’s Taxonomy in a future post.

We have already discussed points #1 and 2; let us continue with the rest.

  3. Meaningfulness

I think if we are writing exam questions that explore the knowledge we want the students to learn, the questions will be meaningful, even when they only test simple recall.  Each question should trigger a memory in any student who has prepared and studied.

I am not always certain our students will recognize and understand the value of the questions we offer, but I am not sure that really matters.  What we want to avoid is outrage at a question that comes across as grossly unfair or outside the scope of the class, and I think meaningful questions will prevent that.

  4. Language appropriateness

When I see the phrase “the language used in the classroom,” I think about how I describe concepts and the level of the vocabulary I use in discussions.  I try to avoid “dumbing down” the words I use, but I also try to avoid choosing words that are esoteric or outdated.  In lecture, it is often easy to see students react to words they don’t understand, and that tells me I need to define those words, even if they aren’t words in my discipline.  This gives me an opportunity to raise students’ vocabulary closer to the college level.  Once I have used and defined such words, I feel free to use them in exams.

One hazard of making questions “free of nonfunctional material and extraneous clues” in mathematics is that students become trained to believe they must use every number and every bit of information in the problem, or else they are not working it correctly.  Unfortunately, real-world problems that use math often contain nonfunctional material and extraneous clues, and our students need to learn how to weed them out.  I introduce this skill at the calculus level.

  5. Transfer and generalizability

The goal I set for my students is for them to learn the course material in such a way that they can perform the skills, recall the ideas, and recognize the vocabulary and notation, and that they are prepared to take the next course in the sequence successfully.  This is my definition of transfer and generalizability.

How would you define it for your discipline?

  6. Fairness

This seems straightforward and reasonable to me.  I don’t always have the time to determine the assignment of points before the test is administered, but I always do so before I start grading.  If something causes me to rethink the point distribution, I regrade all the problems affected by it.

  7. Reliability

The description given for this point did not help me understand reliability, but this source’s definition did [note that “marker” means “the person grading the exam”]:

     Source:  http://www.lshtm.ac.uk/edu/taughtcourses/writinggoodexamquestions.pdf

Does the question allow markers to grade it consistently and reproducibly and does it allow markers to discriminate between different levels of performance? This frequently depends on the quality of the marking guidance and clarity of the assessment criteria. It may also be improved through providing markers with training and opportunities to learn from more experienced assessors.

What resonates with me is the ability to discriminate between different levels of performance.  That can be challenging when grading math problems because I feel partial credit is important.  Students can work problems in so many incorrect or partially correct ways that I have to work hard to determine how much they really knew and how much was due to simple error.

From the criteria list I see the opportunity to consider the overall structure of my exams and assess my general test writing skills.  I like the guidelines and how they direct me to think beyond my personal experiences while considering how the students will perceive the test.

The Technical Quality of a Test — Part 1


We want our tests to be good measures of student achievement so we need to pay attention to what one source calls the “technical quality of a test.”

To help me understand what quality really means, I found these definitions useful:

  1. The characteristics of a product or service that bear on its ability to satisfy stated or implied needs; “conformance to requirements”
  2. A product or service free of deficiencies; “fitness for use”

(from http://asq.org/glossary/q.html)

So what criteria should we use to improve quality?

     Source: http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf

Criteria for establishing the technical quality of a test

  1. Cognitive complexity

The test questions will focus on appropriate intellectual activity ranging from simple recall of facts to problem solving, critical thinking, and reasoning.

Bloom’s Taxonomy*

  2. Content quality

The test questions will permit students to demonstrate their knowledge of challenging and important subject matter.  The emphasis of the test should be a reflection of the emphasis of the lecture.

  3. Meaningfulness

The test questions will be worth students’ time and students will recognize and understand their value.

  4. Language appropriateness

The language demands will be clear and appropriate to the assessment tasks and to students.  It should reflect the language used in the classroom.  Test items should be stated in simple, clear language, free of nonfunctional material and extraneous clues, and free of race, ethnic, and sex bias.

  5. Transfer and generalizability

Successful performance on the test will allow valid generalizations about achievement to be made.

  6. Fairness

Student performance will be measured in a way that does not give advantage to factors irrelevant to school learning:  scoring schemes will be similarly equitable.

Basic rules of fairness:

  • Test questions should reflect the objectives of the unit
  • Expectations should be clearly known by the students
  • Each test item should present a clearly formulated task
  • One item should not aid in answering another
  • Ample time for test completion should be allowed
  • Assignment of points should be determined before the test is administered.

  7. Reliability

Answers to test questions will be consistently trusted to represent what students know.

 

*More on Bloom’s Taxonomy in a future post.

Let’s examine this list in detail.

  1. Cognitive complexity

Oh, I like this one.  We should be challenging our students with intellectual activity and, more importantly, with a range of it.  This brings me back to the previous post (below), where we discussed classifying questions by how easily they can be answered: from those that most students can get to the few “A-B Breakers.”  There should be some questions that make the student think, “Hmmm, how can I use what I have learned to answer this?” and some that bring on the reaction, “Oh yes, I have seen all this before and I can remember it.”

I recall a question on a botany exam that asked me to imagine holding a plant stem in my hand and piercing it with a straight pin.  I needed to describe the various tissue types the pin might touch as it passed through to the middle of the stem.  I had already learned the list of tissues; this question forced me to consider their locations in the plant and organize them from the outside to the inside.  I hadn’t considered that idea before, so I was cognitively challenged, but I had all the tools I needed to answer the question.

One message that comes across in a number of the sources is that we can be tempted to test on the easier parts of the material rather than the important parts.  Considering cognitive complexity helps us focus on drawing from our students what they have learned beyond simple recall.

  2. Content quality

Again, we need to ensure we are testing more than simple recall but we also have to make sure we are not writing questions that test outside of the course material.

Here is one concern I have about emphasizing on the test what is emphasized in lecture: if this is taken too literally, our students are at risk of paying attention only to the information we explicitly label as important and ignoring any nuances or “items of lesser importance.”  They are often keenly tuned into the way we write on the board, in a PowerPoint slide, or in digital lecture notes, and are quick to infer that words in bold, italics, or all capitals, or words that are underlined, are the only things they should study for a test.  I found I was giving them that impression in my lectures, so I changed the way I wrote on the board, forcing my students to consider all the words I presented.

We will continue this discussion in the next post!

Test Goals — Thinking About the Strategies


When considering “What am I testing?”, I agree with this web page’s statement:

Source: http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf

In general, test items should:

  • Assess achievement of instructional objectives
  • Measure important aspects of the subject (concepts and conceptual relations)
  • Accurately reflect the emphasis placed on important aspects of instruction
  • Measure an appropriate level of student knowledge
  • Vary in levels of difficulty

But my reaction is to laugh because it is easy to make this list.  What do you have to do to implement it?

In beginning to construct a test, I should at least acknowledge the instructional objectives.  I might even make a written list, depending on the time I have available.  I ask myself:

  • What do I want the students to get out of each section and chapter?
  • What are the big-picture goals, the skills, the vocabulary, the concepts?
  • What are the common mistakes previous students have made?
  • Is there any information I want to foreshadow?
  • What have I pointed out in lecture about what to do and what not to do?

The words “measure important aspects of the subject” make me wary.  I think it is easy to interpret them as “focus only on the most important aspects,” which would mean I should not test anything else at all.

What I do think it means is “avoid testing on minor facts”, which opens the field up to a great many topics as well as encourages us to think deeply about what is and is not important.  For example, in a history class you might learn that Lincoln was assassinated on April 14, 1865.  Is it important that you know it was April 14?  Maybe not.  But April 14 of this year is the 150th anniversary of that event and that might make it important.  In any other year it might be enough to know it happened in 1865.

I find it challenging to determine “an appropriate level of student knowledge.”  For me, this comes down to how long the test should be compared with how much time I have to give it and how well I feel the students are learning the material.  I time how long it takes me to write up the solutions, divide that into the test time, and am happy if the resulting ratio falls within a certain range that depends on the class.  This generally works well, although sometimes I am surprised by the students’ reaction.  How do you determine the length?
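The length check described above is simple arithmetic, and a minimal Python sketch makes it concrete. The target ranges below are hypothetical placeholders (the post does not say which values the author looks for), and the course names and function names are invented for illustration; an instructor would substitute their own ranges per class.

```python
# Hypothetical target ranges: test time divided by the instructor's own
# solution-writing time. These numbers are placeholders, not the author's.
TARGET_RATIOS = {
    "intro statistics": (3.0, 5.0),
    "calculus": (2.5, 4.0),
}


def time_ratio(test_minutes: float, solution_minutes: float) -> float:
    """Return how many times longer students get than the instructor needed."""
    if solution_minutes <= 0:
        raise ValueError("solution_minutes must be positive")
    return test_minutes / solution_minutes


def length_ok(course: str, test_minutes: float, solution_minutes: float) -> bool:
    """Check whether the ratio falls inside the course's (hypothetical) target range."""
    low, high = TARGET_RATIOS[course]
    return low <= time_ratio(test_minutes, solution_minutes) <= high


if __name__ == "__main__":
    # A 50-minute calculus exam whose solutions took 15 minutes to write up.
    print(round(time_ratio(50, 15), 2), length_ok("calculus", 50, 15))  # 3.33 True
```

If the same 50-minute exam had taken only 10 minutes to solve, the ratio would be 5.0, above the placeholder calculus range, which would suggest room to add a question.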

I address the variation in difficulty by thinking about which questions are easily answered, which take a “usual” amount of work and thinking, and then I throw in one or two “A-B breakers.”  These are the questions that will separate the students who have really learned the material from those who are somewhat prepared or not prepared at all.  They might require a little more conceptual thinking or a slight stretch on the skills or knowledge students should have already attained.

The strategies I use are helpful for writing a test in my discipline.  Other strategies might be appropriate for other disciplines.  Feel free to write about them in the comments section.

General Tips About Test Design


In the previous post (below) we asked, “Do you think about what you are testing and how you are assessing that information?” when it comes to test design.

This site provides some general tips:

Source: http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf

General tips about testing

  • Length of test

The more items it has, the more reliable it is.  However, if a test is too long, the students may get tired and not respond accurately.  If a test needs to be lengthy, divide it into sections with different kinds of tasks.

  • Clear, concise instructions

It is useful to provide an example of a worked problem, which helps the student understand exactly what is necessary.

  • Mix it up!

It is often advantageous to mix types of items (multiple choice, true-false, essay) on a written exam.  Weaknesses connected with one kind of item or component or in students’ test taking skills will be minimized.

  • Test Early

Consider discounting the first test if the results are poor.  Students often need a practice test to understand the format each instructor uses and anticipate the best way to prepare and take particular tests.

  • Test frequently

Frequent testing helps students to avoid getting behind, provides instructors with multiple sources of information to use in computing the final course grade, and gives students regular feedback.

  • Check for accuracy

Instructors should be cautious about using tests written by others.  They should be checked for accuracy and appropriateness in the given course.

  • Proofread exams

Check them carefully for misspellings, misnumbering responses, and page collation.

  • One wrong answer

It is wise to avoid having separate items or tasks depend upon answers or skills required in previous items or tasks.

  • Special considerations

Anticipate special considerations that learning disabled students or non-native speakers may need.

  • A little humor

Using a little humor or placing less difficult items or tasks at the beginning of an exam can help reduce test anxiety and thus promote a more accurate demonstration of their progress.
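The first tip's claim that more items make a test more reliable has a standard quantitative form in classical test theory, the Spearman-Brown prediction formula. It is offered here only as an illustration, not as part of the source's advice, and it assumes the added items are comparable (parallel) to the existing ones. If \rho_1 is the reliability of the current test and the number of items is multiplied by n, the predicted reliability is:

```latex
\rho_n = \frac{n\,\rho_1}{1 + (n - 1)\,\rho_1}
\qquad\text{e.g. } \rho_1 = 0.6,\ n = 2:\quad
\rho_2 = \frac{2(0.6)}{1 + 0.6} = \frac{1.2}{1.6} = 0.75
```

Doubling again (n = 4) gives about 0.86, a smaller gain than the first doubling, which fits the tip's warning that a very long test can cost more in fatigue than it gains in reliability.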

My reaction to their advice is mixed.  I’m not sure I could provide good examples of worked problems on the test itself, because I teach mathematics, and working problems for the students defeats the purpose of the test.  However, I can have the students gain that knowledge before the exam by having them complete homework problems and by emphasizing that many of the test problems will use those skills and strategies.

I am able to “mix it up” sometimes, depending on the course and the material being covered.  When testing vocabulary in statistics, for example, sometimes I use multiple choice and sometimes I use fill-in-the-blank.

I am not fond of the idea of discounting the first test if the results are poor.  I get around that by giving my students short quizzes on a regular basis.  I write the problems and grade them myself, so students get a feel for my writing style and notation expectations before the longer, high-stakes exams.  My goal is for the quizzes’ cumulative points to be similar to those of an exam, but that total is weighted less than an exam toward the overall grade.
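To show how quiz points can add up to an exam's worth yet count for less, here is a minimal weighted-grade sketch in Python. The category names, point totals, and weights are all hypothetical; the post does not give the author's actual weighting.

```python
from dataclasses import dataclass


@dataclass
class GradeCategory:
    name: str
    earned: float    # points the student earned in this category
    possible: float  # points available in this category
    weight: float    # hypothetical share of the course grade


def overall_percent(categories: list[GradeCategory]) -> float:
    """Weighted course percentage: each category's fraction correct times its weight."""
    total_weight = sum(c.weight for c in categories)
    weighted = sum(c.weight * c.earned / c.possible for c in categories)
    return 100 * weighted / total_weight


# Hypothetical course: ten 10-point quizzes total 100 points, the same as one exam,
# but the quiz category carries a smaller weight than any single exam.
course = [
    GradeCategory("quizzes", earned=70, possible=100, weight=0.10),
    GradeCategory("exam 1", earned=90, possible=100, weight=0.30),
    GradeCategory("exam 2", earned=80, possible=100, weight=0.30),
    GradeCategory("final", earned=170, possible=200, weight=0.30),
]

if __name__ == "__main__":
    print(round(overall_percent(course), 2))  # 83.5
```

With these made-up numbers, a 10-point swing in the quiz total moves the course grade by about 1 point, while the same swing on one exam moves it by about 3, so the quizzes stay low-stakes practice.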

In math it is difficult to completely avoid having separate items or tasks depend upon previous answers.  The dilemma is this:  Do I write a complicated problem and have the students recall all the steps I want?  Or do I walk them through the steps, knowing the answer to one may depend on the answer to another?

I think humor is a wonderful addition to tests.  Whenever there is room, I include a math-related cartoon on the last page.

Also, the phrase I have heard for placing less difficult items at the beginning of a test is “establishing a pattern of success.”  It gives the student who has prepared a chance to start off with a victory, building confidence for the rest of the questions.

What do you think of the list?  Would you add to it?  Is there anything with which you disagree?

The Challenges to Writing a Good Test


What sorts of challenges do we, as professors, face when writing an exam?  That was one question in my mind when I started reading the resources.  This site made a statement that really struck a chord with me:

Source: https://www.psychologytoday.com/blog/thinking-about-kids/201212/how-write-final-exam

In reality, most professors develop exams as best they can.

Few have any formal training in assessment (the field that focuses on how to accurately measure performance).

Although many professors spend most of their time teaching, most of us have no formal training in education whatsoever.

So we tend to write questions that sound good and make sense to us.

We try to minimize cheating by writing new exams every semester so we never have a chance to weed out bad questions and develop really good measurement instruments.

We often use the same types of tools used by our own professors to assess the skills and learning of our students instead of thinking about what would work best.

We often don’t think clearly enough about our course goals to accurately measure them.

And sometimes our questions are not clear enough so different students interpret them differently and we only recognize interpretations that match our own.

And all this happens despite our best efforts and all our hard work.

For better or for worse.

Some of us have an education background but many of us do not.  We mimic the test styles we liked in our experience and perhaps avoid the ones we disliked.  Certainly we formed opinions about our teachers and decided which we wanted to emulate when we taught.

I don’t see anything wrong with that, but if we want to improve, we need to explore new ideas.

We can start by acknowledging the basic features of a good exam.

A test should be an accurate measure of student achievement.

Source: http://www.iub.edu/~best/pdf_docs/better_tests.pdf :

Problems that keep tests from being accurate measures of students’ achievement:

  • Too many questions measuring only knowledge of facts.
  • Too little feedback.
  • The questions are often ambiguous and unclear.
  • The tests are too short to provide an adequate sample of the body of content to be covered.
  • The number of exams is insufficient to provide a good sample of the students’ attainment of the knowledge and skills the course is trying to develop.

Source: http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf :

Well-constructed tests:

  1. Motivate students and reinforce learning
  2. Enable teachers to assess the students’ mastery of course objectives
  3. Provide feedback on teaching, often showing what was or was not communicated clearly

What makes a test good or bad?  The most basic and obvious answer is that good tests measure what you want to measure and bad tests do not.

The whole point of testing is to encourage learning.  A good test is designed with items that are not easily guessed without proper studying.

Have you ever spent time studying your tests?  When designing them, do you think about what you are testing and how you are assessing that information?

Have you analyzed the responses students put on the test to see if they understood what you were asking?  Do you think about how the wording or design could be improved on future tests?

We will explore these in more detail in the next posts.

List of Resources

Here are the various web links that we will reference for this blog.  The list may be edited over time.

Feel free to post more resources in the Comments section.

  1. http://www.iub.edu/~best/pdf_docs/better_tests.pdf
  2. http://cft.vanderbilt.edu/guides-sub-pages/writing-good-multiple-choice-test-questions/
  3. http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
  4. https://www.msu.edu/dept/soweb/writitem.html
  5. http://www.helpteaching.com/about/how_to_write_good_test_questions/
  6. http://www.psychologytoday.com/blog/thinking-about-kids/201212/how-write-final-exam
  7. http://www.instructables.com/id/How-to-write-a-test/
  8. http://www.uleth.ca/edu/runte/tests/
  9. http://teachonline.asu.edu/2013/06/quick-reference-guide-for-writing-effective-test-questions/
  10. http://www.lshtm.ac.uk/edu/taughtcourses/writinggoodexamquestions.pdf
  11. http://www.crlt.umich.edu/P8_0
  12. http://www.cmu.edu/teaching/assessment/assesslearning/creatingexams.html
  13. http://www.teaching-learning.utas.edu.au/assessment/authentic-assessment/designing-exams
  14. http://depts.washington.edu/eproject/ExamChecklist.htm
  15. http://www.princeton.edu/mcgraw/library/sat-tipsheets/designing-exam/
  16. http://www4.ncsu.edu/unity/lockers/users/f/felder/public/Papers/TestingTips.htm
  17. http://www.tltc.ttu.edu/teach/TLTC%20Teaching%20Resources/CreateTests.asp
  18. http://www.edutopia.org/better-tests-differentiate
  19. http://teaching.uncc.edu/learning-resources/articles-books/best-practice/assessment-grading/designing-test-questions
  20. http://www.calm.hw.ac.uk/GeneralAuthoring/031112-goodpracticeguide-hw.pdf

What We Are About

Welcome!

We are teachers and we want to do our best for our students.  Sometimes we need a chance to see what others are doing to help us “improve our game.”

The goal of this blog is to explore the strategies, philosophies, and various options of test writing.  We’ll take a systematic approach, starting with general tips about tests and test construction and then proceeding through different test item types.

We will look at articles and advice on the Internet and discuss how the ideas may or may not apply to our discipline.  This is not a “one-size-fits-all” topic!  Neither should it be considered a best practices list.  We are the topic experts and the best judges for the information we are assessing.

Feel free to share these posts.

Due to the number of spammers, comments have been disallowed.

Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license

Tracy Johnston
STEM 1 Curriculum and Program Improvement (CPI) Coordinator
Palomar College

This is a sticky post; newer posts appear below.

For a list of the resources on this blog, see the List of Resources post.