tips – Test Writing Strategies

Alternative-Response Design: Structure and Advice

Posted on October 12, 2015 by Johnston, Tracy

In the previous post we talked about the pros and cons of the alternative-response (e.g., true-false) types of questions as well as their application to Bloom’s Taxonomy. Next we discuss aspects to consider when writing the questions.

I found this “Simple Guidelines” list helpful and informative.

(Source: http://webs.rtc.edu/ii/Teaching%20Resources/GuidelinesforWritingTest.htm)

Base the item on a single idea.

Write items that test an important idea

Avoid lifting statements right from the textbook.

Make the questions as brief as possible.

Write clearly true or clearly false statements. Write them in pairs: one “true” and one “false” version and choose one to keep balance on the test.

Eliminate giveaways:

Keep true and false statements approximately equal in length

Make half the statements true and half false.

Try to avoid such words as “all,” “always,” “never,” “only,” “nothing,” and “alone.” Students know these words usually signify false statements.

Beware of words denoting indefinite degree. The use of words like “more,” “less,” “important,” “unimportant,” “large,” “small,” “recent,” “old,” “tall,” “great,” and so on, can easily lead to ambiguity.

State items positively. Negative statements may be difficult to interpret. This is especially true of statements using the double negative. If a negative word, such as “not” or “never,” is used, be sure to underline or capitalize it.

Beware of detectable answer patterns. Students can pick out patterns such as (TTTTFFFF) which might be designed to make scoring easier.

All of this makes sense to me. At first I objected to “Make half the statements true and half false” but when I thought about it, I wouldn’t do exactly half necessarily but maybe close to half. In fact this source, http://teaching.uncc.edu/learning-resources/articles-books/best-practice/assessment-grading/designing-test-questions, suggests making the ratio more like 60% false to 40% true since students are more likely to guess the answer is true.
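The two giveaways above (the true-to-false ratio and detectable answer patterns) are easy to check mechanically before a test goes out. Here is a minimal sketch, assuming the answer key is written as a string of "T" and "F" characters. The function name `audit_answer_key` and the sample keys are hypothetical and do not come from either source.

```python
from itertools import groupby


def audit_answer_key(key):
    """Summarize a true-false answer key given as a string of 'T' and 'F'.

    Returns (share of true answers, length of the longest run of
    identical consecutive answers).
    """
    if not key or set(key) - {"T", "F"}:
        raise ValueError("key must be a non-empty string of 'T' and 'F'")
    true_share = key.count("T") / len(key)
    # groupby clusters consecutive identical answers, so the longest
    # cluster is the longest run (4 for "TTTTFF").
    longest_run = max(len(list(group)) for _, group in groupby(key))
    return true_share, longest_run


# Hypothetical keys: the pattern warned about above, then a 40:60 mix.
for key in ("TTTTFFFF", "FTFFTFTFFT"):
    share, run = audit_answer_key(key)
    print(f"{key}: {share:.0%} true, longest run {run}")
```

A long run or a true share far from the intended ratio is only a prompt to look at the key again; what counts as "too long" is a judgment call the sketch leaves to the instructor.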

I found other points to add to the guidelines list. (Source: http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf)

Two ideas can be included in a true-false statement if the purpose is to show cause and effect.

If a proposition expresses a relationship, such as cause and effect or premise and conclusion, present the correct part of the statement first and vary the truth or falsity of the second part.

When a true-false statement is an opinion, it should be attributed to someone in the statement.

Underlining or circling answers is preferable to having the student write them.

Make use of popular misconceptions/beliefs as false statements.

Write items so that the incorrect response is more plausible or attractive to those without the specialized knowledge being tested.

Avoid the use of unfamiliar vocabulary.

Determine that the questions are appropriately answered by “True” or “False” rather than by some other type of response, such as “Yes” or “No.”

Avoid the tendency to add details in true statements to make them more precise. The answers should not be obvious to students who do not know the material.

Be sure to include directions that tell students how and where to mark their responses.

This same source gives you a nice tip for writing true-false items:

Write a set of true statements that cover the content, then convert approximately half of them to false statements. State the false items positively, avoiding negatives or double negatives.

Most of this discussion has been about True-False questions but the category is really Alternative-Response. Let’s look at the variations available to us.

(Source: http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf)

The True-False-Correction Question
In this variation, true-false statements are presented with a key word or brief phrase that is underlined. It is not enough that a student correctly identify a statement as being false. … the student must also supply the correct word or phrase which, when used to replace the underlined part of the statement, makes the statement a true one. This type of item is more thorough in determining whether students actually know the information that is presented in the false statements.

The teacher decides what word/phrase can be changed in the sentence; if students were instructed only to make the statement a true statement, they would have the liberty of completely rewriting the statement so that the teacher might not be able to determine whether or not the student understood what was wrong with the original statement.

If, however, the underlined word/phrase is one that can be changed to its opposite, it loses the advantage over the simpler true-false question because all the student has to know is that the statement is false and change “is” to “is not.”

The Yes-No Variation
The student responds to each item by writing, circling or indicating yes-no rather than true-false. An example follows:

What reasons are given by students for taking evening classes? In the list below, circle Yes if that is one of the reasons given by students for enrolling in evening classes; circle No if that is not a reason given by students.

Yes No They are employed during the day.
Yes No They are working toward a degree.
Yes No They like going to school.
Yes No There are no good television shows to watch.
Yes No Parking is more plentiful at night.

The A-B Variation
The example below shows a question for which the same two answers apply. The answers are categories of content rather than true-false or yes-no.

Indicate whether each type of question below is a selection type or a supply type by circling A if it is selection, B if it is supply.

Select Supply
A B Multiple Choice
A B True-False
A B Essay
A B Matching
A B Short Answer

In summary, the sources all tend to agree that the best Alternative-Response items are unambiguous (“true or false with respect to what?”), concisely written, limited to one idea per question, and aimed at more than rote memorization. We should avoid trick questions and questions that test trivia. And the best tests with A-R items have a lot of questions, with a True-to-False ratio of about 40:60.

Next test item type: Matching!

The Technical Quality of a Test — Part 2

Posted on March 9, 2015 by Johnston, Tracy

In the previous post (below) we started looking at features that define the technical quality of a test.

Source: http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf

Criteria for establishing the technical quality of a test

Cognitive complexity

The test questions will focus on appropriate intellectual activity ranging from simple recall of facts to problem solving, critical thinking, and reasoning.

Bloom’s Taxonomy*

Content quality

The test questions will permit students to demonstrate their knowledge of challenging and important subject matter. The emphasis of the test should be a reflection of the emphasis of the lecture.

Meaningfulness

The test questions will be worth students’ time and students will recognize and understand their value.

Language appropriateness

The language demands will be clear and appropriate to the assessment tasks and to students. It should reflect the language used in the classroom. Test items should be stated in simple, clear language, free of nonfunctional material and extraneous clues, and free of race, ethnic, and sex bias.

Transfer and generalizability

Successful performance on the test will allow valid generalizations about achievement to be made.

Fairness

Student performance will be measured in a way that does not give advantage to factors irrelevant to school learning: scoring schemes will be similarly equitable.

Basic rules of fairness:

Test questions should reflect the objectives of the unit

Expectations should be clearly known by the students

Each test item should present a clearly formulated task

One item should not aid in answering another

Ample time for test completion should be allowed

Assignment of points should be determined before the test is administered.

Reliability

Answers to test questions will be consistently trusted to represent what students know.

*More on Bloom’s Taxonomy in a future post.

We have already discussed points #1 and 2; let us continue with the rest.

Meaningfulness

I think if we are writing exam questions that explore the knowledge we want the students to learn, the questions will be meaningful, even when they only test simple recall. Each question should trigger a memory in any student who has prepared and studied.

I am not always certain our students will recognize and understand the value of the questions we offer, but I am not sure that really matters. What we want to avoid is outrage at a question that comes across as grossly unfair or outside the scope of the class, and I think meaningful questions accomplish that.

Language appropriateness

When I see the phrase “the language used in the classroom,” I think about how I describe concepts and the level of the vocabulary I use in discussions. I try to avoid “dumbing down” the words I use but I also try to avoid choosing words that are esoteric or outdated. In lecture, it is often easy to see student reaction to words they don’t understand, and that tells me I need to define those words, even if they aren’t words in my discipline. This gives me an opportunity to raise the student vocabulary closer to college level. Once I have used and defined them, I feel free to use those words in exams.

One hazard of making the questions “free of nonfunctional material and extraneous clues” in mathematics is that students become trained to believe they must use every number and every bit of information in the problem or they won’t be working it correctly. Unfortunately, real world problems that use math often contain nonfunctional material and extraneous clues and our students need to learn how to weed it out. I introduce this skill at the calculus level.

Transfer and generalizability

The goal I set for my students is for them to learn the course material in such a way that they can perform the skills, recall the ideas, and recognize the vocabulary and notation, and that they are prepared to take the next course in the sequence successfully. This is my definition of transfer and generalizability.

How would you define it for your discipline?

Fairness

This seems straightforward and reasonable to me. I don’t always have the time to determine the assignment of points before the test is administered but I always do before I start grading. If something causes me to rethink the point distribution, I regrade all the problems affected by it.

Reliability

The description given for this point did not help me understand reliability but this source’s definition did [note that “marker” means “the person grading the exam”]:

Source: http://www.lshtm.ac.uk/edu/taughtcourses/writinggoodexamquestions.pdf

Does the question allow markers to grade it consistently and reproducibly and does it allow markers to discriminate between different levels of performance? This frequently depends on the quality of the marking guidance and clarity of the assessment criteria. It may also be improved through providing markers with training and opportunities to learn from more experienced assessors.

What resonates with me is the ability to discriminate between different levels of performance. That can be challenging when grading math problems because I feel partial credit is important. Students can work problems in so many incorrect or partially correct ways that I have to work hard to determine how much they really knew and how much was due to simple error.

From the criteria list I see the opportunity to consider the overall structure of my exams and assess my general test writing skills. I like the guidelines and how they direct me to think beyond my personal experiences while considering how the students will perceive the test.

The Technical Quality of a Test — Part 1

Posted on March 6, 2015 by Johnston, Tracy

We want our tests to be good measures of student achievement so we need to pay attention to what one source calls the “technical quality of a test.”

To help me understand what quality really means, I found these definitions useful:

The characteristics of a product or service that bear on its ability to satisfy stated or implied needs; “conformance to requirements”
A product or service free of deficiencies; “fitness for use”

(from http://asq.org/glossary/q.html)

So what criteria should we use to improve quality?

Source: http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf

Criteria for establishing the technical quality of a test

Cognitive complexity

The test questions will focus on appropriate intellectual activity ranging from simple recall of facts to problem solving, critical thinking, and reasoning.

Bloom’s Taxonomy*

Content quality

The test questions will permit students to demonstrate their knowledge of challenging and important subject matter. The emphasis of the test should be a reflection of the emphasis of the lecture.

Meaningfulness

The test questions will be worth students’ time and students will recognize and understand their value.

Language appropriateness

The language demands will be clear and appropriate to the assessment tasks and to students. It should reflect the language used in the classroom. Test items should be stated in simple, clear language, free of nonfunctional material and extraneous clues, and free of race, ethnic, and sex bias.

Transfer and generalizability

Successful performance on the test will allow valid generalizations about achievement to be made.

Fairness

Student performance will be measured in a way that does not give advantage to factors irrelevant to school learning: scoring schemes will be similarly equitable.

Basic rules of fairness:

Test questions should reflect the objectives of the unit

Expectations should be clearly known by the students

Each test item should present a clearly formulated task

One item should not aid in answering another

Ample time for test completion should be allowed

Assignment of points should be determined before the test is administered.

Reliability

Answers to test questions will be consistently trusted to represent what students know.

*More on Bloom’s Taxonomy in a future post.

Let’s examine this list in detail.

Cognitive complexity

Oh, I like this one. We should be challenging our students with intellectual activity and, more importantly, with a range of it. This brings me back to the previous post (below) where we discussed the classification of questions based on how easily they can be answered, from those that most students can get to the few “A-B Breakers.” There should be some questions that make the student think, “Hmmm, how can I use what I have learned to answer this?” and some that bring on the reaction of “Oh yes, I have seen all this before and I can remember it.”

I recall a question on a botany exam that asked me to imagine holding a plant stem in my hand and piercing it with a straight pin. I needed to describe the various tissue types the pin might touch as it passed through to the middle of the stem. I had learned the list of tissues already; this question forced me to consider their locations in the plant and organize them from the outside to the inside. I hadn’t already considered that idea so I was cognitively challenged but I had all the tools I needed to answer the question.

One message that comes across in a number of the sources is that we can be tempted to test on the easier parts of the material rather than the important parts. Considering cognitive complexity helps us focus on drawing from our students what they have learned beyond simple recall.

Content Quality

Again, we need to ensure we are testing more than simple recall but we also have to make sure we are not writing questions that test outside of the course material.

Here is one concern I have about emphasizing on the test what is emphasized in the lecture: if this is taken too literally, our students are at risk of paying attention only to the information we explicitly label as important and ignoring any nuances or “items of lesser importance.” They are often keenly tuned into the way we write on the board, in a PowerPoint slide, or in digital lecture notes, and they are quick to infer that words in bold, italics, or all capitals, or words that are underlined, are the only things they should study for a test. I found I was giving them that impression in my lectures, so I changed the way I wrote on the board, forcing my students to consider all the words I presented.

We will continue this discussion in the next post!

General Tips About Test Design

Posted on February 25, 2015 by Johnston, Tracy

In the previous post (below) we asked, “Do you think about what you are testing and how you are assessing that information?” when it comes to test design.

This source, http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf, provides some general tips:

General tips about testing

Length of test

The more items it has, the more reliable it is. However, if a test is too long, the students may get tired and not respond accurately. If a test needs to be lengthy, divide it into sections with different kinds of tasks.

Clear, concise instructions

It is useful to provide an example of a worked problem, which helps the student understand exactly what is necessary.

Mix it up!

It is often advantageous to mix types of items (multiple choice, true-false, essay) on a written exam. Weaknesses connected with one kind of item or component or in students’ test taking skills will be minimized.

Test Early

Consider discounting the first test if the results are poor. Students often need a practice test to understand the format each instructor uses and anticipate the best way to prepare and take particular tests.

Test frequently

Frequent testing helps students to avoid getting behind, provides instructors with multiple sources of information to use in computing the final course grade, and gives students regular feedback.

Check for accuracy

Instructors should be cautious about using tests written by others. They should be checked for accuracy and appropriateness in the given course.

Proofread exams

Check them carefully for misspellings, misnumbering responses, and page collation.

One wrong answer

It is wise to avoid having separate items or tasks depend upon answers or skills required in previous items or tasks.

Special considerations

Anticipate special considerations that learning disabled students or non-native speakers may need.

A little humor

Using a little humor or placing less difficult items or tasks at the beginning of an exam can help reduce test anxiety and thus promote a more accurate demonstration of their progress.

My reaction to their advice is mixed. I’m not sure I could provide good examples of worked problems on the test itself because I teach mathematics; working problems for the students defeats the purpose of the test. However, I can have the students get that knowledge before the exam by having them complete homework problems, and I can emphasize that many of the test problems will use those skills and strategies.

I am able to “mix it up” sometimes, depending on the course and the material being covered. When testing vocabulary in statistics, for example, sometimes I use multiple choice and sometimes I use fill-in-the-blank.

I am not fond of the idea of discounting the first test if it is poor. I get around that by offering my students short quizzes on a regular basis — I write the problems and grade them so students get a feel for my writing style and notation expectations before the longer, high-stakes exams. My goal for the quizzes is to have the cumulative points be similar to an exam but then that total is weighted less than an exam towards the overall grade.
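The weighting described above can be sketched as simple arithmetic. This is only an illustration: the function name `course_grade`, the sample scores, and the quiz weight of 0.5 (with each exam counting as 1.0) are hypothetical and are not taken from an actual gradebook.

```python
def course_grade(quiz_scores, exam_scores, quiz_weight=0.5):
    """Combine pooled quiz points and individual exams into one percentage.

    quiz_scores and exam_scores are lists of (earned, possible) pairs.
    All quizzes are pooled into a single component that counts
    quiz_weight times as much as one exam (an assumed scheme).
    """
    quiz_earned = sum(earned for earned, _ in quiz_scores)
    quiz_possible = sum(possible for _, possible in quiz_scores)
    quiz_pct = quiz_earned / quiz_possible
    exam_pcts = [earned / possible for earned, possible in exam_scores]
    # Weighted average: each exam has weight 1, the quiz total has quiz_weight.
    weighted_sum = sum(exam_pcts) + quiz_weight * quiz_pct
    return weighted_sum / (len(exam_pcts) + quiz_weight)


# Hypothetical scores: four 10-point quizzes and two 100-point exams.
quizzes = [(8, 10), (9, 10), (7, 10), (10, 10)]
exams = [(80, 100), (90, 100)]
print(f"course grade: {course_grade(quizzes, exams):.1%}")
```

Pooling the quiz points first means a single weak quiz moves the grade far less than a weak exam would, which is the low-stakes effect the quizzes are meant to have.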

In math it is difficult to completely avoid having separate items or tasks depend upon previous answers. The dilemma is this: Do I write a complicated problem and have the students recall all the steps I want? Or do I walk them through the steps, knowing the answer to one may depend on the answer to another?

I think humor is a wonderful addition to tests. Whenever I can (i.e., there is room), I include a math-related cartoon on the last page.

Also, the phrase I have heard about placing less difficult items at the beginning of a test is “establishing a pattern of success.” Give the student who has prepared a chance to start off with a victory, thus building confidence for the rest of the questions.

What do you think of the list? Would you add to it? Is there anything with which you disagree?
