Grading Schemes

 

The cure for the problems with traditional multiple-choice scores and grades is not to use less reliable and more costly alternative tests. The cure is within the test itself. Score right (know), wrong (poor judgment) and omit (good judgment to not mark a wrong answer). The boring traditional (guess) test, which yields only a score by counting right marks, comes alive. It now requires all levels of thinking and yields one score for knowledge and another for judgment. The results are comparable to those from essay tests. “You have a test score of 70% with a quality score of 100%. No wrong answers.” Both essay tests and knowledge and judgment scoring (KJS) can score quantity and quality at higher levels of thinking. Grades are then based on the results of these scoring schemes.

 

The school uniform grading standard is an administrative convenience. All students scoring at 70%, for example, receive a grade of C (or C- in an expanded version). These standards also imply a degree of fairness and of equal quality that is not valid. Often, as shown in the following, the schools lose track of how the standard was developed. It is just a tradition, as is scoring multiple-choice by just counting the right marks. Students, teachers and schools need to break out of these traditions, which are based on old, time-consuming manual scoring and out-of-date analysis methods.

 

Scores and grades are important when they accurately indicate something in the real world. The following four schools use different scoring and grading schemes. There is no way to determine from these schemes if one school has a higher or lower standard than another. Student quality, question difficulty, and the level of thinking required can differ. The pass rate on a standardized test (NCLB or NCLEX, for example) would be helpful but not definitive, since class sizes of fewer than 35 students are the rule. It takes about 300 students to produce stable data.

 

  1. Count Right Answers for Straight Percentage Scoring and Grading

 

School One uses the raw scores of 90, 80, 70, and 60 for the grades A, B, C, and D. This is a common method for scoring and grading in elementary and secondary schools as well as in universities. It is familiar. It is simple. Count the right marks. It is easy for students to understand. It presents little risk to mastery students at the cutoffs for A and B. Students at the 60% pass/fail line are at much greater risk from the increased gambling they are forced to do to “complete” the test. Roughly half the students within +/- 5% of that line will pass or fail because of chance. There is no way to tell whether a right answer came from chance or from knowing the answer. The number of options per question is also ignored.
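The School One scheme amounts to a cutoff lookup. The following is a minimal Python sketch, not the school's actual procedure. The function names are hypothetical, and the second function assumes that a student guesses blindly among all options on every question not known.

```python
def letter_grade(percent, cutoffs=((90, "A"), (80, "B"), (70, "C"), (60, "D"))):
    """Return the letter grade for a raw percentage score."""
    for cutoff, grade in cutoffs:
        if percent >= cutoff:
            return grade
    return "F"


def expected_right_mark_score(known, total, options):
    """Average right-mark score when every unknown question is guessed.

    Assumption: blind guessing among all options on each unknown question.
    """
    return known + (total - known) / options


# A student who knows half of a 100-question test and guesses the rest.
print(expected_right_mark_score(50, 100, 4))  # 62.5
print(expected_right_mark_score(50, 100, 5))  # 60.0
print(letter_grade(62.5))                     # D
```

Under these assumptions the same knowledge lands on different sides of the 60% line depending only on luck and on the number of options per question, neither of which the grade reports.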

 

  2. Count Right Answers for Scoring and Reduced Grade Range Grading

 

School Two applies the raw scores to an 8.5 point grade range, rather than the 10 point grade range above. The rationale for the 8.5 point grade range was not known.

Right mark scoring with a liberal pass/fail point, on the other hand, permits students to pass using the lowest levels of thinking. There is a positive and a negative result. At-risk, struggling, and very bright students functioning at the lowest levels of thinking can stay with their class until they either "catch on" or drop out. When students do "come alive" they are not far behind their classmates.

 

  3. Count Right Answers for Scoring and Standard Deviation (SD) Grading

 

School Three uses the normal curve as part of its method of turning scores into grades. Grades based on the normal curve (properly called the normal curve of error) can be derived in several ways. The assumption is made that scores must fit the normal curve of error (the know-nothing-curve that can be obtained by tossing coins). The point at which the curve changes from concave to convex is defined as one standard deviation from the mean. It happens that this range from minus to plus one standard deviation includes 68% of a sample, on average. This range of two standard deviations is assigned the grade of C. The other letter grades are assigned a range of one standard deviation each above and below C.
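Standard deviation grading as described here can be sketched in a few lines of Python. This is an illustration only: the function name `sd_grade`, the class scores, and the handling of scores that fall exactly on a boundary are assumptions, not details taken from School Three.

```python
from statistics import mean, pstdev


def sd_grade(score, class_mean, class_sd):
    """Grade by distance from the class mean, in standard deviations."""
    z = (score - class_mean) / class_sd
    if z > 2:
        return "A"
    if z > 1:
        return "B"
    if z >= -1:  # C covers minus one to plus one SD (boundary choice assumed)
        return "C"
    if z >= -2:
        return "D"
    return "F"


scores = [60, 68, 72, 76, 76, 80, 84, 92]  # made-up class scores
class_mean, class_sd = mean(scores), pstdev(scores)
grades = [sd_grade(s, class_mean, class_sd) for s in scores]
print(grades)
```

The grade depends on where a score sits relative to the rest of the class, so the same score can earn a different grade in a different class.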

 

The desired distribution can be obtained by the proper selection of questions for a test. Some schools distribute the grades by a scheme based directly on the normal curve of error: student scores are ranked and assigned some variation of 3%, 13%, 68%, 13%, and 3% for A, B, C, D, and F, rather than being compared to set grade ranges. Every test has the same proportion of letter grades regardless of the actual scores. Only in education are measurements (scores) manipulated to fit a theoretically “known” distribution. Scientific work plots measurements to reveal an unknown distribution from the scores. Curve fitting removes any pretense of quality.
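Grading by rank with fixed proportions can be sketched as follows. This is a hypothetical illustration in Python, using the 3%, 13%, 68%, 13%, and 3% split mentioned above. The function name and the two made-up classes are assumptions, and tied scores are simply ranked in list order.

```python
def grades_by_rank(scores, quotas=(("A", 3), ("B", 13), ("C", 68), ("D", 13), ("F", 3))):
    """Assign grades by rank so each grade gets a fixed percentage of the class."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    grades = [None] * n
    for rank, idx in enumerate(order, start=1):
        cumulative = 0
        for grade, percent in quotas:
            cumulative += percent
            # Integer comparison avoids floating-point rounding at the cutoffs.
            if rank * 100 <= cumulative * n:
                grades[idx] = grade
                break
    return grades


weak_class = [i * 0.5 for i in range(100)]         # scores from 0 to 49.5
strong_class = [50 + i * 0.5 for i in range(100)]  # scores from 50 to 99.5
print(grades_by_rank(weak_class) == grades_by_rank(strong_class))  # True
```

Both made-up classes receive identical grade lists even though no student in the first class scores as high as any student in the second.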

 

  4. Count Right and Wrong Answers for Formula Scoring

 

School Three also uses formula scoring before applying the “normal” curve. The idea behind formula scoring is that some of the right answers are from chance, so the number of right answers needs to be corrected. On a test of five-option items it follows that for every four wrong answers there must be one false right answer by chance. This gives omit the value of 0.2 per item. Whether you guess and have the score corrected or you omit, the idea is you get the same score. There are two serious errors in this line of thinking.
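The correction can be written out as a short Python sketch. It uses the usual correction-for-guessing form, right minus wrong divided by one less than the number of options. That this is exactly the formula School Three applied is an assumption, and the function name is hypothetical.

```python
def formula_score(right, wrong, options):
    """Right marks minus the number of right marks expected from lucky guesses."""
    return right - wrong / (options - 1)


# Blind guessing on 100 five-option items averages 20 right and 80 wrong.
print(formula_score(20, 80, 5))  # 0.0
# Marking only what is known: 70 right, none wrong, 30 omitted.
print(formula_score(70, 0, 5))   # 70.0
# Marking everything: 75 right, 25 wrong.
print(formula_score(75, 25, 5))  # 68.75
```

On average, blind guessing among all five options is corrected to zero, which is the sense in which guessing and omitting are meant to give the same score.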

 

First, formula scoring is operationally faulty. A student who guesses from a reduced set of answer options gains more, on average, than the penalty takes away, because the penalty assumes guessing among all options. Most classroom tests average three functional options per question. A student then guesses one out of three rather than the one out of four or five for which the test is designed.
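The size of this advantage can be worked out directly. The Python sketch below assumes the usual correction of one point divided by (designed options minus one) for each wrong mark; the function name is hypothetical.

```python
from fractions import Fraction


def expected_gain_per_guess(functional_options, designed_options):
    """Average formula-score change from one guess among the functional options."""
    p_right = Fraction(1, functional_options)
    penalty = Fraction(1, designed_options - 1)  # deducted for each wrong mark
    return p_right - (1 - p_right) * penalty


print(expected_gain_per_guess(5, 5))  # 0
print(expected_gain_per_guess(3, 5))  # 1/6
print(expected_gain_per_guess(3, 4))  # 1/9
```

A guess among three functional options on a five-option item is worth about a sixth of a point on average, so guessing beats omitting whenever a student can rule out even some of the options.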

 

Secondly, there are a number of ways students can obtain the same corrected score. The score of 70% can come from answering 70 right and zero wrong (omit 30) out of 100 questions, or, at the other extreme, from answering all 100 with 75 right and 25 wrong, which is corrected to about 70%. The first student is 100% right in answering questions; the second is only 75% right. The first student knew what she knew when she took the test. The second student did not, and neither the teacher nor the student can sort out knowledge from chance. The two students function at two different levels of thinking. Formula scoring takes little advantage of this quality indicator. It only reduces the score for the student functioning at lower quality.
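The quality indicator for these two students is simple to compute. In this Python sketch, quality is taken to be the percentage of marked answers that are right, which is an assumed reading of the quality score quoted earlier; the function name is hypothetical.

```python
def quality_percent(right, wrong):
    """Percentage of marked answers that are right (omits are not counted)."""
    marked = right + wrong
    if marked == 0:
        return None  # nothing marked, so there is nothing to judge
    return 100 * right / marked


print(quality_percent(70, 0))   # 100.0
print(quality_percent(75, 25))  # 75.0
```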

 

School Three “corrects” the raw score of 75% to about 70% (omit = 0.2 for five-option questions). The corrected score is then graded against a C range of two SDs, with an SD of eight points (68% to 84%), rather than against a straight eight-point grade range (76% to 84%). Formula scoring reduced the value of the raw score, but using the range of two SDs expanded the grade range beyond that of a ten-point grade range (70% to 80%). The student is first penalized and then forgiven, if the score is near the class average.

It makes no difference whether you adjust the test score to fit a grade scale or adjust the grade scale to fit raw scores to maintain "high standards". The final grade could have been assigned using the appropriate single value for omit (for whatever line of reasoning) or by selecting a grade range. Here too, how this complicated scheme was devised is unknown to those currently using it.

 

  5. Count Right, Wrong and Omit for Knowledge and Judgment Scoring (KJS)

 

Counting only right answers works well to obtain a ranking or to sort mastery examinees from the rest of the group. The numbers capture only a part of the reality they are represented as measuring. At best they tell us how a student did on a test. Measuring what the student actually knew or could do requires scoring all the information available in a multiple-choice test.

 

At School Four, Northwest Missouri State University, Biology Department, numbers were obtained that tell what a student knows and how much the student and teacher can trust what he knows: quantity and quality. Both knowledge and judgment were measured in classes and in the Biology portion of the Science Olympiad that awarded scholarships to winners. The value of right (know) was set at 0.5 and of omit (don’t know, or good judgment not to mark a wrong answer) at 0.5, resulting in a starting score of 50 on a 100 item test. Knowledge and judgment had equal value (the number of options per question was ignored). This made every question two questions: one to decide if something was known (requiring higher levels of thinking) and the other to select a correct answer (requiring all levels of thinking). Students were rewarded both for knowing and for judgment (to what extent what they know can be trusted).

 

Later research indicated that the average number of options marked is a fair indicator of the minimum value omit should be given for judgment to not mark a wrong answer. As a rule of thumb this is 33%, the same as three options per question in teacher-made classroom tests. (NCLB standardized tests designed for mastery have even lower values. High quality tests designed to rank students and schools function close to their design values of four or five options.) The value for omit for KJS must be an honest value. It must always be higher than the test's designed chance value used in formula scoring. It is calculated for every test by Break Out Source Code, Break Out Plus and Power Up Plus.

 

KJS extracts the complete set of information from multiple-choice tests. It rewards the sense of responsibility needed to learn and test at higher levels of thinking. Passive pupils learn to become active self-correcting learners. The teacher and student both know what is known and what is not (without question if the quality score for judgment is 100%, and in general, as the number of wrong marks is much reduced with the exception of the student who continues to mark every question). Students only need to mark the questions they know. Gambling is not required. Omit is now rewarded for having the good judgment to not mark a wrong answer rather than valued as zero.

 

All students start with a score of 50% for having not marked any wrong answers (if omit is set to 0.50). A value of 0.5 is added for each right answer and subtracted for each wrong answer (poor judgment). (Or start with zero and add one point for not marking a wrong answer and two points for each right answer to put a positive spin on the instructions for students functioning at lower levels of thinking.) The resulting distribution of scores can be assigned grades by any uniform grade range or “normal” curve method. A test score of 75 can be equated to a grade of C with a grade range of 10 points. A test score of 75 would also equate to a grade of C with a grade range of eight points using standard deviation scoring (the same grade as assigned by the School Three method).
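Both ways of stating the rule give the same score, as the following Python sketch shows. The function names are hypothetical, omit is fixed at 0.5 as in the text, and the second function reads the positive version as two points for a right mark, one for an omit, and none for a wrong mark.

```python
def kjs_score(right, wrong, omit):
    """Start at 50%, add 0.5 per right mark, subtract 0.5 per wrong mark."""
    total = right + wrong + omit
    points = 0.5 * total + 0.5 * right - 0.5 * wrong
    return 100 * points / total


def kjs_score_positive(right, wrong, omit):
    """Same score from zero: 2 points per right mark, 1 per omit, 0 per wrong."""
    total = right + wrong + omit
    return 100 * (2 * right + omit) / (2 * total)


print(kjs_score(0, 0, 100))            # 50.0  nothing marked
print(kjs_score(75, 25, 0))            # 75.0  everything marked
print(kjs_score(50, 0, 50))            # 75.0  no wrong marks
print(kjs_score_positive(50, 0, 50))   # 75.0
```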

 

The test score of 75 can be obtained in a number of ways. A student may mark all questions (75 right, 25 wrong). A student may make no wrong marks (50 right, 50 omit, a rare event). A student may mark, using higher levels of thinking, any of several hundred combinations of questions to reflect what has been mastered. It is this freedom to select what to report that makes KJS as fair as an essay test, where a student also has the choice of what to write. The score represents each student's preparation based on each student's self-judgment: a fair test, an individualized test. This score has a different meaning than one from how a student marked a set of questions selected by someone else. KJS makes a better formative assessment than right-mark scoring.
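The count patterns that produce a given KJS score can be listed with a short search. This Python sketch assumes omit is valued at 0.5 on a 100-question test; the function name is hypothetical. It lists patterns of right, wrong and omit counts only, and each pattern can be realized by many different selections of questions.

```python
def ways_to_score(target, total=100):
    """List (right, wrong, omit) counts whose KJS score equals target percent."""
    ways = []
    for right in range(total + 1):
        for wrong in range(total - right + 1):
            omit = total - right - wrong
            # With omit valued at 0.5, the score is (right + omit / 2) / total.
            if 100 * (2 * right + omit) == 2 * total * target:
                ways.append((right, wrong, omit))
    return ways


ways = ways_to_score(75)
print(len(ways))          # 26
print(ways[0], ways[-1])  # (50, 0, 50) (75, 25, 0)
```

Every pattern between the two extremes has a different quality (share of marked answers that are right), which is the information a single right-mark score cannot show.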

 

The above discussion covers three views of multiple-choice test scores and grades. The first ignores any problem with multiple-choice testing: just count the right marks. The second attempts faulty or unnecessary corrections. The third, KJS, makes positive use of all the features and benefits of multiple-choice scores and grades. And yet all of these methods are valid for ranking students for grades if the students understand them. Only KJS provides the full measure for student counseling, lecture modification and question revision, because students are permitted to report and are rewarded for reporting what they know rather than marking and waiting for the teacher to tell them the right answers.

 

  6. High-Risk Scoring

 

Knowledge Factor patented the range of omit values from about 75% to 90% in 2005. It is part of their Confidence-Based Learning system. The company tests nuclear power plant operators, police and medical personnel. With omit set this high, an examinee will report what is known with a right mark or omit. There are no (or very, very few) wrong answers marked. One wrong turn of a switch could result in death or large-scale destruction. The value for omit is now related to risk on the job rather than to judgment in the classroom. This fourth view of multiple-choice test scoring and grading completes the full range for omit, from zero in right-mark-only scored tests to 90% in high-risk assessments.

 

Omit takes on three different roles: chance, judgment, and high risk. Chance is the leading player when omit is set to zero. At 33% to 50%, omit rewards the judgment to not mark a wrong answer, and the accurate, honest scores are compatible with traditional right-mark scoring. If a student marks all questions on a KJS test, the score is the same as on a right-mark scored test. With high-risk scoring there are really only two grades: mastery or failing.
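The compatibility with right-mark scoring can be checked directly. This Python sketch assumes omit is valued at 0.5 and uses hypothetical function names.

```python
def kjs_score(right, wrong, omit):
    """KJS percentage with omit valued at 0.5 (right = 1, wrong = 0)."""
    total = right + wrong + omit
    return 100 * (right + 0.5 * omit) / total


def right_mark_score(right, wrong, omit):
    """Traditional percentage: count the right marks only."""
    return 100 * right / (right + wrong + omit)


# With every question marked (no omits), the two scores agree.
all_marked_match = all(
    kjs_score(r, 100 - r, 0) == right_mark_score(r, 100 - r, 0)
    for r in range(101)
)
print(all_marked_match)              # True
# With omits, only KJS credits the judgment to not mark a wrong answer.
print(kjs_score(50, 0, 50))          # 75.0
print(right_mark_score(50, 0, 50))   # 50.0
```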

 

 

References

Goertzel, Ted. The Myth of the Bell Curve (and 27 references). http://www.crab.rutgers.edu/~goertzel/normalcurve.htm

Knowledge Factor. http://www.knowledgefactor.com