

Multiple-Choice Test Scoring Methods


Several scoring methods for multiple-choice tests can be related by one factor: student self-judgment. Only traditional multiple-choice testing, scored by counting right marks, fails to value self-judgment. Other scoring methods assign self-judgment a value of 50% (Knowledge and Judgment Scoring) or 75% (Confidence Based Learning). Self-judgment is a normal part of writing and scoring essays, short-answer tests, projects, and reports.


Confidence Based Learning by Knowledge Factor is a patented instructional system that values judgment far more highly than knowledge. This is true mastery. It evaluates the skilled use of practiced self-judgment. The examinee can draw on a web of relationships that permits confirmation that an action or test mark is correct. This is appropriate for high-risk jobs and competitions. The examinee either is right or omits. The active starting score is 75% or higher.


Knowledge and Judgment Scoring gives an equal value to knowledge and to judgment. All students can understand a 1:1 ratio: one point for a right mark and one point for an omit (good judgment in not making a wrong mark). The active starting score is 50%. Students learn to make sense as they study, to understand, and to create a web of relationships that they can use to answer test questions they have not seen before and as the basis for further learning.
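The 1:1 ratio and the 50% starting score can be sketched in a few lines of Python. This is only an illustration, not the scoring code of any product. It assumes one reading of the description above: a right mark earns a knowledge point plus a judgment point (2), an omit earns the judgment point alone (1), and a wrong mark earns nothing (0). The function names and the mark labels are hypothetical.

```python
# Assumed point values (one reading of the 1:1 knowledge-to-judgment ratio):
# right = knowledge point + judgment point, omit = judgment point only.
KJS_POINTS = {"right": 2, "omit": 1, "wrong": 0}


def kjs_percent(marks):
    """Percent score under the assumed Knowledge and Judgment point values."""
    earned = sum(KJS_POINTS[mark] for mark in marks)
    possible = 2 * len(marks)
    return 100.0 * earned / possible


def right_mark_percent(marks):
    """Percent score from counting right marks only (traditional scoring)."""
    return 100.0 * sum(1 for mark in marks if mark == "right") / len(marks)


if __name__ == "__main__":
    # Omitting every item leaves the score at the 50% starting point.
    print(kjs_percent(["omit"] * 4))  # 50.0

    sheet = ["right", "right", "omit", "wrong"]
    print(kjs_percent(sheet))         # 62.5
    print(right_mark_percent(sheet))  # 50.0
```

Under these assumed values, a student who omits everything stays at 50%, each right mark moves the score up, and each wrong mark moves it down by the same amount.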


Formula Scoring is a discredited and dishonest scheme that applies a correction for guessing after forcing students to guess. In practice, the average lucky score from a best guess is functional and dynamic, and it is always higher than the static design value of one divided by the number of answer options. This deviation discredits the use of formula scoring. It is dishonest to tell students not to guess.
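For readers who have not seen it, the correction for guessing is conventionally computed as the number of right marks minus the number of wrong marks divided by one less than the number of answer options. The text above does not state the formula, so the sketch below assumes this conventional form; the function name is hypothetical.

```python
def formula_score(right, wrong, options):
    """Conventional correction for guessing: R - W / (k - 1).

    Assumes every item has the same number of answer options (k)
    and that omitted items are neither rewarded nor penalized.
    """
    if options < 2:
        raise ValueError("an item needs at least two answer options")
    return right - wrong / (options - 1)


if __name__ == "__main__":
    # Blind guessing on 40 four-option items yields, on average,
    # 10 right and 30 wrong, which the correction reduces to zero.
    print(formula_score(right=10, wrong=30, options=4))  # 0.0
```

The correction only cancels out when the chance of a lucky right mark is exactly one divided by the number of options, which is the static design value that the paragraph above disputes.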


Traditional Right Mark Scoring is the easiest way to score multiple-choice tests. However, it yields only a rank, and that rank loses value at lower scores. At lower scores, the average lucky score increases to the point that it has more effect on the test score than the examinee does. At times it teases students with a lucky score higher than average, and at times it cheats them with a lower one. The practice of reassigning the active starting score to zero only hides this fact. Neither formula scoring nor right mark scoring considers the dynamic, functional, best-guess lucky score (the score students get by guessing after deleting the answer options they know are not correct).
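The difference between the static design value and the best-guess lucky score can be illustrated with a small calculation. The sketch assumes, purely for illustration, that a student guesses uniformly among the options left after deleting the ones known to be wrong; the function name and the sample numbers are hypothetical.

```python
def average_lucky_score(options_remaining):
    """Expected fraction right from guessing among the remaining options.

    options_remaining holds, for each guessed item, the number of
    answer options left after the student deletes known-wrong ones.
    """
    chances = [1.0 / remaining for remaining in options_remaining]
    return sum(chances) / len(chances)


if __name__ == "__main__":
    # No options deleted on four-option items: the static design value.
    print(average_lucky_score([4, 4, 4, 4]))  # 0.25

    # Two options deleted on half of the items: the lucky score rises.
    print(average_lucky_score([2, 2, 4, 4]))  # 0.375
```

Any deleted option raises the expected result above one divided by the number of options, and the value is only an average: an individual student may land above or below it.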


Item Response Theory (IRT) locates student ability and item difficulty on a single linear scale (linear enough for practical use). The Rasch model is a simple way to do this. The results are needed for computer adaptive testing (CAT). The Rasch model is also used by many state departments of education to develop standardized tests. Winsteps can produce IRT scores equivalent to traditional right mark scoring and to Knowledge and Judgment Scoring (using the partial credit Rasch model). Because IRT scoring and analysis are becoming increasingly popular, a short-form calculation has been added to Power Up Plus (PUP) under Advanced features. This feature was developed as an auditing tool during a two-year investigation into the use (and misuse) of IRT analysis in standardized testing.
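The idea of placing ability and difficulty on one scale can be shown with the standard dichotomous Rasch model, in which the probability of a right mark depends only on the difference between the two values (in logits). The sketch below is a minimal illustration of that model; it is not the calculation used by Winsteps or by PUP, and the function name is hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a right mark under the dichotomous Rasch model.

    Both arguments are on the same logit scale; only their
    difference (ability - difficulty) affects the result.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


if __name__ == "__main__":
    # Ability equal to difficulty gives an even chance of a right mark.
    print(rasch_probability(0.0, 0.0))  # 0.5

    # One logit of ability above the item difficulty raises the chance.
    print(round(rasch_probability(1.0, 0.0), 3))  # 0.731
```

The partial credit model mentioned above extends this to items with more than two score categories; it is not shown here.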


Power Up Plus is software that combines traditional right mark scoring and Knowledge and Judgment Scoring in a training program that helps students make the transition from negative, passive pupil to self-correcting, high-quality achiever. Examinees can choose to switch from guessing at right answers to reporting what they know and trust when they are comfortable doing so. Experience with 3,000 students indicates that over 90% will make the switch after their first two experiences. It is simply more satisfying to do a high-quality job of reporting what you know than to gamble for a grade.