KJS and PUP Analysis
The responsibility for being right is moved from the teacher to the student with Knowledge and Judgment Scoring (KJS). The student no longer guesses until the teacher says, “You're right.” Instead, the student marks an answer that he/she believes is right or acceptable, or omits (uses good judgment to not make a wrong mark). Knowledge and judgment both have value, instead of only one point for a right mark.
Right marks are still scored one point, but with KJS, judgment has a value higher than zero. It can be set at the test design chance rate (0.25 point for a four-option question), at an honest functional rate (0.33 for most classroom tests), at a rate that entices students to use higher levels of thinking (0.50 to reward students equally for their knowledge and self-judgment), or at a high-risk rate (0.75 or higher for nuclear power, police, and health care workers). At its highest rate, an examinee only reports what he/she knows and can do comfortably, or omits. There are few to no wrong answers.
This discussion continues from the right-mark only scoring results from Break Out Plus (scoring) and Power Up Plus (scoring with item analysis). It includes printouts from Power Up Plus that are different from traditionally scored printouts. The value for good judgment to not mark a wrong answer is set at 0.50 (knowledge and judgment have equal value). This value also makes it easy to follow the math. The data are from the first test in a remedial freshman general biology class with a pass/fail point of 60%.

Student Scores now lists poor judgment (PJ) rather than wrong (WG) on Sheet 4. It is poor judgment to mark an item when the student does not know the correct answer. Good judgment (GJ) to not mark a wrong answer, or to report "don't know" (DK), replaces blank (BK).
The two methods of scoring are combined. Six students with the same test score, 58.3%, are enclosed in the large box above. The bottom two students elected traditional scoring (20 wrong and 28 right, or 58.3% RT). The other four elected Knowledge and Judgment Scoring. They earned a quality range from 60.5% to 70% RT as each did a more accurate job of reporting what they knew. In each case, eight more right answers than wrong answers were marked under both methods of scoring. This is one justification for combining the two methods of scoring into one score distribution. They are equally fair when the students make the choice of scoring method and the related levels of thinking.
One student did a perfect job of reporting (100% RT quality) to earn a test score of 57.3% using only seven questions. This student knew at the time of the test what he/she knew. Two others with the same score had to mark several more RT to offset their poor judgment marks, just as is done in running a seven-gated maze. The perfect student made no errors in passing the seven gates. The other two students needed to backtrack (PJ) 6 and 11 times. Students electing traditional scoring made two to three times as many wrong marks as those electing KJS.
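The arithmetic behind these scores can be sketched in a few lines of Python. This is only an illustration, not the Power Up Plus implementation: the function names are hypothetical, the 48-item test length is inferred from the 28 right and 20 wrong marks quoted above, and the sketch assumes that a right mark earns one point, an omit earns the judgment value (0.50 here), and a wrong mark earns nothing.

```python
def kjs_score(right, wrong, omit, judgment_value=0.50):
    """Percent test score: 1 point per right mark, judgment_value per omit, 0 per wrong mark."""
    items = right + wrong + omit
    if items == 0:
        raise ValueError("the answer sheet has no items")
    return 100.0 * (right + judgment_value * omit) / items


def quality_percent(right, wrong):
    """Quality (%RT): share of the marked answers that are right."""
    marked = right + wrong
    if marked == 0:
        raise ValueError("no answers were marked")
    return 100.0 * right / marked


# Traditional scoring: every item is marked, so there are no omits to reward.
traditional = kjs_score(right=28, wrong=20, omit=0)

# Perfect reporting: 7 right, 0 wrong, and (assumed) 41 omits on a 48-item test.
perfect = kjs_score(right=7, wrong=0, omit=41)

print(f"{traditional:.1f}% {perfect:.1f}% quality {quality_percent(7, 0):.0f}%")
```

Under these assumptions the sketch reproduces the 58.3% and 57.3% test scores and the 100% quality score given in the text.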
Many students fail to mark all questions because they run out of time trying to find a “best” answer when they have no idea of which one to mark. Others just give up. Either way, these are “don’t know” responses that signal to them that on the next test, “Omit rather than gamble.” Take responsibility. Mark an honest, accurate answer sheet for the best score. Eight of the 24 students elected traditional scoring (individual boxes). One had the highest and one had the lowest score on this test. Three failed to mark all questions. These are scored as omits (good judgment to not mark a wrong answer) in the combined analysis. They would be scored as wrong if traditional, right-mark only, scoring were imposed.
The annual Math-Science Olympiad at NWMSU used Knowledge and Judgment Scoring to select scholarship winners from a test administered to over 300 students. Several students often earned the same test score. These ties were generally broken by selecting the student with the highest quality score (%RT). This is the student with the best judgment. The value of judgment was set to 0.50 as this was the first time most of these students ever had the option of Knowledge and Judgment Scoring. Over 90% of students, in class, voluntarily switched to Knowledge and Judgment Scoring after two tests.
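The tie-breaking rule described above can be sketched as a two-key sort: test score first, quality (%RT) second. This is a minimal illustration, not the Olympiad's actual procedure. The student labels are hypothetical; the mark counts are those of the three 57.3% students discussed earlier, with the right-mark counts of 13 and 18 and the omit counts inferred by assuming a 48-item test and a judgment value of 0.50.

```python
from typing import NamedTuple

JUDGMENT_VALUE = 0.50  # value of an omit, as set in the text


class Sheet(NamedTuple):
    student: str  # hypothetical label
    right: int
    wrong: int
    omit: int


def kjs_percent(sheet):
    """Percent test score: right marks plus the judgment value for each omit."""
    items = sheet.right + sheet.wrong + sheet.omit
    return 100.0 * (sheet.right + JUDGMENT_VALUE * sheet.omit) / items


def quality(sheet):
    """Quality (%RT): share of marked answers that are right."""
    return 100.0 * sheet.right / (sheet.right + sheet.wrong)


def rank(sheets):
    """Highest test score first; ties go to the higher quality score."""
    return sorted(sheets, key=lambda s: (kjs_percent(s), quality(s)), reverse=True)


# Three sheets with the same test score but different quality (inferred counts).
sheets = [
    Sheet("A", right=18, wrong=11, omit=19),
    Sheet("B", right=7, wrong=0, omit=41),
    Sheet("C", right=13, wrong=6, omit=29),
]
ranked = rank(sheets)
print([s.student for s in ranked])
```

The student who marked no wrong answers is ranked first, since all three test scores are equal and only the quality score differs.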
The tally analysis is now completed on the Mark Analysis and Question Difficulty, Sheet 5. It shows 22 Expected, 7 Discriminating, 13 Guessing, and 6 Misconception items. When most of the class elects to answer a question and most are right, it is an Expected item (an expected performance by student and item). When few in the class elect to answer a question and most are right, it is Discriminating. When few in the class elect to answer a question and most are wrong, it is Guessing. Misconceptions occur when most of the class elects to answer a question and most are wrong. This insight into student and item behavior is not available in traditionally scored tests that only count right marks.
The minimum honest value for judgment (not marking a wrong answer, omit) on this test is estimated to be between 0.24 and 0.32 based on the ability of the test to fit student preparation (between 24.0% and 32.2%). The estimate is based on six options, the five item options plus one for omit. The rule of thumb value for the minimum honest value for judgment is 0.33 for classroom tests.
How the class viewed each item is added to the Student Counseling Mark Matrix, Sheet 3, with the addition of the Expected, Discriminating, Guessing, and Misconception (EDGM) Analysis. Expected items tended to be easy. Discriminating items hovered around the average difficulty. Misconceptions tended to be the most difficult. This information tempers student counseling.
The two students, 12x and 13x, with the lowest score, 56.2%, chose different scoring methods. In both cases they answered five more questions correctly than they missed. The choice of scoring method did not provide an advantage. Formula scoring would subtract a portion of wrong marks from right marks, 26 - (21/3) = 19 RT for a “true” score of 39.6% for both.
Formula scoring, deriving the “true” score, is a flawed attempt to clear away the damage done from forcing students to gamble to get a "complete set" of answers for psychometric purposes. It does not resolve whether chance or the student’s knowledge directed the right mark. It does not permit an honest report of what is known and not known based on each individual student's self-judgment. Subtracting a portion of the wrong marks from a dishonest score does not make it an honest score.
Meaningful, useful numbers that relate to something of importance should be harvested from a test. To first force students to share marking the test with chance and then remove what chance, on average, has marked does not make the numbers more meaningful in the classroom. There is no way to know which right marks were made by the student or by chance before or after revising the score.
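The formula-scoring arithmetic quoted for students 12x and 13x can be reproduced with a short Python sketch. The function name is hypothetical, the divisor of 3 is copied from the text's own calculation rather than derived here, and the 48-item test length is an assumption inferred from the 22 + 7 + 13 + 6 item tally on Sheet 5.

```python
def formula_score(right, wrong, divisor=3):
    """Right marks minus a portion (1/divisor) of the wrong marks."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return right - wrong / divisor


ITEMS = 48  # assumed test length (22 + 7 + 13 + 6 items in the tally)

corrected = formula_score(right=26, wrong=21)  # 26 - (21/3)
percent = 100.0 * corrected / ITEMS

print(f"{corrected:.0f} RT, {percent:.1f}%")
```

The result depends only on the counts of right and wrong marks, so the calculation cannot say which individual right marks came from knowledge and which from chance.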
Each mark has meaning with Knowledge and Judgment Scoring. Knowledge and Judgment Scoring is not formula scoring. Knowledge and Judgment Scoring may someday replace right-mark scoring in standardized testing to reduce the number of wrong marks and false right marks. (Wrong marks are a problem in the classroom, as there is no way to know whether a student really believed the option to be right, which would be a true record of student performance, or selected the option by chance or in error.)
Wrong marks are a current problem on the ACT test. "About 83 percent of Chicago high school juniors surveyed believed that ACT scores are primarily determined by test-taking skills." The respected ACT invites this perception by scoring at the lowest level of thinking and by having an average score, for 2007, below 50% of a perfect score, where chance (rather than test-taking skills) determines the score as much as the students do. Students are not taking the test to report what they know, but are playing an academic lottery that produces more wrong than right marks. Knowledge and Judgment Scoring can produce accurate, honest scores at this level of achievement, as well as easy-to-use student counseling matrices.
Computerized Adaptive Testing (CAT), such as the National Council Licensure Examination (NCLEX) for nursing, administered by Pearson VUE, yields about half right and half wrong responses in the process of providing an efficient right-mark only test by adjusting the difficulty of items to each student’s performance. Individual student self-judgment does not have to be measured to rank examinees pass/fail. The student self-judgment component is replaced with the average performance of a control group.
The student is still a pawn in a testing game rather than an active participant reporting what is of value, trusted and understood.
The Test Performance Profile is the same as for a traditionally scored test. Nine items with a difficulty below 37.5% are labeled “BAD?” They need to be examined. Adjustments can be made on Sheet 2 and the test rescored before setting grades. This flexibility makes testing a powerful learning event when students are permitted to work on these items before setting grades.
The 3a. Student Counseling Mark Matrix (Sheet 6) with Mastery/Easy, Unfinished, and Discriminating (MUD) analysis is also the same, except for the addition of the omit marks and the low number of wrong marks. On this test the students with the lowest and highest scores got all the discriminating items right. Only two of the MUD discriminating items were listed as discriminating by the students on Sheet 3 (EDGM analysis). The two counseling printouts provide information that both students and teachers find meaningful. The self-assessment provides formative assessment for students directly and for teachers to use in class, time permitting.
The check for independent marking is a bit different from traditional scoring. First, there are far fewer wrong answers from which to make critical pairs. Secondly, most of the wrong answers that are available
become candidates for critical pairs. Also, omits can become critical pairs. A more sensitive check for independent marking is to combine several tests. Test 1 occupies columns 2 to 51; test 2, 52 to 101; and test 3, 102 to 151, if each test contains 50 questions and all students take each test.
Power Up Plus Excel Sheet Index
Sheet 1: 1. ANSWER FILE DATA
Sheet 2: 2. ENTER, EDIT AND SAVE DATA
Sheet 3: 3. STUDENT COUNSELING MARK MATRIX WITH EXPECTED, DISCRIMINATING, GUESSING, AND MISCONCEPTION (EDGM) ANALYSIS
Sheet 4: 4. STUDENT SCORES
Sheet 5: 5. MARK ANALYSIS and QUESTION DIFFICULTY
Sheet 6: 3a. STUDENT COUNSELING MARK MATRIX WITH MASTERY/EASY, UNFINISHED AND DISCRIMINATING (MUD) ANALYSIS
Sheet 7: 7. TEST PERFORMANCE PROFILE
Sheet 8: 6. SELECTED CRITICAL PAIRS
Sheet 9: 6a. SELECTED CRITICAL PAIRS (FILTERED)
4 November 2008

Formula scoring has some use in psychometrics and standardized testing where only average group scores are of interest. Even here the test instructions generally confuse and cheat students.
This tangled line of reasoning is not necessary when students are free to mark a right answer or to omit, and to be honestly rewarded for taking the responsibility to do so. (Formula scoring overlooks that a complete set of answers included omit when multiple-choice was invented to assess animal behavior.)


