KJS and PUP Analysis
The responsibility for being right is moved from the teacher to the student with Knowledge and Judgment Scoring (KJS). The student no longer guesses until the teacher says, “You're right.” Instead, the student marks an answer that he/she believes is right or acceptable, or omits (uses good judgment to not make a wrong mark). Knowledge and judgment both have value instead of only one point for a right mark.
Right marks are still scored one point, but with KJS, judgment has a value higher than zero. It can be set at the test design chance rate (0.25 point for a four-option question), at an honest functional rate (0.33 for most classroom tests), at a rate that entices students to use higher levels of thinking (0.50 to reward students equally for their knowledge and self-judgment), or at a high-risk rate (0.75 or higher for nuclear power, police, and health care workers). At its highest rate, an examinee only reports what he/she knows and can do comfortably, or omits. There are few to no wrong answers.
This discussion continues from the right-mark only scoring results from Break Out Plus (scoring) and Power Up Plus (scoring with item analysis). It includes printouts from Power Up Plus that are different from traditionally scored printouts. The value for good judgment to not mark a wrong answer is set at 0.50 (knowledge and judgment have equal value). This value also makes it easy to follow the math. The data are from the first test in a remedial freshman general biology class with a pass/fail point of 60%.

Student Scores now lists poor judgment (PJ) rather than wrong (WG) on Sheet 4. It is poor judgment to mark an item when the student does not know the correct answer. Good judgment (GJ) to not mark a wrong answer, or to report "don't know" (DK), replaces blank (BK).
The two methods of scoring are combined. Six students with the same test score, 58.3%, are enclosed in the large box above. The bottom two students elected traditional scoring (20 wrong and 28 right, or 58.3% RT). The other four elected Knowledge and Judgment Scoring. They earned a quality range from 60.5% to 70% RT as each did a more accurate job of reporting what they knew. In each case, eight more right answers were marked than wrong by both methods of scoring. This is one justification for combining the two methods of scoring into one score distribution. They are equally fair when the students make the choice of scoring method and the related levels of thinking.
One student did a perfect job of reporting (100% RT quality) to earn a test score of 57.3% using only seven questions. This student knew at the time of the test what he/she knew. Two others with the same score had to mark several more RT to offset their poor judgment marks, just as is done in running a seven-gated maze. The perfect student made no errors in passing the seven gates. The other two students needed to backtrack (PJ) 6 and 11 times. Students electing traditional scoring marked two to three times as many wrong marks as those electing KJS.
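The arithmetic behind these scores can be sketched in a few lines of Python. This is only an illustration, not code from Power Up Plus: the function names are hypothetical, the test length of 48 items is inferred from the 20 wrong plus 28 right marks quoted above, and the right-mark counts of 13 and 18 for the two students who backtracked are inferred from their sharing the perfect reporter's score.

```python
def kjs_score(right: int, wrong: int, total_items: int,
              judgment_value: float = 0.5) -> float:
    """Fraction of a perfect score: one point per right mark plus
    judgment_value for each item left unmarked (omitted)."""
    omitted = total_items - right - wrong
    if omitted < 0:
        raise ValueError("right + wrong exceeds total_items")
    return (right + judgment_value * omitted) / total_items


def quality(right: int, wrong: int) -> float:
    """Quality (%RT): share of the marked items that are right."""
    marked = right + wrong
    return right / marked if marked else 0.0


TOTAL_ITEMS = 48  # assumption: 20 wrong + 28 right with every item marked

# (right, wrong) pairs; the last two right counts are inferred, not quoted
examples = {
    "all items marked": (28, 20),
    "perfect reporting": (7, 0),
    "6 poor judgment marks": (13, 6),
    "11 poor judgment marks": (18, 11),
}
for label, (right, wrong) in examples.items():
    score = kjs_score(right, wrong, TOTAL_ITEMS)
    print(f"{label}: score {score:.1%}, quality {quality(right, wrong):.1%}")
```

Under these assumptions the first case gives 58.3% and the other three give 57.3%, matching the scores quoted above. With the judgment value at 0.50, only the difference between right and wrong marks changes the score, which is why quality (%RT) is needed to tell such students apart.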
Many students fail to mark all questions because they run out of time trying to find a “best” answer when they have no idea of which one to mark. Others just give up. Either way, these are “don’t know” responses that signal to them that on the next test, “Omit rather than gamble.” Take responsibility. Mark an honest accurate answer sheet for the best score. Eight of the 24 students elected traditional scoring (individual boxes). One had the highest and one had the lowest score on this test. Three failed to mark all questions. These are scored as omits (good judgment to not mark a wrong answer) in the combined analysis. They would be scored as wrong if traditional, right-mark only, scoring were imposed.
The annual Math-Science Olympiad at NWMSU used Knowledge and Judgment Scoring to select scholarship winners from a test administered to over 300 students. Several students often earned the same test score. These ties were generally broken by selecting the student with the highest quality score (%RT). This is the student with the best judgment. The value of judgment was set to 0.50 as this was the first time most of these students ever had the option of Knowledge and Judgment Scoring. Over 90% of students, in class, voluntarily switched to Knowledge and Judgment Scoring after two tests.
The tally analysis is now completed on the Mark Analysis and Question Difficulty, Sheet 5. It shows 22 Expected, 7 Discriminating, 13 Guessing, and 6 Misconception items. When most of the class elects to answer a question and most are right, it is an Expected item (an expected performance by student and item). When few in the class elect to answer a question and most are right, it is Discriminating. When few in the class elect to answer a question and most are wrong, it is Guessing. Misconceptions occur when most of the class elects to answer a question and most are wrong. This insight into student and item behavior is not available in traditionally scored tests that only count right marks.
The minimum honest value for judgment (not marking a wrong answer, omit) on this test is estimated to be between 0.24 and 0.32 based on the ability of the test to fit student preparation (between 24.0% and 32.2%). The estimate is based on six options, the five item options plus one for omit. The rule of thumb value for the minimum honest value for judgment is 0.33 for classroom tests.
How the class viewed each item is added to the Student Counseling Mark Matrix, Sheet 3, with the addition of the Expected, Discriminating, Guessing, and Misconception (EDGM) Analysis. Expected items tended to be easy. Discriminating items hovered around the average difficulty. Misconceptions tended to be the most difficult. This information tempers student counseling.
The two students, 12x and 13x, with the lowest score, 56.2%, chose different scoring methods. In both cases they answered five more questions correctly than they missed. The choice of scoring method did not provide an advantage. Formula scoring would subtract a portion of wrong marks from right marks, 26 - (21/3) = 19 RT, for a “true” score of 39.6% for both.
Formula scoring, deriving the “true” score, is a flawed attempt to clear away the damage done from forcing students to gamble to get a "complete set" of answers for psychometric purposes. It does not resolve whether chance or the student’s knowledge directed the right mark. It does not permit an honest report of what is known and not known based on the individual student's self-judgment. Subtracting a portion of the wrong marks from a dishonest score does not make it an honest score.
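The formula-scoring arithmetic quoted above can be checked with a short Python sketch. The function name is hypothetical, the divisor of 3 is simply the one used in the text, and the test length of 48 items is an assumption inferred from 19 of 48 being 39.6%.

```python
def formula_score(right: int, wrong: int, divisor: int) -> float:
    """Right marks minus a share of the wrong marks (wrong / divisor)."""
    return right - wrong / divisor


TOTAL_ITEMS = 48  # assumption: inferred from 19 / 48 = 39.6%

corrected = formula_score(right=26, wrong=21, divisor=3)
print(corrected)                          # 19.0
print(f"{corrected / TOTAL_ITEMS:.1%}")   # 39.6%
```

The correction removes seven marks on average. It cannot say which seven of the 26 right marks were lucky, which is the objection raised in the text.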
Meaningful, useful numbers that relate to something of importance should be harvested from a test. To first force students to share marking the test with chance, and then remove what chance, on average, has marked, does not make the numbers more meaningful in the classroom. There is no way to know which right marks were made by the student or by chance before or after revising the score.
Each mark has meaning with Knowledge and Judgment Scoring. Knowledge and Judgment Scoring is not formula scoring. Knowledge and Judgment Scoring may someday replace right mark scoring in standardized testing to reduce the number of wrong marks and false right marks. (Wrong marks are a problem, in the classroom, as there is no way to know if a student really believed the option to be right, a true record of student performance, or the student selected the option by chance or in error.)
Wrong marks are a current problem on the ACT test. "About 83 percent of Chicago high school juniors surveyed believed that ACT scores are primarily determined by test-taking skills." The respected ACT invites this perception by scoring at the lowest level of thinking and by having an average score (for 2007) below 50% of a perfect score, where chance (rather than test-taking skills) determines the score as much as the student does. Students are not taking the test to report what they know, but are playing an academic lottery that produces more wrong than right marks. Knowledge and Judgment Scoring can produce accurate, honest scores at this level of achievement, as well as easy-to-use student counseling matrixes.
Computerized Adaptive Testing (CAT), such as the National Council Licensure Examination (NCLEX) for nursing and administered by Pearson VUE, yields about half right and half wrong responses in the process of providing an efficient right-mark only test by adjusting the difficulty of items to each student’s performance. Individual student self-judgment does not have to be measured to rank examinees pass/fail. The student self-judgment component is replaced with the average performance of a control group.
The student is still a pawn in a testing game rather than an active participant reporting what is of value, trusted and understood.
The Test Performance Profile is the same as for a traditionally scored test. Nine items with a difficulty below 37.5% are labeled “BAD?” They need to be examined. Adjustments can be made on Sheet 2 and the test rescored before setting grades. This flexibility makes testing a powerful learning event when students are permitted to work on these items before setting grades.
The 3a. Student Counseling Mark Matrix (Sheet 6) with Mastery/Easy, Unfinished, and Discriminating (MUD) analysis is also the same except with the addition of the omit marks and the low number of wrong marks. On this test the students with the lowest and highest scores got all the discriminating items right. Only two of the MUD discriminating items were listed as discriminating by the students on Sheet 3 (EDGM analysis). The two counseling printouts provide information that both students and teachers find meaningful. The self-assessment provides formative assessment for students directly and for teachers to use in class, time permitting.
A new Copy Detector in PUP, version 4.01, replaces the former Cheat Checker based on selected critical pairs. The Copy Detector works with RMS and KJS. There is no need to do repeated trials in search of cheating on classroom tests. The Critical Pairing Index (Sheet 8) pairs each student with all other students. All 24 students in this example generated 276 pairings, of which 116 had a Critical Pairing Index value greater than zero. Two charts, Pairing Index by Pairing Rank and Pairing Count by Pairing Index, provide a simple visual perspective. The top two ranking critical pairs need to be examined.
The Individual Pairings (Sheet 9) present the facts in a manner that is meaningful and confidential to students, teachers, and administrators. The pair of students, 14 and 20, shows a set of six identical wrong marks (items 11 to 16) plus items 6 and 20. This is a borderline case just using probabilities. However, items 13, 16, and 20 are non-critical (yellow); they are shared with over five other students in the class. Items 6, 11, 12 and 15 are critical (orange). They are shared with five or fewer examinees. Only item 14 was unique (red) for this pairing. The three non-critical items ranked among the five most difficult items on the test, with two of them (16 and 20) labeled misconceptions by students on the first Student Counseling Matrix (Sheet 3). Three of the four critical items were labeled unfinished on the second Student Counseling Matrix (Sheet 6) by item analysis (difficulty and discrimination). These items ranged in difficulty from 40% to 44%.
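The pairing count and the color labels for shared wrong marks can be illustrated with a small Python sketch. It is not the Copy Detector's code, and it does not compute the Critical Pairing Index, whose formula is not given here. The function name and the exact thresholds are one reading of the description above: no other student sharing the mark is unique, one to five others is critical, and more than five is non-critical.

```python
from itertools import combinations
from math import comb

# Every student is paired once with every other student.
students = range(1, 25)  # 24 students, numbered 1..24 for illustration
pairings = list(combinations(students, 2))
print(len(pairings), comb(24, 2))  # 276 276


def classify_shared_wrong_mark(others_sharing: int) -> str:
    """Label an identical wrong mark by how many students outside the
    pair made the same mark (thresholds are an assumed reading)."""
    if others_sharing < 0:
        raise ValueError("others_sharing must be non-negative")
    if others_sharing == 0:
        return "unique (red)"
    if others_sharing <= 5:
        return "critical (orange)"
    return "non-critical (yellow)"


# Hypothetical counts, only to show the three labels
for count in (0, 3, 8):
    print(count, classify_shared_wrong_mark(count))
```

The first line of output confirms that 24 students yield the 276 pairings reported on Sheet 8.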
Students 14 and 20 ranked in the bottom three of the class (55.2% and 56.3%). Low quality performance by students and questions can produce a large number of shared wrong marks, especially with RMS. In this example of KJS, most students in the class omitted (used good judgment to not make a wrong mark) instead of guessing. The lowest student score included in the pairing analysis is by default set to zero for KJS. It can be set higher, to about 10 points below passing, for large classes and RMS. (Passing is 60% in some countries and 50% in other countries.)
Power Up Plus Excel Sheet Index
Sheet 1 1. ANSWER FILE DATA
Sheet 2 2. ENTER, EDIT AND SAVE DATA
Sheet 3 3. STUDENT COUNSELING MARK MATRIX WITH EXPECTED, DISCRIMINATING, GUESSING, AND MISCONCEPTION (EDGM) ANALYSIS (RMS, KJS)
Sheet 5 5. MARK ANALYSIS and QUESTION DIFFICULTY (RMS)
Sheet 6 3a. STUDENT COUNSELING MARK MATRIX WITH MASTERY/EASY, UNFINISHED AND DISCRIMINATING (MUD) ANALYSIS (RMS, KJS)
Sheet 7 7. TEST PERFORMANCE PROFILE (RMS)
Sheet 8 8. CRITICAL PAIRING INDEX (TOP TEN)
Sheet 9 9. INDIVIDUAL PAIRINGS (TOP FIVE)


Formula scoring has some use in psychometrics and standardized testing where only average group scores are of interest. Even here the test instructions generally confuse and cheat students.
This tangled line of reasoning is not necessary when students are free to mark a right answer or to omit, and to be honestly rewarded for taking the responsibility to do so. (Formula scoring overlooks that a complete set of answers included omit when multiple-choice was invented to assess animal behavior.)



8 October 2009