NY State Must Clear Up Mystery of Missing Test Items


J. Murphy

Psst! Four test questions are missing. They were on the New York State English Language Arts (ELA) exams given last April designed to measure students against the Common Core Standards. Evidently, the State Education Department pulled them after seeing the results.

The mystery evokes the 2012 Case of the Pineapple and the Hare when SED was publicly forced to scrap a reading passage and six baffling test items. Is history repeating itself, but in the dark this time, to ward off embarrassment and the protests that more bad testing would fuel?

Three items were yanked from the Grade 3 test results. Proof they existed lies in the 2014 Test Guides that SED put on its web site before the exams were given. A single item was struck from the Grade 7 test results and also banished from a sample of questions and answers the state released afterward to help parents and the public understand the Core exams. Taken together, their disappearance suggests problems with test publisher Pearson’s development and SED’s approval of poor test material.

Now, the fate of a few items may strike readers as a quibble. But given the political stakes riding on these exams, and the personal stakes for students and teachers, details matter a great deal. A single question can be the difference between a student being deemed proficient or falling below, and between showing progress or not. And teacher competency is being graded on the basis of student performance.

While SED can decide after the fact not to count certain questions that are flawed or ambiguous, if those items are present on test day they serve to ramp up kids’ already high stress levels. What’s more, the state pays a lot of money—and offers up thousands of students as guinea pigs—so Pearson can field test items and devise demanding exams that purportedly tap children’s thinking ability.

Although the omissions I’ve detected affect only two grades, those grades encompass 138,500 kids in the city and 378,000 statewide, all with teachers, principals and families.

Grade 3’s test originally had 31 multiple-choice items worth one point apiece, eight short answer questions worth two points each and two four-point questions calling for extended responses. The test’s maximum raw score was 55 points.

The lost items can be identified from gaps in the Grade 3 Test Map that was posted after results for all questions became known to SED. The Map gives the number of each test item. The gap between item #28 and #31 pinpoints the deletion of questions #29 and #30, the two final multiple-choice items children faced on Test Day 1. Effectively, SED erased the answers of 190,000 8-year-olds. Imagine if a teacher was caught tampering in such a way with even a handful of answer sheets!

And #47, the final 4-point question on Test Day 3, dropped off the Map. It required an extended response intended to be cognitively demanding and gauge the future of third graders as SED saw them setting out on the aspirational path to college and careers. Could it be that SED decided to eliminate #47 because too many youngsters couldn’t reach it in 70 minutes? Or, after three days, were kids so muddled by the rigorous Core-related tests that they wrote incomprehensible answers?

The sudden omission of these three questions accounts for six points out of 55, or about 11 percent of the raw score on the 2014 Grade 3 ELA taken statewide. The reduction substantially alters the composition and results of the test.
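The arithmetic behind that figure can be checked with a short sketch. It uses only the item counts and point values cited above; the variable names are invented for illustration and are not drawn from any SED or Pearson document.

```python
# Grade 3 ELA composition as described in the article:
# (number of items, points per item) for each question type.
composition = {
    "multiple_choice": (31, 1),
    "short_answer": (8, 2),
    "extended_response": (2, 4),
}

# Maximum raw score before any items were dropped.
max_raw = sum(count * points for count, points in composition.values())

# Dropped items: #29 and #30 (multiple choice) and #47 (extended response).
dropped = {"multiple_choice": 2, "extended_response": 1}
points_lost = sum(n * composition[kind][1] for kind, n in dropped.items())

# Share of the original raw score that disappeared.
share_lost = points_lost / max_raw

print(max_raw)              # 55
print(points_lost)          # 6
print(f"{share_lost:.1%}")  # 10.9%
```

Two one-point items plus one four-point item make six points, which is 10.9 percent of 55 and rounds to the 11 percent cited.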

But in some ways, the tale of Grade 7’s item #8 is more telling. It was one of 35 multiple-choice items that had been field tested by the publisher and chosen for operational use on Test Day 1. That’s all we would have known about #8 except for two things.

Last August, SED disclosed half of the test questions for “review and use” and explained “why the correct answer is correct.” There had been mounting pressure from parents who wanted to know more about the Core tests which sent scores plummeting the year before.

The material included a fictional text about baseball called “The Girl Who Threw Butterflies.” Five items associated with the story were revealed. Each correct answer was justified. Reasons were offered to show the other answer choices, known as “distractors,” were wrong.

Then, someone showed me a copy of the test despite SED’s warning not to “reproduce or transmit by any means” any part of the test. Teachers had been ordered not to discuss details. On the actual test, however, there were six questions, not five. The item remaining invisible was #8.

I invite readers to use the link to Grade 7’s annotated questions (see page 16 of this pdf). Read the baseball story. Try to figure out the correct answer:

8. How is baseball Molly watches on television different from baseball Molly plays in real life?
A. The teams on television are quieter than Molly’s team.
B. The coaches on television are younger than Molly’s coach.
C. On television, baseball looks easy, but in real life, the game is challenging.
D. On television, baseball focuses on the star players, but in real life, the game is cooperative.

One wonders why SED might have killed the item. Might there be no correct answer? Could there be more than one? Perhaps a higher percentage of students selected one or two confusing distractors than chose the answer SED deemed to be right. Maybe the item is biased against a certain group of students. Any of the above would give it a failing grade.

This need not be left to speculation. Shortly after the April exams were scored, SED and Pearson had item-by-item statistics in hand showing the percentage of children getting each question right and the percentage choosing the distractors. The data enabled them to study how every item functioned and determine which to keep, which to release and which pineapples to bury. All of that data should be released immediately.
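For readers unfamiliar with item statistics, here is a minimal sketch of the kind of check such data makes possible. Everything in it is an assumption made for illustration: the function name `item_stats`, the response counts and the keyed answer are all invented, and none of it reflects actual results for item #8 or any other real question.

```python
from collections import Counter

def item_stats(responses, key):
    """Return the share of students choosing each option, plus a flag
    that is True when any distractor drew more responses than the key."""
    counts = Counter(responses)
    total = len(responses)
    shares = {option: counts[option] / total for option in sorted(counts)}
    key_share = shares.get(key, 0.0)
    outdrawn = any(
        share > key_share
        for option, share in shares.items()
        if option != key
    )
    return shares, outdrawn

# HYPOTHETICAL data: 100 invented responses, with "D" arbitrarily
# treated as the keyed answer. Not real test results.
responses = ["D"] * 30 + ["C"] * 45 + ["A"] * 15 + ["B"] * 10
shares, flagged = item_stats(responses, key="D")

print(shares)   # share of students choosing each option
print(flagged)  # True when a distractor outdrew the keyed answer
```

In this made-up case, more students pick a distractor than the keyed answer, the sort of pattern that would mark an item for a second look.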

But the state has not issued the testing program’s 2014 Technical Report, due from the publisher in December. It would present stats that allow independent researchers and testing specialists to examine all items and appraise the value of the exams. The 2013 Report was not posted until last July—four months after the 2014 tests had been given. Following this lagging pattern, expect the 2015 exams to be completed in April before too-little, too-late insights can be gained into whether the 2014 Core-aligned exams were on track. (As it is, the state already has cut back on item statistics it previously made available routinely. It no longer includes data on how many children selected the “wrong choices,” which can tip off substandard items.)

So, students, what do we draw from these revelations?

A) Clearly items that Pearson claimed were vetted by review panels and experts were unrefined and no better than field test items that somehow passed muster only to flop in prime time.

B) SED’s dirty secret is out of the bag: Its performance-defining cutoff scores are set after tests are given—in this case, after the raw score distribution had been studied and truncated.

C) SED plays fast and loose with data at its disposal, withholding information from the public that paid for it.

D) Efforts to classify students and evaluate teachers that rest on such shaky grounds are indefensible and unsustainable.

E) All of the above.

“E” certainly seems like the smart choice. But we can’t know for sure until an outside investigation is conducted into how SED and Pearson have run the testing program. Parents should hold their children out of all statewide tests until SED comes clean by providing complete and timely item analysis data and is able to demonstrate that the test results are relevant to the purposes they are being bent to serve—in other words, until there are meaningful alternative assessment programs in place.

Transparency in all matters concerning educational testing is a moral imperative. We must demand passage of revised Truth-in-Testing legislation, opening the testing process to sunshine and scrutiny, restoring its balance and something immeasurable—a level of trust in educational leadership that’s been missing too long.

* * * *
Dennis Tompkins, spokesman for the New York State Education Department, submitted this response:
“The processes Fred Smith criticizes are standard psychometric practice. I’d suggest Mr. Smith consult with the National Council on Measurement in Education and other professional assessment organizations. He really should try to understand an issue before he attempts to critique it. A little knowledge can be an embarrassing thing.”

* * * *

Fred Smith, a testing specialist and consultant, was an administrative analyst for the New York City public schools. He’s a member of Change the Stakes, a parent advocacy group.


25 thoughts on “NY State Must Clear Up Mystery of Missing Test Items”

  1. One would think that in his 31 years in Albany, mostly as a spokesperson for various agencies, Mr. Tompkins would have learned how to give a “nonresponse response” without being snotty. But then how does someone who apparently doesn’t know much about the issue respond to an expert? With an attack and the hope that no one notices how lame it is.

    • Agreed; Mr. Tompkins, please unbunch your panties and get over yourself. Your area of expertise is? Bureaucraticbullshit-ese? You had one job, Tompkins….one job…
      And while we’re co-opting Alexander Pope, here’s my version of ‘An Essay on Criticism’:

      A little learning is a dangerous thing
      Drink deep, or taste not the Pearson spring:
      Their shadowed questions only dull the brain,
      Embedded items waste kids’ time again.
      Mired at first sight with what the test imparts,
      Then teach to test, obscure science and arts
      While from the bounded level of our mind
      Short views we take to leave no child behind
      But more advanced behold with no surprise,
      The wool that’s being pulled over our eyes!

  2. So this is in addition to the 6 or 7 (depending on which grade) field test questions which also DO NOT COUNT! Putting all the children through this when the questions are not well developed, i.e., not good enough, is a horror. Refuse the tests and join 60,000 others in NY state who did so last year. This has gone on long enough. Go to nysape.org or changethestakes.org for sample letters and information, which has NOT been forthcoming from NYSED. Refuse to be left in the dark. Refuse to let your children be treated like laboratory animals.

    • Pardon me for noticing, but the cliche is that EVERYTHING is a horror to a “West Side Parent”.
      I’m sharing your post on Sanctimommy! 🙂

  3. Pingback: Lace to the Top Response to Daily News Op-Ed | lacetothetop

  4. There is a teacher from Long Island who has filed suit about her score. The State postponed, stating it needed time to amass its data. Doesn’t this say it all, or am I being naive?

  5. Mr. Tompkins would rather us all believe that “standard psychometric practice” is infallible, that standardized tests are entirely objective and unbiased, that test data is never misapplied, and that manipulation or fudging play absolutely no role in the fields of test development and psychometrics. Mr. Tompkins acts as if the proper role of testing has been decided, that the stakes will remain high and that tests will drive the reforms…Mr. Tompkins is simply toeing the line…a mouthpiece for flawed policies. We must continue scrutinizing.

  6. Pingback: NYS Corrupt Common Core Test Scores | lacetothetop

  7. As an educational psychologist, I would support Fred Smith’s analysis. In creating a test, items must be analyzed and field tested to have an instrument which is balanced as far as difficulty, coverage of the designated standards, and accessibility across cultural communities. Adjustments have to be made before the test is administered. I wonder why these items were discarded after the test was administered. We really need transparency and checks and balances. Stand alone field tests are not likely to be as reliable as embedding items in actual tests – using numerous versions of each test so that only a few items are in each version – i.e. giving each to a representative sample of students under actual testing conditions. If these items were not suitable, they should have been removed before they were used in a test. Pearson and SED must be open about their procedures – or must be monitored by an independent group of experts (like Mr. Smith) and parents/citizens.

    • I am completely agnostic on the NY tests, but really, you are talking out of your hat here. There is indeed a standard, accepted psychometric practice in which a small number of test items are pilot tested (tried out) among the other items on a scored test. The piloted items won’t be scored (if they yield favorable psychometric data, they’ll be used in subsequently developed test forms). The reason for piloting items in this way is that students treat them like all the other items, because they don’t realize that they are being piloted. Another way of collecting psychometric data is to pilot an entire test. The drawback is that students, knowing it is a pilot test, rather than an “operational test,” may be less motivated to complete the items with care.

      • If I may say so, it is one thing to be agnostic about the existence of God. But for an “educator” to be agnostic about the test on which she or he is commenting is troubling. You are surely correct about “standard practice”—but then I’m equally sure that you can come up with all sorts of examples, both historical and contemporary, where what is “accepted” is wrong.

  8. Pingback: Testing Expert: Where Are NY’s Missing Test Questions? | Diane Ravitch's blog

  9. So let’s face the facts here. No matter if they disclosed every single question and the correct answer and why it was correct, the only winners here are a) Pearson (A UK/Libyan company, so we’re not spending our American Tax Payer Dollars in America – AND, we didn’t have a choice how our dollars got spent); and b) the edubullies who insist on bogus tests tied to firing teachers and union busting. Shame on the USDOE. Shame on our President and his henchman, Duncan. This is all a ruse. Welcome to the New Order.

  10. Pingback: Resisting the Testing Juggernaut | GFBrandenburg's Blog

  11. Pingback: FairTest: Test Resistance Goes Viral | Diane Ravitch's blog

  12. Pingback: Ten Reasons Why NO Child Should Take the NYS Common Core Tests | Critical Classrooms, Critical Kids

  13. Sorry Fred, Dennis is right both in content, tone, and admonishment.
    ALL big testing companies include “sample” questions alongside the real questions, whose results are never going to be included in the students’ score. They are designed that way, before hand.
    *I* would say that
    Complaining without providing a solution is called “whining”. Propose a better way to experiment with new questions and new question formats.

  14. One of the advantages of common core is that it helps students to learn CONCEPTUALLY, rather than by rote. This is intended to reduce illogical statements like this one: “While SED can decide after the fact not to count certain questions that are flawed or ambiguous, if those items are present on test day they serve to ramp up kids’ already high stress levels. ”

    Help me to understand how you can be more stressed out on one day, based upon a decision made in secret in the future?
    Oh — I know — it’s because your parents have been riled up by trolling, fear-mongering, ignorant hack yellow journalism, like this.

  15. Also, I’m jealous of your education if you NEVER had a teacher make an error on a test and say, “I’m sorry, there was a typo on question #16, so it didn’t count against your score.” You have lived a privileged life! 🙂

    Imagine if a teacher was caught tampering in such a way with even a handful of answer sheets!
